
On Stochastic Dynamic Programming

and its Application to Maintenance

FRANÇOIS BESNARD

Master's Degree Project

Stockholm, Sweden 2007

On Stochastic Dynamic Programming and its

Application to Maintenance

MASTER THESIS BY FRANÇOIS BESNARD

Master Thesis written at the Royal Institute of Technology (KTH), School of Electrical Engineering, June 2007

Supervisors: Assistant Professor Lina Bertling (KTH), Professor Michael Patriksson (Chalmers, Applied Mathematics), Dr. Erik Dotzauer (Fortum)

Examiner: Assistant Professor Lina Bertling

XR-EE-ETK 2007008

Abstract

Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of the power system. The generating companies, as well as transmission and distribution system operators, aim to minimize their costs. Maintenance can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

This work focuses on an optimization methodology that could be useful for optimizing maintenance. The method, stochastic dynamic programming, is interesting because it can explicitly integrate the stochastic behavior of functional failures.

Different models based on stochastic dynamic programming are reviewed, together with the possible optimization methods to solve them. The interest of the models in the context of maintenance optimization is discussed. An example of a multi-component replacement application is proposed to illustrate the theory.

Keywords: Maintenance Optimization, Dynamic Programming, Markov Decision Process, Power Production


Acknowledgements

First of all, I would like to thank my supervisors, who each in their own way supported me in this work: Assistant Professor Lina Bertling for her encouragement, constructive remarks and for giving me the opportunity of working on this project; Dr. Erik Dotzauer for many valuable inputs, discussions and comments; and Professor Michael Patriksson for his help on mathematical writing.

Special greetings to all my friends and companions of study all over the world.

Finally, my heart turns to my parents and my love for their endless encouragement and support in my studies and life.

Stockholm, June 2007


Abbreviations

ADP    Approximate Dynamic Programming
CBM    Condition Based Maintenance
CM     Corrective Maintenance
DP     Dynamic Programming
IHSDP  Infinite Horizon Stochastic Dynamic Programming
LP     Linear Programming
MDP    Markov Decision Process
PI     Policy Iteration
PM     Preventive Maintenance
RCAM   Reliability Centered Asset Maintenance
RCM    Reliability Centered Maintenance
SDP    Stochastic Dynamic Programming
SMDP   Semi-Markov Decision Process
TBM    Time Based Maintenance
VI     Value Iteration


Notations

Numbers
M    Number of iterations for the evaluation step of modified policy iteration
N    Number of stages

Constant
α    Discount factor

Variables
i    State at the current stage
j    State at the next stage
k    Stage
m    Number of iterations left for the evaluation step of modified policy iteration
q    Iteration number for the policy iteration algorithm
u    Decision variable

State and Control Space
μ_k       Function mapping the states to a decision at stage k
μ*_k(i)   Optimal decision at stage k for state i
μ         Decision policy for stationary systems
μ*        Optimal decision policy for stationary systems
π         Policy
π*        Optimal policy
U_k       Decision action at stage k
U*_k(i)   Optimal decision action at stage k for state i
X_k       State at stage k

Dynamic and Cost Functions
C_k(i, u)              Cost function
C_k(i, u, j)           Cost function
C_ij(u) = C(i, u, j)   Cost function if the system is stationary
C_N(i)                 Terminal cost for state i
f_k(i, u)              Dynamic function
f_k(i, u, ω)           Stochastic dynamic function
J*_k(i)                Optimal cost-to-go from stage k to N starting from state i
ω_k(i, u)              Probabilistic function of a disturbance
P_k(j, u, i)           Transition probability function
P(j, u, i)             Transition probability function for stationary systems
V(X_k)                 Cost-to-go resulting from a trajectory starting from state X_k

Sets


Ω^U_k(i)   Decision space at stage k for state i
Ω^X_k      State space at stage k

Contents

Contents

1 Introduction
1.1 Background
1.2 Objective
1.3 Approach
1.4 Outline

2 Maintenance
2.1 Types of Maintenance
2.2 Maintenance Optimization Models

3 Introduction to the Power System
3.1 Power System Presentation
3.2 Costs
3.3 Main Constraints

4 Introduction to Dynamic Programming
4.1 Introduction
4.2 Deterministic Dynamic Programming

5 Finite Horizon Models
5.1 Problem Formulation
5.2 Optimality Equation
5.3 Value Iteration Method
5.4 The Curse of Dimensionality
5.5 Ideas for a Maintenance Optimization Model

6 Infinite Horizon Models - Markov Decision Processes
6.1 Problem Formulation
6.2 Optimality Equations
6.3 Value Iteration
6.4 The Policy Iteration Algorithm
6.5 Modified Policy Iteration
6.6 Average Cost-to-go Problems
6.7 Linear Programming
6.8 Efficiency of the Algorithms
6.9 Semi-Markov Decision Process

7 Approximate Methods for Markov Decision Process - Reinforcement Learning
7.1 Introduction
7.2 Direct Learning
7.3 Indirect Learning
7.4 Supervised Learning

8 Review of Models for Maintenance Optimization
8.1 Finite Horizon Dynamic Programming
8.2 Infinite Horizon Stochastic Models
8.3 Reinforcement Learning
8.4 Conclusions

9 A Proposed Finite Horizon Replacement Model
9.1 One-Component Model
9.2 Multi-Component Model
9.3 Possible Extensions

10 Conclusions and Future Work

A Solution of the Shortest Path Example

Reference List

Chapter 1

Introduction

1.1 Background

Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruption, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components and Condition Based Maintenance (CBM) for the components that are worthwhile and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.

An efficient maintenance strategy should balance corrective and preventive maintenance to minimize the total costs of maintenance.

The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.

1.2 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.

1.3 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.

1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated for deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.


Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, in order to e.g. avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.


[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance. Preventive Maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].


2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures removed by minor repair, and major failures removed by replacement.

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to account for the fact that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limit for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection at each decision epoch, one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Section 6.9) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. Transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short term information. The model can be used for many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components under consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon. More focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation. These are the generation units that produce the power. They can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.


3. Distribution. The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories. Consumers can be industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs will also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation is typically focusing on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: If special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can cause certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical Data: Available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:


An optimal policy has the property that whatever the initial state and initial decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J_0^*(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision action at stage k
C_k(i, u)    Cost function
C_N(i)       Terminal cost for state i
f_k(i, u)    Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right]     (4.1)

J*_k(i)    Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X

J_k^*(i) = \min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X

U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \quad \forall i \in \Omega_k^X

u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
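A minimal Python sketch of this backward recursion is given below. It assumes that the state spaces, admissible controls, dynamics and costs are supplied as plain lists and functions; all names are illustrative and not part of the thesis.

```python
def value_iteration(N, states, controls, f, C, C_terminal):
    """Backward value iteration for a deterministic finite horizon DP.

    states[k]      : list of admissible states at stage k (k = 0..N)
    controls(k, i) : admissible controls at stage k in state i
    f(k, i, u)     : next state f_k(i, u)
    C(k, i, u)     : stage cost C_k(i, u)
    C_terminal(i)  : terminal cost C_N(i)
    Returns the optimal cost-to-go J[k][i] and decisions U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_terminal(i)              # initialisation: J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):           # backward recursion, stops after k = 0
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls(k, i):
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U
```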


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network. Node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; node K at stage 4. Each arc is labelled with its cost.]

The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem will be divided into subproblems that will be solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}


Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to go to the next stage. The following notations are used:

\Omega_k^U(i) =
\begin{cases}
\{0, 1\} & \text{for } i = 0 \\
\{0, 1, 2\} & \text{for } i = 1 \\
\{1, 2\} & \text{for } i = 2
\end{cases}
\qquad \text{for } k = 1, 2, 3

\Omega_0^U(0) = \{0, 1, 2\} \qquad \text{for } k = 0

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E and U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J_0^*(0) = \min_{U_k \in \Omega_k^U(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N-1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamic of the System and Transition Probability

In contrast with the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(i, u, j).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J^*(X_0) = \min_{U_k \in \Omega_k^U(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]

subject to X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), k = 0, 1, ..., N-1


N              Number of stages
k              Stage
i              State at the current stage
j              State at the next stage
X_k            State at stage k
U_k            Decision action at stage k
ω_k(i, u)      Probabilistic function of the disturbance
C_k(i, u, j)   Cost function
C_N(i)         Terminal cost for state i
f_k(i, u, ω)   Dynamic function
J*_0(i)        Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J_k^*(i) = \min_{u \in \Omega_k^U(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right]     (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right]     (5.2)

Ω^X_k          State space at stage k
Ω^U_k(i)       Decision space at stage k for state i
P_k(j, u, i)   Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J_N^*(i) = C_N(i) \quad \forall i \in \Omega_N^X     (initialisation)

While k ≥ 0 do:

J_k^*(i) = \min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X

U_k^*(i) = \arg\min_{u \in \Omega_k^U(i)} \sum_{j \in \Omega_{k+1}^X} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega_k^X

k ← k − 1


u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
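The same recursion with transition probabilities can be sketched in Python as follows; representing P and C as functions returning dictionaries is an assumption made for this example, not a notation used in the thesis.

```python
def stochastic_value_iteration(N, states, controls, P, C, C_terminal):
    """Value iteration for a finite horizon stochastic DP.

    states[k]      : admissible states at stage k (k = 0..N)
    controls(k, i) : admissible controls at stage k in state i
    P(k, i, u)     : dict {j: probability of reaching j at stage k+1}
    C(k, i, u, j)  : transition cost C_k(i, u, j)
    C_terminal(i)  : terminal cost C_N(i)
    Returns the cost-to-go J[k][i] and the optimal decisions U[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_terminal(i)                     # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):                  # backward recursion
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls(k, i):
                expected = sum(p * (C(k, i, u, j) + J[k + 1][j])
                               for j, p in P(k, i, u).items())
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J[k][i], U[k][i] = best_cost, best_u
    return J, U
```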

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages

• N_X state variables, where the size of the set for each state variable is S

• N_U control variables, where the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
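As a purely hypothetical illustration of this growth (the numbers are chosen only for scale and do not come from the thesis): with N = 100 stages, N_X = 3 state variables of size S = 10 and N_U = 2 control variables of size A = 5, the bound gives 100 · 10^(2·3) · 5^2 = 2.5 · 10^9 operations; adding a single state variable of the same size multiplies this figure by another factor of S^2 = 100.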

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.
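As a small illustration of how such a state could be encoded in practice (the variable names and discretisation levels below are hypothetical, not taken from the thesis):

```python
from typing import NamedTuple

class ComponentState(NamedTuple):
    """Hypothetical SDP state for a single component."""
    age: int            # discretised age, counted in stages
    deterioration: int  # e.g. 0 = new, 1 = degraded, 2 = minor failure, 3 = major failure

# example: a component that is 12 stages old and currently in a minor failure state
state = ComponentState(age=12, deterioration=2)
```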


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximate methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP, in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ Ω^X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω^U(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is inevitably reached. When this state is reached, the system remains in it and no further costs are paid.

J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N-1

μ        Decision policy
J*(i)    Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The stage-k cost for a discounted IHSDP has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N-1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N-1


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(i, u, j).

The stationary policy μ*, the solution of an IHSDP shortest path problem, is a solution of the Bellman equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X

J_μ(i)   Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to stop the algorithm.
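A minimal Python sketch of value iteration for the discounted stationary case is given below; the dictionary-based representation of P and C and the sup-norm stopping tolerance are assumptions made for this example.

```python
def discounted_value_iteration(states, controls, P, C, alpha, tol=1e-8):
    """Value iteration for a discounted infinite horizon MDP (0 < alpha < 1).

    P(i, u)    : dict {j: transition probability}
    C(i, u, j) : transition cost
    Returns the cost-to-go J and a greedy stationary policy mu.
    """
    J = {i: 0.0 for i in states}
    while True:
        J_new = {i: min(sum(p * (C(i, u, j) + alpha * J[j])
                            for j, p in P(i, u).items())
                        for u in controls(i))
                 for i in states}
        gap = max(abs(J_new[i] - J[i]) for i in states)
        J = J_new
        if gap < tol:                      # sup-norm stopping criterion
            break
    mu = {i: min(controls(i),
                 key=lambda u: sum(p * (C(i, u, j) + alpha * J[j])
                                   for j, p in P(i, u).items()))
          for i in states}
    return J, mu
```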

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{\mu_q}(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J_{\mu_q}(j) \right] \quad \forall i \in \Omega^X

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_q}(j) \right] \quad \forall i \in \Omega^X

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge quickly to the optimal solution.
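A Python sketch of the two-step loop is shown below for the discounted case, where the policy evaluation system (I − αP_μ)J = c_μ is always solvable; the use of numpy and of a discount factor are assumptions made for this sketch.

```python
import numpy as np

def policy_iteration(states, controls, P, C, alpha):
    """Policy iteration for a discounted MDP.

    P(i, u) : dict {j: transition probability}, C(i, u, j) : transition cost.
    """
    idx = {i: n for n, i in enumerate(states)}
    mu = {i: next(iter(controls(i))) for i in states}    # arbitrary initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        n = len(states)
        P_mu, c_mu = np.zeros((n, n)), np.zeros(n)
        for i in states:
            for j, p in P(i, mu[i]).items():
                P_mu[idx[i], idx[j]] = p
                c_mu[idx[i]] += p * C(i, mu[i], j)
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement (greedy with respect to J)
        mu_new = {i: min(controls(i),
                         key=lambda u: sum(p * (C(i, u, j) + alpha * J[idx[j]])
                                           for j, p in P(i, u).items()))
                  for i in states}
        if mu_new == mu:           # the policy is a solution of its own improvement
            return {i: float(J[idx[i]]) for i in states}, mu
        mu = mu_new
```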

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).


While m ≥ 0 do:

J^m_{\mu_k}(i) = \sum_{j \in \Omega^X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \quad \forall i \in \Omega^X

m ← m − 1

m    Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
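A sketch of this approximate evaluation step, which can replace the exact linear solve in the policy iteration sketch above (the discount factor is again an assumption of the sketch, not part of the thesis formulation):

```python
def approximate_policy_evaluation(states, P, C, mu, alpha, M, J_init):
    """M sweeps of value iteration under a fixed policy mu (modified policy iteration).

    J_init should be chosen above the true cost-to-go of the policy.
    """
    J = dict(J_init)
    for _ in range(M):                         # m = M-1, ..., 0
        J = {i: sum(p * (C(i, mu[i], j) + alpha * J[j])
                    for j, p in P(i, mu[i]).items())
             for i in states}
    return J                                   # approximation of J_mu
```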

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω^X, there is a unique λ_μ and vector h_μ such that:

h_\mu(X) = 0

\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega^X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X

\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h^0(i) is chosen arbitrarily:

H^k = \min_{u \in \Omega^U(X)} \sum_{j \in \Omega^X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]

h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X

\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X

The sequence h^k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
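One possible Python sketch of relative value iteration is given below, assuming a unichain model; the fixed reference state and the iteration cap used as a stopping rule are illustrative choices.

```python
def relative_value_iteration(states, controls, P, C, ref_state, n_iter=1000):
    """Relative value iteration for an average cost-to-go MDP (unichain assumed)."""
    def backup(i, h):
        # one-step lookahead: min over controls of expected cost plus differential value
        return min(sum(p * (C(i, u, j) + h[j]) for j, p in P(i, u).items())
                   for u in controls(i))

    h = {i: 0.0 for i in states}                      # h^0 chosen arbitrarily
    H = 0.0
    for _ in range(n_iter):
        H = backup(ref_state, h)                      # H^k, computed from the reference state X
        h = {i: backup(i, h) - H for i in states}     # h^{k+1}(i)
    mu = {i: min(controls(i),
                 key=lambda u: sum(p * (C(i, u, j) + h[j])
                                   for j, p in P(i, u).items()))
          for i in states}
    return H, h, mu          # H approximates the optimal average cost per stage
```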

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm. Else, solve the system of equations:

h_q(X) = 0

\lambda_q + h_q(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + h_q(j) \right] \quad \forall i \in \Omega^X

Step 2: Policy improvement

\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h_q(j) \right] \quad \forall i \in \Omega^X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, for the discounted IHSDP:

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X

J^*(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega^X} J(i)

subject to J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i) \quad \forall i, u

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
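A sketch of this LP formulation using scipy's linprog is given below; the argument conventions follow the earlier sketches (P(i, u) returns a dictionary of successor probabilities, C(i, u, j) is the transition cost) and are assumptions of the example.

```python
import numpy as np
from scipy.optimize import linprog

def lp_solve_discounted_mdp(states, controls, P, C, alpha):
    """Solve a discounted cost MDP by linear programming.

    Maximizes sum_i J(i) subject to
    J(i) - alpha * sum_j P(j, u, i) J(j) <= sum_j P(j, u, i) C(j, u, i)  for all i, u.
    """
    idx = {i: n for n, i in enumerate(states)}
    n = len(states)
    A, b = [], []
    for i in states:
        for u in controls(i):
            row = np.zeros(n)
            row[idx[i]] += 1.0
            rhs = 0.0
            for j, p in P(i, u).items():
                row[idx[j]] -= alpha * p
                rhs += p * C(i, u, j)
            A.append(row)
            b.append(rhs)
    # linprog minimizes, so maximize sum_i J(i) by minimizing -sum_i J(i)
    res = linprog(c=-np.ones(n), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(None, None)] * n, method="highs")
    return {i: res.x[idx[i]] for i in states}
```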

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = \sum_{n=k}^{N} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k.

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) \sum_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the l-th transition (X_l, X_{l+1}) has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ, 0 ≤ λ ≤ 1, is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm only updates the current state X_l:

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
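As a small illustration, the sketch below applies the TD(0) update to a tabular cost-to-go estimate, assuming trajectories are given as lists of (state, next state, cost) transitions sampled under the fixed policy, with step size γ_x = 1/m as above; the function name and data format are assumptions made for this example, not part of the thesis.

```python
from collections import defaultdict

def td0_policy_evaluation(trajectories):
    """TD(0) evaluation of a fixed policy from sampled trajectories.

    Each trajectory is a list of (x, x_next, cost) transitions generated by
    simulating the policy until the terminal state (cost-to-go 0 there).
    """
    J = defaultdict(float)      # estimated cost-to-go, initialised to 0
    visits = defaultdict(int)   # number of visits per state

    for trajectory in trajectories:
        for (x, x_next, cost) in trajectory:
            visits[x] += 1
            gamma = 1.0 / visits[x]          # step size 1/m, as in the text
            td = cost + J[x_next] - J[x]     # temporal difference
            J[x] += gamma * td
    return dict(J)
```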

Q-factors
Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q^{μ_k}(i, u) = \sum_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q^{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated using the samples.
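A minimal sketch of this improvement step, assuming the Q-factors have already been computed into a table: the new policy simply takes the argmin over the decision space in each state. The names Q, states and actions are illustrative assumptions.

```python
def greedy_policy_from_q(Q, states, actions):
    """Policy improvement: in each state, pick the decision with the
    smallest estimated Q-factor (argmin over the decision space)."""
    return {i: min(actions(i), key=lambda u: Q.get((i, u), 0.0))
            for i in states}
```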

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = \sum_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (71)


The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)    (72)

By combining the two equations, we obtain

Q*(i, u) = \sum_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]    (73)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The trade-off exploration/exploitation. The convergence of the algorithms to the optimal solution would imply that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
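The sketch below combines the Q-learning update with a simple ε-greedy rule for the exploration/exploitation trade-off, in a stochastic shortest path setting. The simulator interface (simulate_step, actions, is_terminal) and the parameter values are assumptions made for illustration, not part of the thesis.

```python
from collections import defaultdict
import random

def q_learning(simulate_step, actions, start_state, is_terminal,
               n_episodes=1000, epsilon=0.1):
    """Tabular Q-learning sketch for a stochastic shortest path problem.

    simulate_step(x, u) samples one transition and returns (next_state, cost);
    actions(x) returns the admissible decisions in state x.
    The step size gamma is 1/m, as for TD.
    """
    Q = defaultdict(float)
    visits = defaultdict(int)

    for _ in range(n_episodes):
        x = start_state
        while not is_terminal(x):
            if random.random() < epsilon:                  # exploration
                u = random.choice(list(actions(x)))
            else:                                          # exploitation (greedy)
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost = simulate_step(x, u)
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]
            if is_terminal(x_next):
                target = cost
            else:
                target = cost + min(Q[(x_next, a)] for a in actions(x_next))
            Q[(x, u)] = (1.0 - gamma) * Q[(x, u)] + gamma * target
            x = x_next
    return Q
```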

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation using direct learning.


74 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation previously investigated, J^μ(i) was stored for all the values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J(i, r).

There are a lot of possible methods for function approximators. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics, for example.

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
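As a small illustration of such an approximation structure, the sketch below fits a linear architecture J(i, r) = r · φ(i) by least squares from sampled cost-to-go values. The feature map φ and the sample format are assumptions made here for the example; the thesis does not prescribe a particular structure.

```python
import numpy as np

def fit_linear_cost_to_go(samples, features):
    """Least-squares fit of an approximate cost-to-go J(i, r) = r . phi(i).

    samples  : list of (state, observed cost-to-go) pairs, e.g. Monte Carlo
               or TD targets obtained by simulating the current policy
    features : function mapping a state to its feature vector phi(i)
    Only the parameter vector r needs to be stored, not a full table.
    """
    Phi = np.array([features(i) for i, _ in samples])   # design matrix
    y = np.array([v for _, v in samples])                # sampled targets
    r, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return r

def approx_cost_to_go(state, r, features):
    """Evaluate the fitted approximation for an arbitrary state."""
    return float(np.dot(r, features(state)))
```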


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done only on one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and the most important methods.

Table 81: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization / scheduling
  Method: Value Iteration
  Advantages / Disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model; several possible approaches, solved with the classical methods for MDP)
  - Average cost-to-go
    Possible application: continuous-time condition monitoring maintenance optimization
    Method: Value Iteration (VI); can converge fast for a high discount factor
  - Discounted
    Possible application: short-term maintenance optimization
    Method: Policy Iteration (PI); faster in general
  - Shortest path
    Method: Linear Programming; possible additional constraints, but state space more limited than with VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach)
  Possible application: optimization for inspection-based maintenance
  Method: same as MDP
  Disadvantages: complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

912 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space


x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another one during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (91):

Xk = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (91)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50%. This latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


[Figure 91: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0, dashed lines: u = 1. From each working state Wq the component moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); W4 is absorbing while working, and the maintenance states PM1, CM1, CM2 lead back towards W0 with probability 1.]

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_(NPM−1), CM1, ..., CM_(NCM−1)}


Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 92: Example of electricity scenarios, NE = 3. The figure plots the electricity prices (SEK/MWh, roughly 200 to 500) of Scenarios 1, 2 and 3 over the stages k−1, k, k+1.]


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

Ω_U(i) = {0, 1} if i1 ∈ {W0, ..., W_NW}, and Ω_U(i) = ∅ otherwise.

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 91: Transition probabilities

i1                            u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0    CM1      λ(Wq)
W_NW                          0    W_NW     1 − λ(W_NW)
W_NW                          0    CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}          1    PM1      1
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    1
PM_(NPM−1)                    ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    1
CM_(NCM−1)                    ∅    W0       1
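To make the construction concrete, the sketch below builds the non-zero transition probabilities of Table 91 as a Python dictionary. The state labels, the function name and the fail_prob interface (returning Ts·λ(Wq)) are assumptions made for illustration only.

```python
def component_transition_probs(n_w, n_pm, n_cm, fail_prob):
    """One-component transition probabilities of Table 91 (sketch).

    States are 'W0'..'W{n_w}', 'PM1'..'PM{n_pm-1}', 'CM1'..'CM{n_cm-1}';
    fail_prob(q) is the per-stage failure probability Ts * lambda(Wq).
    Returns {(i, u): {j: prob}}, with u in {0, 1, None} (None = forced move).
    """
    pm1 = "PM1" if n_pm > 1 else "W0"   # see the note below Table 91
    cm1 = "CM1" if n_cm > 1 else "W0"
    P = {}
    for q in range(n_w + 1):
        w, p = f"W{q}", fail_prob(q)
        w_next = f"W{min(q + 1, n_w)}"  # the last working state is absorbing
        P[(w, 0)] = {w_next: 1.0 - p, cm1: p}
        P[(w, 1)] = {pm1: 1.0}          # preventive replacement starts
    for q in range(1, n_pm - 1):
        P[(f"PM{q}", None)] = {f"PM{q + 1}": 1.0}
    if n_pm > 1:
        P[(f"PM{n_pm - 1}", None)] = {"W0": 1.0}
    for q in range(1, n_cm - 1):
        P[(f"CM{q}", None)] = {f"CM{q + 1}": 1.0}
    if n_cm > 1:
        P[(f"CM{n_cm - 1}", None)] = {"W0": 1.0}
    return P
```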

Table 92: Example of transition matrices for the electricity scenarios

P1_E =
  1    0    0
  0    1    0
  0    0    1

P2_E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3_E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 93: Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2):   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 94: Transition costs

i1                            u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}        0    CM1      CI + CCM
W_NW                          0    W_NW     G · Ts · CE(i2, k)
W_NW                          0    CM1      CI + CCM
Wq                            1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    CI + CPM
PM_(NPM−1)                    ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    CI + CCM
CM_(NCM−1)                    ∅    W0       CI + CCM
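The one-component model can then be solved by backward induction (the value iteration method for finite horizon problems). The sketch below is a generic implementation under assumed interfaces for the transition probabilities and costs of Tables 91-94; it is illustrative, not a verbatim part of the proposed model.

```python
def finite_horizon_value_iteration(states, actions, trans, cost, terminal_cost, N):
    """Backward induction for a finite horizon SDP (sketch).

    states          : list of states i = (component state, electricity state)
    actions(i)      : admissible decisions Omega_U(i); may be empty
    trans(k, i, u)  : dict {j: P_k(j, u, i)} of non-zero transitions at stage k
    cost(k, i, u, j): stage cost C_k(j, u, i)
    terminal_cost   : dict {i: C_N(i)}
    Returns the cost-to-go tables J[k][i] and an optimal policy mu[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for i in states:
        J[N][i] = terminal_cost.get(i, 0.0)
    for k in range(N - 1, -1, -1):                 # stages N-1, ..., 0
        for i in states:
            feasible = list(actions(i)) or [None]  # None = forced transition
            best_u, best_val = None, float("inf")
            for u in feasible:
                val = sum(p * (cost(k, i, u, j) + J[k + 1][j])
                          for j, p in trans(k, i, u).items())
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i] = best_val
            mu[k][i] = best_u
    return J, mu
```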

92 Multi-Component model

In this section, the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c


Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_xc       State space for component c
Ω_xNC+1    Electricity state space
Ω_uc(ic)   Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (92)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_(NPMc−1), CM1, ..., CM_(NCMc−1)}

Electricity space
Same as in Section 91.

9242 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, to do preventive maintenance or to do nothing, depending on the state of the system.

uc_k = 0: no preventive maintenance on component c

uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1_k, u2_k, ..., uNC_k)    (93)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, and Ω_uc(ic) = ∅ otherwise.

9243 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (94)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (95)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 91.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = \prod_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = \prod_{c=1}^{NC} Pc

with

Pc = P(jc, uc, ic)   if uc = 1 or ic ∉ {W0, ..., W_NWc}
Pc = 1               if ic ∈ {W0, ..., W_NWc}, uc = 0 and jc = ic
Pc = 0               otherwise

that is, components that are maintained or in maintenance follow their own transition probabilities, while working components that are left idle do not age.

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + \sum_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CM_NCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PM_NPMc} or jc = PM1
Cc = 0      otherwise
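A possible reading of these two cases in code: the sketch below composes the system transition probabilities from the one-component chains (e.g. as built by component_transition_probs above), freezing working components that are left idle while the system is down. The interfaces (P_single, working) and this interpretation of the not-ageing rule are assumptions for illustration.

```python
from itertools import product

def multi_component_transition(i, u, P_single, working):
    """Compose the multi-component transition probabilities (Cases 1 and 2).

    i        : tuple of current component states (i^1, ..., i^NC)
    u        : tuple of decisions (u^1, ..., u^NC), entries 0 or 1
    P_single : per-component dict {(state, decision): {next_state: prob}}
               (decision None = forced move while in maintenance)
    working  : per-component set of working states W0..WNWc
    Returns {next_state_tuple: probability}.
    """
    nc = len(i)
    system_up = all(ic in working[c] for c, ic in enumerate(i)) and not any(u)
    per_comp = []
    for c in range(nc):
        if system_up:
            # Case 1: every component ages according to its own chain
            per_comp.append(P_single[c][(i[c], 0)])
        elif u[c] == 1 or i[c] not in working[c]:
            # Case 2: maintained or failed components follow their chain
            dec = 1 if i[c] in working[c] else None
            per_comp.append(P_single[c][(i[c], dec)])
        else:
            # Case 2: idle working components do not age
            per_comp.append({i[c]: 1.0})
    result = {}
    for combo in product(*(d.items() for d in per_comp)):
        js = tuple(j for j, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        result[js] = result.get(js, 0.0) + prob
    return result
```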

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are a lot of possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have been mainly applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have been mainly applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005


Page 2: Models

On Stochastic Dynamic Programming and its

Application to Maintenance

MASTER THESIS BY FRANCcedilOIS BESNARD

Master Thesis written at the Royal Institute of TechnologyKTH School of Electrical Engineering June 2007

Supervisor Assistant Professor Lina Bertling (KTH) Professor Michael Patriksson(Chalmers Applied Mathematics) Dr Erik Dotzauer (Fortum)

Examiner Assistant Professor Lina Bertling

XR-EE-ETK 2007008

Abstract

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of the power system The generat-ing companies as well as transmission and distribution system operators aim tominimize their costs Maintenance can be a significant part of the total costs Thepressure to reduce the maintenance budget leads to a need for efficient maintenance

This work focus on an optimization methodology that could be useful for optimizingmaintenance The method stochastic dynamic programming is interesting becauseit can integrate explicitely the stochastic behavior of functional failures

Different models based on stochastic dynamic programming are reviewed with thepossible optimization methods to solve them The interests of the models in the con-text of maintenance optimization are discussed An example on a multi-componentreplacement application is proposed to illustrate the theory

Keywords Maintenance Optimization Dynamic Programming Markov DecisionProcess Power Production

III

Acknowledgements

First of all I would like to thank my supervisors who each in their way supportedme in this work Ass Prof Lina Bertling for her encouragements constructiveremarks and for giving me the opportunity of working on this project Dr ErikDotzauer for many valuable inputs discussions and comments and Prof MichaelPatriksson for his help on mathematical writing

Special greetings to all my friends and companions of study all over the world

Finally my heart turns to my parents and my love for their endless encouragementsand support in my studies and life

Stockholm June 2007

V

Abreviations

ADP Approximate Dynamic ProgrammingCBM Condition Based MaintenanceCM Corrective MaintenanceDP Dynamic ProgrammingIHSDP Infinite Horizon Stochastic Dynamic ProgrammingLP Linear ProgrammingMDP Markov Decision ProcessPI Policy IterationPM Preventive MaintenanceRCAM Reliability Centered Asset MaintenanceRCM Reliability Centered MaintenanceSDP Stochastic Dynamic ProgrammingSMDP Semi-Markov Decision ProcessTBM Time Based MaintenanceVI Value Iteration

VII

Notations

NumbersM Number of iteration for the evaluation step of modified policy iterationN Number of stages

Constantα Discount factor ll

Variablesi State at the current stagej State at the next stagek Stagem Number of iteration left for the evaluation step of modified policy iterationq Iteration number for the policy iteration algorithmu Decision variable

State and Control Spacemicrok Function mapping the states with a decisionmicrolowastk(i) Optimal decision at state k for state imicro Decision policy for stationary systemsmicrolowast Optimal decision policy for stationary systemsπ Policyπlowast Optimal policyUk Decision action at stage kUlowastk (i) Optimal decision action at stage k for state iXk State at stage k

Dynamic and Cost functionsCk(i u) Cost functionCk(i u j) Cost functionCij(u) = C(i u j) Cost function if the system is stationaryCN (i) Terminal cost for state ifk(i u) Dynamic functionfk(i u ω) Stochastic dynamic functionJlowastk (i) Optimal cost-to-go from stage k to N starting from state iωk(i u) Probabilistic function of a disturbances Pk(j u i) Transition probability functionP (j u i) Transition probability function for stationary systemsV (Xk) Cost-to-go resulting of a trajectory starting from state Xk

Sets

IX

ΩUk (i) Decision Space at stage k for state iΩXk State space at stage k

Contents

Contents XI

1 Introduction 1

11 Background 1

12 Objective 2

13 Approach 2

14 Outline 2

2 Maintenance 5

21 Types of Maintenance 5

22 Maintenance Optimization Models 6

3 Introduction to the Power System 11

31 Power System Presentation 11

32 Costs 13

33 Main Constraints 13

4 Introduction to Dynamic Programming 15

41 Introduction 15

42 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23

51 Problem Formulation 23

52 Optimality Equation 25

53 Value Iteration Method 25

54 The Curse of Dimensionality 26

55 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29

61 Problem Formulation 29

62 Optimality Equations 31

63 Value Iteration 31

64 The Policy Iteration Algorithm 31

65 Modified Policy Iteration 32

66 Average Cost-to-go Problems 33

XI

67 Linear Programming 3468 Efficiency of the Algorithms 3569 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 3771 Introduction 3772 Direct Learning 3873 Indirect Learning 4174 Supervised Learning 42

8 Review of Models for Maintenance Optimization 4381 Finite Horizon Dynamic Programming 4382 Infinite Horizon Stochastic Models 4483 Reinforcement Learning 4584 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 4791 One-Component Model 4792 Multi-Component model 5593 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functional failure for a component is stochastic. The probability depends on the state of the component resulting from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that integrate stochastic behaviors explicitly. This feature makes the models interesting and was the starting idea of this work.

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, for finite and infinite horizons respectively. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.


Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established


[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Corrective Maintenance and Preventive Maintenance; Preventive Maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]

2 Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]


221 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

222 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policies model that takes into account the information fromcondition based monitoring devices is proposed in [25] A proportional hazardmodel is used to model the effect of the monitored variables The assumption ofa hazard model is that the hazard function is the product of a two functions onedepending on the time and one on the parameters (monitored variables)

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criteria can be used to classify maintenance optimization models. The number of components under consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is important. Many articles consider an infinite time horizon. More focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution A modelthat can not be solved is of no interest For some model exact solution are possibleFor complex models it is either necessary to simplify the model or to use heuristicmethods to find approximate solutions


Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system include the following main parts

1 Generation That are the generation units that produce the power It canbe eg hydro-power units nuclear power plants wind farms etc The totalpower consumed is always equal to the power generated

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units


3 Distribution The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time both automatically (automatic controland protection devices) and manually (with the help of the system operator tocoordinate the necessary action to avoid dangerous situations) Each component ofthe system influence the other If a component has a functional failure it can inducefailures of others component Cascading failures can have drastic consequences suchas black-outs

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit of time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can cause certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of the Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a cost and takes time.

• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical Data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and somefeature of possible DP models Deterministic DP is used to introduce the basic ofDP formulation and the value iteration method a classical method for solving DPmodels

41 Introduction

Dynamic Programming deals with multi-stage or sequential decisions problems Ateach decision epoch the decision maker (also called agent or controller in differentcontexts) observes the state of a system (It is assumed in this thesis that the systemis perfectly observable) An action is decided based on this state This action willresult in an immediate cost (or reward) and influence the evolution of the system

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solution of the subproblems are themselves solution of the general problemThe principle implies that at each stage the decision are based only on the currentstate of the system The previous decisions should not have influence on the actualevolution of the system and possible actions

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimizationOne distinguishs between finite and infinite time horizons

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner over time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that the decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost C_k(i, u) that the decision maker has to pay. A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.

N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
C_k(i, u) Cost function
C_N(i) Terminal cost for state i
f_k(i, u) Dynamic function
J*_0(i) Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad (4.1)$$

J*_k(i) Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N$$

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega^X_k$$

u Decision variable
U*_k(i) Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
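To make the backward recursion concrete, a minimal Python sketch of the deterministic value iteration algorithm is given below. The state spaces, decision sets, dynamics and costs used in the usage example are hypothetical placeholders, not data from this thesis.

def value_iteration(N, states, decisions, f, C, terminal_cost):
    """Backward recursion J*_k(i) = min_u [ C_k(i,u) + J*_{k+1}(f_k(i,u)) ]."""
    J = {i: terminal_cost(i) for i in states[N]}          # J*_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):                        # k = N-1, ..., 0
        J_new = {}
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in decisions(k, i):
                cost = C(k, i, u) + J[f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J_new[i] = best_cost
            policy[(k, i)] = best_u                       # U*_k(i)
        J = J_new
    return J, policy

# Hypothetical 3-stage example where the decision u is the next state.
states = {0: [0], 1: [0, 1], 2: [0, 1], 3: [0]}
J0, policy = value_iteration(
    N=3, states=states,
    decisions=lambda k, i: [0, 1] if k < 2 else [0],
    f=lambda k, i, u: u,
    C=lambda k, i, u: abs(i - u) + 1,                     # illustrative transition costs
    terminal_cost=lambda i: 0.0)
print(J0, policy)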


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrated the formulation and the value iteration algorithmThe following shortest path problem is considered

[Figure: shortest path example network. Node A is at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; node K at stage 4. Each arc between nodes of consecutive stages is labelled with its cost (distance).]

The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem Insteadof calculating all the path cost the problem will be divided in subproblems thatwill be solved recursively to determine the shortest path from each possible node tothe terminal node K

4231 Problem Formulation

The problem is divided into five stagesn=5 k=01234

State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}


Each node of the problem is defined by a stateXk For example X2 = 1 correspondsto the node F In this problem the state space is defined by one variable It is alsopossible to have multi-variable space for which Xk would be a vector

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:

$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the resultingstate of the decision For example C1(0 0) = C(B rArr E) = 4 The cost function isdefined in the same way for the others stages and states

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, N-1$.

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm are given in Appendix A

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example μ_1(1) = 2, μ_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk The possible statesare represented by a set of admissible states that can depends on k Xk isin ΩXk

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamic of the System and Transition Probability

On the contrary with the deterministic case the state transition does not dependonly on the control used but also on a disturbance ω = ωk(i u)

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities Thetransition probabilities define the probability that the state of the system at stagek+1 is j if the state and control are i and u at the stage k These probabilities candepend also on the stage

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant) the dynamic function f does not dependson time and the notation for the probability function can be simplified

P (j u i) = P (Xk+1 = j | Xk = i Uk = u)

In this case one refers to a Markov decision process If a control u is fixed for eachpossible state of the model then the probability transition can be represented by aMarkov model (See Chapter 9 for an example)

Cost Function

A cost is associated to each possible transition (ij) and action u The costs can alsodepend on the stage

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (ij) occurs at stage k when the decision is u then a cost Ck(j u i) isgiven If the cost function is stationary then the notation is simplified by C(i u j)

A terminal cost CN (i) can be used to penalize deviation from a desired terminalstate

Objective Function

The objective is to determine the sequence of decision that optimize the expectedcumulative cost (cost-to-go function) Jlowast(X0) where X0 is the initial state of thesystem

$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$.


N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
ω_k(i, u) Probabilistic function of the disturbance
C_k(i, u, j) Cost function
C_N(i) Terminal cost for state i
f_k(i, u, ω) Dynamic function
J*_0(i) Optimal cost-to-go starting from state i
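Before turning to the optimality equation, it may help to see how such a formulation can be encoded in practice. The Python sketch below encodes a small hypothetical three-state component (good/degraded/failed) with two decisions (do nothing/replace) as transition probability and cost arrays; all numbers are illustrative assumptions, not data from this thesis.

import numpy as np

# Hypothetical 3-state component: 0 = good, 1 = degraded, 2 = failed.
# Two decisions: 0 = do nothing, 1 = replace. All numbers are illustrative.
# P[u, i, j] = P(X_{k+1} = j | X_k = i, U_k = u), assumed stationary.
P = np.array([
    [[0.80, 0.15, 0.05],    # do nothing, from good
     [0.00, 0.70, 0.30],    # do nothing, from degraded
     [0.00, 0.00, 1.00]],   # do nothing, from failed (absorbing)
    [[1.00, 0.00, 0.00],    # replace, from good
     [1.00, 0.00, 0.00],    # replace, from degraded
     [1.00, 0.00, 0.00]],   # replace, from failed
])

# C[u, i, j]: replacement cost 10, cost 50 whenever the failed state is entered.
C = np.zeros(P.shape)
C[1, :, :] += 10.0
C[:, :, 2] += 50.0

assert np.allclose(P.sum(axis=2), 1.0)   # every row must be a probability distribution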

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad (5.2)$$

Ω^X_k State space at stage k
Ω^U_k(i) Decision space at stage k for state i
P_k(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N \qquad \text{(initialisation)}$$

While k ≥ 0 do

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

k ← k − 1

u Decision variable
U*_k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
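A compact Python sketch of this backward value iteration for a stationary finite-horizon problem is shown below; it uses transition probability and cost arrays of the same form as the sketch in Section 5.1, and the two-state data at the end is hypothetical.

import numpy as np

def finite_horizon_vi(P, C, terminal_cost, N):
    """Backward value iteration; P[u, i, j] transition probabilities, C[u, i, j] costs."""
    n_actions, n_states, _ = P.shape
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = terminal_cost                                   # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):
        # Q[u, i] = sum_j P(j|i,u) * (C(i,u,j) + J*_{k+1}(j))
        Q = np.einsum("uij,uij->ui", P, C + J[k + 1][None, None, :])
        J[k] = Q.min(axis=0)                               # optimal cost-to-go
        U[k] = Q.argmin(axis=0)                            # optimal decision U*_k(i)
    return J, U

# Hypothetical 2-state, 2-action example with horizon N = 4.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],                    # action 0: do nothing
              [[1.0, 0.0], [1.0, 0.0]]])                   # action 1: replace
C = np.array([[[0.0, 5.0], [0.0, 5.0]],
              [[2.0, 2.0], [2.0, 2.0]]])
J, U = finite_horizon_vi(P, C, terminal_cost=np.zeros(2), N=4)
print(J[0], U[0])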

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages;

• NX state variables, where the size of the set for each state variable is S;

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). For example, with N = 10 stages, NX = 3 state variables of size S = 10 and NU = 2 control variables of size A = 3, this already corresponds to on the order of 10 · 10^6 · 9 ≈ 10^8 operations. This characteristic of SDP is called the curse of dimensionality.

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time Apossible state variable for the component is its age To be precise the age of thecomponent should be discretized according to the stage duration If the lifetimeof a component is very long it can lead to a very large state space The timehorizon can be considered to reduce the number of states If a state variable cannot reach certain states during the planned horizon these states can be neglectedIf a component subcomponent or part of a system can be inspected or monitoreddifferent levels of deterioration can be used as a state variable In practice bothage and deterioration state variables could be used complementary

Of course maintenance states should be considered in both cases It could be possibleto have different types of failure states as major failure and minor failures Minorfailures could be cleared by repair while for a major failure a component should bereplace


552 Forecasts

Measurements or forecasts can sometime estimate the disturbance a system is orcan be subject to The reliability of the forecasts should be carefully consideredDeterministic information could be used to adapt the finite horizon model on theirhorizon of validity It would also be possible to generate different scenarios fromforcasts solve the problem for the different scenarios and get some conclusions fromthe different solutions Another way of using forecasting models is to include them inthe maintenance problem formulation by adding a specific variable It will reducethe uncertainties but in return increase the complexity The proposed model inChapter 9 gives an example of how to integrate a forecasting model in an electricityscenario

Another factor that could be interesting to forecast is the load Indeed the produc-tion must always be in balance with the generation Also if there is no consumptionsome generation units are stopped This time can be used for the maintenance ofthe power plant

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

553 Time Lags

An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)

This condition of loss of memory is very strong and unrealistic in some cases Itis sometimes possible (if the system dynamic depends on few precedent states) toovercome this assumption Variables are added in the DP model to keep in memorythe precedent states that can be visited The computational price is once again veryhigh

For example in the context of maintenance it would be interesting to know thedeterioration level of an asset at the precedent stage It would give informationsabout the dynamic of the deterioration process


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with infinite number of stages It canhowever be a reasonable approximation of problems with very large number ofstates for which the value algorithm would lead to untractable computation

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space decision space probability function and cost function of IHSDPare defined in a similar way that FHSDP for the stationary case The aim of IHSDPis to minimize the cumulative costs of a system over an infinite number of stagesThis sum is called cost-to-go function

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies it is necessary that the infinite sum ofcosts converge Different type of models can be considered stochastic shortest pathproblems discounted problems and average cost per stages problems

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.

μ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function at stage k for a discounted IHSDP has the form α^k · C_ij(u).

As Cij(u) is bounded the infinite sum will converge (decreasing geometric progres-sion)

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.


62 Optimality Equations

The optimality equations are formulated using the probability function P (i u j)

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

$$J^\mu(i) = \min_{\mu(i) \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^\mu(j) \right] \quad \forall i \in \Omega^X$$

J^μ(i) Cost-to-go function of policy μ starting from state i
J*(i) Optimal cost-to-go function for state i

For a IHSDP discounted problem the optimality equation is

$$J^\mu(i) = \min_{\mu(i) \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^\mu(j) \right] \quad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

63 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models the theoretical number of iteration needed is infiniteand a relative criteria must be determine to stop the algorithm

An alternative to the method is the Policy Iteration (PI) algorithm This laterterminates after a finite number of iteration

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1 Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J^{μ_q}(i), the solution of the following linear system, is calculated:

$$J^{\mu_q}(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J^{\mu_q}(j) \right]$$

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J^{\mu_q}(j) \right]$$

Go back to policy evaluation step

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge fast to the optimal solution.
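A small Python sketch of the policy iteration loop is given below for the discounted case, where the evaluation step solves a linear system (the discount factor keeps that system well posed). It is an illustrative sketch, not the thesis's own formulation, and the MDP data at the end is hypothetical.

import numpy as np

def policy_iteration(P, C, alpha=0.9, max_iter=100):
    """Policy iteration for a discounted MDP with P[u, i, j] and C[u, i, j]."""
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                     # initial policy mu_0
    for _ in range(max_iter):
        # Step 1, policy evaluation: solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_states), :]
        c_mu = np.einsum("ij,ij->i", P_mu, C[mu, np.arange(n_states), :])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2, policy improvement
        Q = np.einsum("uij,uij->ui", P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                     # mu_{q+1} = mu_q: stop
            return mu, J
        mu = mu_new
    return mu, J

# Hypothetical 2-state, 2-action example.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.9, 0.1], [0.9, 0.1]]])
C = np.array([[[1.0, 4.0], [2.0, 6.0]],
              [[3.0, 3.0], [3.0, 3.0]]])
print(policy_iteration(P, C))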

65 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).


While m ≥ 0 do

$$J^m_{\mu_k}(i) = \sum_{j \in \Omega^X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \quad \forall i \in \Omega^X$$

m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and an arbitrary state X ∈ Ω^X, there is a unique λ^μ and vector h^μ such that

$$h^\mu(X) = 0$$

$$\lambda^\mu + h^\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h^\mu(j) \right] \quad \forall i \in \Omega^X$$

This λmicro is the average cost-to-go for the stationary policy micro The average cost-to-gois the same for all the starting state

The optimal average cost and optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h_0(i) is chosen arbitrarily:

$$H_k = \min_{u \in \Omega^U(X)} \sum_{j \in \Omega^X} P(j, u, X) \cdot \left[ C(j, u, X) + h_k(j) \right]$$

$$h_{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h_k(j) \right] - H_k \quad \forall i \in \Omega^X$$

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h_k(j) \right] \quad \forall i \in \Omega^X$$

The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm.

Else, solve the system of equations:

$$h_q(X) = 0$$

$$\lambda_q + h_q(i) = \sum_{j \in \Omega^X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + h_q(j) \right] \quad \forall i \in \Omega^X$$

Step 2 Policy improvement

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h_q(j) \right] \quad \forall i \in \Omega^X$$

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example in the discounted IHSDP

$$J^\mu(i) = \min_{\mu(i) \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^\mu(j) \right] \quad \forall i \in \Omega^X$$

Jmicro(i) is solution of the following linear programming model

Maximize $\sum_{i \in \Omega^X} J^\mu(i)$

subject to $J^\mu(i) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^\mu(j) \right] \quad \forall u, i$

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]
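As an illustration of the linear programming formulation, the sketch below solves the same kind of small hypothetical discounted MDP used earlier with scipy's linprog, maximizing the sum of the cost-to-go values subject to the Bellman inequalities. The data and the choice of scipy are assumptions made for the example.

import numpy as np
from scipy.optimize import linprog

alpha = 0.9
# Hypothetical discounted MDP (2 states, 2 actions).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.9, 0.1], [0.9, 0.1]]])
C = np.array([[[1.0, 4.0], [2.0, 6.0]],
              [[3.0, 3.0], [3.0, 3.0]]])
n_actions, n_states, _ = P.shape

# Maximize sum_i J(i) s.t. J(i) <= sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j)) for all (i,u),
# rewritten in linprog's "<=" form:  J(i) - alpha * sum_j P * J(j) <= sum_j P * C.
A_ub, b_ub = [], []
for i in range(n_states):
    for u in range(n_actions):
        row = -alpha * P[u, i, :].copy()
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(np.dot(P[u, i, :], C[u, i, :]))

res = linprog(c=-np.ones(n_states),                        # maximize sum_i J(i)
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
print("J* =", res.x)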

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state X_k is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

V(X_k) Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

V(i_m) Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

$$J(i) := J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \quad \text{with } \gamma = 1/m$$

where m is the number of the trajectory.

From a trajectory point of view:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can thus only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \ldots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as the policy evaluation by simulation described above. Another special case is λ = 0. The TD(0) algorithm is

$$J(X_l) := J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]$$
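A minimal Python sketch of TD(0) policy evaluation is given below. The simulator sample_trajectory, which generates trajectories under the policy to evaluate, and its three-state chain are hypothetical assumptions for the example.

import random

def td0_policy_evaluation(sample_trajectory, n_states, n_episodes=2000):
    """TD(0) evaluation of a fixed policy from simulated trajectories."""
    J = [0.0] * n_states
    visits = [0] * n_states
    for _ in range(n_episodes):
        for (x, x_next, cost) in sample_trajectory():
            visits[x] += 1
            gamma = 1.0 / visits[x]                        # step size gamma_Xk = 1/m
            # TD(0): J(X_l) <- J(X_l) + gamma * [C(X_l, X_l+1) + J(X_l+1) - J(X_l)]
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J

# Hypothetical 3-state chain with terminal state 2 and unit transition costs.
def sample_trajectory():
    x, out = 0, []
    while x != 2:
        x_next = min(2, x + random.choice([0, 1]))
        out.append((x, x_next, 1.0))
        x = x_next
    return out

print(td0_policy_evaluation(sample_trajectory, n_states=3))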

Q-factors
Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q^{\mu_k}(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J^{\mu_k}(j) \right]$$

Note that C(j, u, i) must be known.

The improved policy is

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega^U(i)} Q^{\mu_k}(i, u)$$

It is in fact an approximate version of the policy iteration algorithm, since J^μ and Q^{μ_k} have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (7.1)$$


The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega^U(i)} Q^*(i, u) \qquad (7.2)$$

By combining the two equations we obtain

$$Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega^U(j)} Q^*(j, v) \right] \qquad (7.3)$$

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

$$U_k = \arg\min_{u \in \Omega^U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega^U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with γ defined as for TD.

The trade-off exploration/exploitation: The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
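The Python sketch below combines the Q-learning update above with an epsilon-greedy exploration/exploitation rule. The simulator, which mimics a small replace-or-wait maintenance decision, and all its numbers are hypothetical; the update follows the undiscounted (shortest path) form used in this section.

import random

def q_learning(simulate_step, n_states, n_actions, terminal=None,
               n_steps=20000, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(n_steps):
        if random.random() < epsilon:                      # exploration
            u = random.randrange(n_actions)
        else:                                              # exploitation (greedy policy)
            u = min(range(n_actions), key=lambda a: Q[x][a])
        x_next, cost = simulate_step(x, u)
        visits[x][u] += 1
        g = 1.0 / visits[x][u]
        # Q(Xk, Uk) <- (1-g) Q(Xk, Uk) + g [C + min_u' Q(Xk+1, u')]
        Q[x][u] = (1 - g) * Q[x][u] + g * (cost + min(Q[x_next]))
        x = 0 if x_next == terminal else x_next            # restart after the terminal state
    return Q

# Hypothetical component: states 0 = good, 1 = worn, 2 = replaced (terminal).
# Decisions: 0 = wait, 1 = replace (cost 5); a failure while worn costs 20.
def simulate_step(x, u):
    if u == 1:
        return 2, 5.0
    if x == 0:
        return (1, 0.0) if random.random() < 0.3 else (0, 0.0)
    return (2, 20.0) if random.random() < 0.4 else (1, 0.0)

print(q_learning(simulate_step, n_states=3, n_actions=2, terminal=2))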

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section on each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a parameter vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be the following:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
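As a small illustration of the supervised learning step, the following sketch fits a linear-in-parameters approximation J̃(i, r) = r_0 + r_1·i + r_2·i² to noisy samples of a cost-to-go function by least squares (Python with NumPy). The sampled function and the choice of features are hypothetical assumptions; the point is only that the parameter vector r, not the full table, is stored.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical samples of J^mu(i), e.g. obtained from simulated trajectories
    states = np.arange(0, 20)
    true_J = 5.0 + 0.8 * states + 0.05 * states**2
    samples = true_J + rng.normal(scale=2.0, size=states.shape)

    # Features characterizing the state: here (1, i, i^2)
    Phi = np.column_stack([np.ones_like(states, dtype=float), states, states**2])

    # Training: least-squares fit of the parameter vector r
    r, *_ = np.linalg.lstsq(Phi, samples, rcond=None)

    # Only r is stored; the approximation generalizes to any state i
    def J_tilde(i):
        return r[0] + r[1] * i + r[2] * i**2

    print("fitted r:", r)
    print("J_tilde(12) =", J_tilde(12), " sample at i = 12:", samples[12])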


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared, and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and above all in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems where several state variables are monitored. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: Value Iteration
  Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model), solved with classical MDP methods; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI); can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI); faster in general
  - Shortest path: Linear Programming; possible additional constraints; state space more limited than with VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages / disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages / disadvantages: more complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

Conversely, if a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E    Number of electricity scenarios
N_W    Number of working states for the component
N_PM   Number of preventive maintenance states for one component
N_CM   Number of corrective maintenance states for one component

Costs

C_E(s, k)   Electricity cost at stage k for the electricity state s
C_I         Cost per stage for interruption
C_PM        Cost per stage of preventive maintenance
C_CM        Cost per stage of corrective maintenance
C_N(i)      Terminal cost if the component is in state i

Variables

i^1   Component state at the current stage
i^2   Electricity state at the current stage
j^1   Possible component state for the next stage
j^2   Possible electricity state for the next stage

State and Control Space

x^1_k   Component state at stage k
x^2_k   Electricity state at stage k

Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state W_i

Sets

Ω_{x^1}   Component state space
Ω_{x^2}   Electricity state space
Ω_U(i)    Decision space for state i

State notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s such that T = N·T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the scenario price at the beginning of the stage). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k),   x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}   (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50 % for t > T_max. This second approach was implemented. The corresponding number of W states is N_W = T_max/T_s, or the closest integer, in both cases.

[Figure 9.1 appears here: state-transition diagram with states W0–W4, PM1, CM1, CM2, ageing transitions with probabilities 1 − T_s·λ(q) and failure transitions with probabilities T_s·λ(q).]

Figure 9.1: Example of Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0. Dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x^1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 appears here: electricity price (SEK/MWh, roughly 200–500) as a function of the stage (..., k−1, k, k+1, ...) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, N_E = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1}   if i^1 ∈ {W1, ..., W_{N_W}}
Ω_U(i) = ∅        otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
  = P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
  = P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.


Table 9.1: Transition probabilities

i^1                            u    j^1        P(j^1, u, i^1)
Wq, q ∈ {0, ..., N_W − 1}      0    Wq+1       1 − λ(Wq)
Wq, q ∈ {0, ..., N_W − 1}      0    CM1        λ(Wq)
W_{N_W}                        0    W_{N_W}    1 − λ(W_{N_W})
W_{N_W}                        0    CM1        λ(W_{N_W})
Wq, q ∈ {0, ..., N_W}          1    PM1        1
PMq, q ∈ {1, ..., N_PM − 2}    ∅    PMq+1      1
PM_{N_PM−1}                    ∅    W0         1
CMq, q ∈ {1, ..., N_CM − 2}    ∅    CMq+1      1
CM_{N_CM−1}                    ∅    W0         1
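To make the construction concrete, the following Python sketch builds the component state space Ω_{x^1} and the non-zero transition probabilities of Table 9.1 for the small example of Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3). The failure probability values are hypothetical assumptions; only the structure follows the table.

    # Component state space and transition probabilities P(j1, u, i1) of Table 9.1.
    N_W, N_PM, N_CM = 4, 2, 3
    lam = {q: 0.01 * (q + 1) for q in range(N_W + 1)}   # per-stage failure prob. Ts*lambda(q*Ts), hypothetical

    W = [f"W{q}" for q in range(N_W + 1)]
    PM = [f"PM{q}" for q in range(1, N_PM)]    # PM_{N_PM} is represented by W0
    CM = [f"CM{q}" for q in range(1, N_CM)]    # CM_{N_CM} is represented by W0
    omega_x1 = W + PM + CM

    def transition(i1, u):
        """Return {j1: probability}; u = None plays the role of the empty decision."""
        if i1 in W:
            q = int(i1[1:])
            if u == 1:                          # preventive replacement starts
                return {"PM1" if N_PM > 1 else "W0": 1.0}
            nxt = f"W{min(q + 1, N_W)}"         # ageing; W_{N_W} stays in W_{N_W}
            return {nxt: 1.0 - lam[q], "CM1" if N_CM > 1 else "W0": lam[q]}
        # maintenance states: deterministic progression, decision space is empty
        kind, q = i1[:2], int(i1[2:])
        last = (N_PM - 1) if kind == "PM" else (N_CM - 1)
        return {"W0" if q == last else f"{kind}{q + 1}": 1.0}

    for state in omega_x1:
        for u in ([0, 1] if state in W else [None]):
            print(state, u, transition(state, u))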

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E =
  1   0   0
  0   1   0
  0   0   1

P^2_E =
  1/3 1/3 1/3
  1/3 1/3 1/3
  1/3 1/3 1/3

P^3_E =
  0.6 0.2 0.2
  0.2 0.6 0.2
  0.2 0.2 0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):      0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2):  P^1_E  P^1_E  P^1_E  P^3_E  P^3_E  P^2_E  P^2_E  P^2_E  P^3_E  P^1_E  P^1_E  P^1_E
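The stage-dependent electricity scenario transitions of Tables 9.2 and 9.3 can be represented as a simple lookup, as in the Python sketch below. The matrices and the 12-stage schedule are those of the example; the sampling routine is only an illustration.

    import random

    # Transition matrices of Table 9.2; row = current scenario i2, column = next scenario j2
    P1_E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    P2_E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
    P3_E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

    # Stage schedule of Table 9.3 (stages 0..11): transient around the summer, stable otherwise
    schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

    def next_scenario(k, i2):
        """Sample the electricity scenario at stage k+1 given scenario i2 at stage k."""
        return random.choices([0, 1, 2], weights=schedule[k][i2])[0]

    # Example: simulate one year of scenario transitions starting from scenario 0
    scenario = 0
    for k in range(12):
        scenario = next_scenario(k, scenario)
    print("scenario at the end of the horizon:", scenario)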

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G·T_s·C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k).

• Cost for maintenance: C_CM or C_PM.

• Cost for interruption: C_I.

Moreover, a terminal cost noted C_N could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i^1                            u    j^1        C_k(j, u, i)
Wq, q ∈ {0, ..., N_W − 1}      0    Wq+1       G·T_s·C_E(i^2, k)
Wq, q ∈ {0, ..., N_W − 1}      0    CM1        C_I + C_CM
W_{N_W}                        0    W_{N_W}    G·T_s·C_E(i^2, k)
W_{N_W}                        0    CM1        C_I + C_CM
Wq                             1    PM1        C_I + C_PM
PMq, q ∈ {1, ..., N_PM − 2}    ∅    PMq+1      C_I + C_PM
PM_{N_PM−1}                    ∅    W0         C_I + C_PM
CMq, q ∈ {1, ..., N_CM − 2}    ∅    CMq+1      C_I + C_CM
CM_{N_CM−1}                    ∅    W0         C_I + C_CM
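The entries of Table 9.4 translate directly into a small stage cost function, sketched below in Python. The numerical values of G, T_s, the maintenance costs and the electricity price function are hypothetical assumptions; the case structure follows the table (the first case is the reward for the energy generated during the stage).

    # Cost C_k(j, u, i) of Table 9.4 for the one-component model (hypothetical values).
    G, Ts = 500.0, 168.0                       # average production [kW], stage length [h]
    C_I, C_PM, C_CM = 1000.0, 500.0, 2000.0    # interruption / PM / CM cost per stage

    def C_E(i2, k):
        """Electricity price per kWh for scenario i2 at stage k (hypothetical values)."""
        return [0.20, 0.30, 0.45][i2]

    def stage_cost(i1, i2, u, j1, k):
        """Return the entry of Table 9.4 for component transition i1 -> j1 under decision u."""
        working = i1.startswith("W")
        if working and u == 0 and j1 != "CM1":
            return G * Ts * C_E(i2, k)         # reward for the energy produced during the stage
        if working and u == 0 and j1 == "CM1":
            return C_I + C_CM                  # failure during the stage
        if working and u == 1:
            return C_I + C_PM                  # preventive replacement starts
        if i1.startswith("PM"):
            return C_I + C_PM                  # ongoing preventive maintenance
        return C_I + C_CM                      # ongoing corrective maintenance

    print(stage_cost("W2", 1, 0, "W3", 5))
    print(stage_cost("W2", 1, 0, "CM1", 5))
    print(stage_cost("PM1", 1, None, "W0", 5))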

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of preventive maintenance states for component c
N_CMc   Number of corrective maintenance states for component c

Costs

C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}   State of component c at the current stage
i^{N_C+1}                Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}   State of component c at the next stage
j^{N_C+1}                Electricity state at the next stage
u^c, c ∈ {1, ..., N_C}   Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}   State of component c at stage k
x^c                        A component state
x^{N_C+1}_k                Electricity state at stage k
u^c_k                      Maintenance decision for component c at stage k

Probability functions

λ_c(i)   Failure probability function for component c

Sets

Ω_{x^c}         State space for component c
Ω_{x^{N_C+1}}   Electricity state space
Ω_{u^c}(i^c)    Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance of any kind is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T   (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}

Electricity space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}:   Ω_{u^c}(i^c) = {0, 1}   if i^c ∈ {W0, ..., W_{N_Wc}}
                      Ω_{u^c}(i^c) = ∅        otherwise

9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
  = P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The component state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}} and u^c = 0, then

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If at least one component is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W1, ..., W_{N_Wc}} (maintenance decided or in progress)
P^c = 1                if i^c ∈ {W1, ..., W_{N_Wc}}, u^c = 0 and j^c = i^c (a working component that is not maintained does not age)
P^c = 0                otherwise
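The two cases can be combined into one routine, as in the Python sketch below. It assumes a per-component transition function of the type built for the one-component model; the tiny usage example at the end, including its transition table, is hypothetical.

    def system_working(i_states, u, W_sets):
        """Case 1 applies when every component is in a working state and no PM is decided."""
        return (all(ic in W_sets[c] for c, ic in enumerate(i_states))
                and all(uc == 0 for uc in u))

    def multi_transition_prob(j_states, u, i_states, P_c, W_sets):
        """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2."""
        prob = 1.0
        if system_working(i_states, u, W_sets):
            # Case 1: independent ageing of every component
            for c, (ic, jc) in enumerate(zip(i_states, j_states)):
                prob *= P_c(c, jc, 0, ic)
            return prob
        # Case 2: the system is stopped; only maintained components change state
        for c, (ic, jc, uc) in enumerate(zip(i_states, j_states, u)):
            if uc == 1 or ic not in W_sets[c]:
                prob *= P_c(c, jc, uc, ic)            # replacement decided or maintenance in progress
            else:
                prob *= 1.0 if jc == ic else 0.0      # working, not maintained: no ageing
        return prob

    # Tiny usage example with two identical components (hypothetical transition table)
    W_sets = [{"W0", "W1"}, {"W0", "W1"}]

    def P_c(c, j, u, i):
        table = {("W0", 0): {"W1": 0.9, "CM1": 0.1},
                 ("W1", 0): {"W1": 0.9, "CM1": 0.1},
                 ("W0", 1): {"PM1": 1.0}, ("W1", 1): {"PM1": 1.0},
                 ("PM1", None): {"W0": 1.0}, ("CM1", None): {"W0": 1.0}}
        row = table.get((i, u), table.get((i, None), {}))
        return row.get(j, 0.0)

    print(multi_transition_prob(("W1", "W1"), (0, 0), ("W0", "W0"), P_c, W_sets))   # Case 1: 0.81
    print(multi_transition_prob(("PM1", "W0"), (1, 0), ("W0", "W0"), P_c, W_sets))  # Case 2: 1.0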

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_Wc}} and u^c = 0, then

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with

C^c = C_CMc   if i^c ∈ {CM1, ..., CM_{N_CMc}} or j^c = CM1
C^c = C_PMc   if i^c ∈ {PM1, ..., PM_{N_PMc}} or j^c = PM1
C^c = 0       otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4: J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
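The backward recursion above is easy to check numerically. The short Python sketch below recomputes J*_k(i) and u*_k(i) for this example from the arc costs that appear in the expressions above.

    # Arc costs C[(k, i, u)]: cost of going from state i at stage k to state u at stage k+1,
    # read off the calculation above (stage 4 is the terminal stage, phi(0) = 0).
    C = {
        (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
        (1, 0, 0): 4, (1, 0, 1): 6,
        (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
        (1, 2, 1): 5, (1, 2, 2): 2,
        (2, 0, 0): 2, (2, 0, 1): 5,
        (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
        (2, 2, 1): 1, (2, 2, 2): 2,
        (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
    }

    N = 4
    J = {(N, 0): 0.0}                  # terminal cost phi(0) = 0
    policy = {}
    for k in range(N - 1, -1, -1):
        for i in {i for (kk, i, _) in C if kk == k}:
            options = {u: C[(k, i, u)] + J[(k + 1, u)]
                       for (kk, ii, u) in C if kk == k and ii == i}
            u_best = min(options, key=options.get)
            J[(k, i)] = options[u_best]
            policy[(k, i)] = u_best

    print(J[(0, 0)])    # expected: 8
    print(policy)       # expected, e.g., u*_0(0) = 2, u*_1(2) = 2, u*_2(1) = 1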


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D. Kalles, A. Stathaki, and R. E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human

1

errors and construction errors) Stochastic Dynamic Programming (SDP) modelsare optimization models that integrate explicitely stochastic behaviors This featuremakes the models interesting and was the starting idea of this work

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature was reviewed Conclusions was madeabout the applicability of each approach for maintenance optimization problemsMoreover future avenue for research was identified

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 solves an overview of the maintenance field The most important methodsand some optimization models are reviewed

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapter 4-7 focus on different Dynamic Programming (DP) approaches and al-gorithms to solve them The assumption of the models and practical limitationsare discussed The basic of DP models is investigated in deterministic models inChapter 4 Chapter 5 and 6 focus on Stochastic Dynamic Programming methods

2

respectively for finite and infinite horizons Chapter 7 is an introduction to Approx-imate Dynamic Programming (ADP) also known as Reinforcement Learning (RL)which is an approach to solving Dynamic Programming infinite horizon problemsusing approximate methods

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conlusions of the work and discuss possible avenues forresearch

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

Maintenance

Preventive Maintenance

Time-Based Maintenance (TBM) Condition Based Maintenance (CBM)

Continuous Schedulled Inspection Based

Corrective Maintenance

Figure 21 Maintenance Tree based on [1]

2 Condition Based Maintenance is preventive maintenance based on perfor-mance andor parameter monitoring and the subsequent actions [1] PMcorresponds to all the maintenance methods using diagnostic or inspectionsto decide of the maintenance actions Diagnostic methods include the use ofhuman senses (noise visual etc) measurements or tests They can be un-dertaken continuously or during schedulled or requested inspections CBM isoften used for non-age related failures

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy a component is replace at failure or at the end ofa specified interval whichever occurs first [17] This policy makes sens if preventivereplacement is less expensive than a corrective replacement and the failure rateincrease with time Barlow et al [7] describes a basic age replacement model

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An ageblock replacement model with failures resulting from shocks is described in[38] The shocks follows a non-homogeneous Poisson distribution (Poisson processwith a rate that is not stationnary) Two types of failures can result from the shocksminor failure removed by minor repair and major failure removed by replacement

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, one must decide at each decision epoch whether maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (the monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the travel to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be used with many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components considered is important, e.g. multi-component models are more relevant for power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible; for complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.


3. Distribution: the distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories, such as industry, commercial, households, offices, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (by automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as blackouts.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition monitoring systems.

3.2 Costs

Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy / interruption cost: if there is an agreement to produce or deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit of time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure states of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action results in an immediate cost (or reward) and influences the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.

Basically, in maintenance problems this means that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it evolves according to a probability distribution that depends on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are of interest.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages has an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions are made either continuously, at some points in time decided by the decision maker, or when an event occurs. The two last possibilities are briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and is not discussed here.

4.1.5 Exact and Approximate Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Exact methods for solving dynamic programming models exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic dynamic programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example: a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J_0^*(X_0) = min_{U_k} [ Σ_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision action at stage k
C_k(i, u)    Cost function
C_N(i)       Terminal cost for state i
f_k(i, u)    Dynamic function
J_0^*(i)     Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and the Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J_k^*(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J_{k+1}^*(f_k(i, u)) ]    (4.1)

J_k^*(i)    Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J_N^*(i) = C_N(i)    ∀ i ∈ Ω^X_N

J_k^*(i) = min_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J_{k+1}^*(f_k(i, u)) ]    ∀ i ∈ Ω^X_k

U_k^*(i) = argmin_{u ∈ Ω^U_k(i)} [ C_k(i, u) + J_{k+1}^*(f_k(i, u)) ]    ∀ i ∈ Ω^X_k

u          Decision variable
U_k^*(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
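As an illustration, this backward recursion can be sketched in a few lines of Python. The function name, the dictionary-based representation of the state and decision spaces, and the call signature are hypothetical conventions chosen for this sketch; they are not part of the formulation above.

def value_iteration(N, states, actions, f, C, C_N):
    """Backward value iteration for a deterministic finite horizon DP.

    states[k]     : list of admissible states at stage k
    actions[k][i] : list of admissible actions for state i at stage k
    f(k, i, u)    : dynamic function, returns the state at stage k+1
    C(k, i, u)    : transition cost at stage k
    C_N(i)        : terminal cost
    """
    J = [dict() for _ in range(N + 1)]   # cost-to-go tables J_k
    U = [dict() for _ in range(N)]       # optimal decisions U_k
    for i in states[N]:
        J[N][i] = C_N(i)                 # initialisation with the terminal costs
    for k in range(N - 1, -1, -1):       # backward recursion, stops at k = 0
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in actions[k][i]:
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U

The optimal decision sequence is then obtained forward from the initial state by applying U[k][i] at each stage.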


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and node K at stage 4. Each arc between nodes of consecutive stages carries a cost (a distance).]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating the cost of all paths, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}


Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from a given node to reach the next stage. The following notations are used:

Ω^U_k(i) = {0, 1}     for i = 0
Ω^U_k(i) = {0, 1, 2}  for i = 1      for k = 1, 2, 3
Ω^U_k(i) = {1, 2}     for i = 2

Ω^U_0(0) = {0, 1, 2}  for k = 0

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E and U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ_0, µ_1, ..., µ_N}, where µ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π^* = {µ^*_0, µ^*_1, ..., µ^*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J_0^*(0) = min_{U_k ∈ Ω^U_k(X_k)} [ Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N-1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J_0^*(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π^* = {µ_0, µ_1, µ_2, µ_3, µ_4} with µ_k(i) = u^*_k(i) (for example, µ_1(1) = 2 and µ_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory needed for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamics of the System and Transition Probabilities

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J^*(X_0), where X_0 is the initial state of the system:

J^*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N-1


N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
X_k           State at stage k
U_k           Decision action at stage k
ω_k(i, u)     Probabilistic function of the disturbance
C_k(i, u, j)  Cost function
C_N(i)        Terminal cost for state i
f_k(i, u, ω)  Dynamic function
J_0^*(i)      Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J_k^*(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J_{k+1}^*(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J_k^*(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J_{k+1}^*(j) ]    (5.2)

Ω^X_k         State space at stage k
Ω^U_k(i)      Decision space at stage k for state i
P_k(j, u, i)  Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J_N^*(i) = C_N(i)    ∀ i ∈ Ω^X_N    (initialisation)

While k ≥ 0 do
  J_k^*(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J_{k+1}^*(j) ]    ∀ i ∈ Ω^X_k
  U_k^*(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J_{k+1}^*(j) ]    ∀ i ∈ Ω^X_k
  k ← k - 1

u          Decision variable
U_k^*(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
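A compact Python sketch of this stochastic backward recursion is given below. The nested-dictionary layout of the transition probabilities and costs, and the function name, are illustrative assumptions; any data structure giving P_k(j, u, i) and C_k(j, u, i) would do.

def stochastic_value_iteration(N, states, actions, P, C, C_N):
    """Finite horizon stochastic value iteration based on equation (5.2).

    P[k][i][u] : dict mapping next state j to probability P_k(j, u, i)
    C[k][i][u] : dict mapping next state j to cost C_k(j, u, i)
    C_N[i]     : terminal cost of state i
    """
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N[i]                          # initialisation
    for k in range(N - 1, -1, -1):                # backward recursion
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in actions[k][i]:
                expected = sum(p * (C[k][i][u][j] + J[k + 1][j])
                               for j, p in P[k][i][u].items())
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J[k][i], policy[k][i] = best_cost, best_u
    return J, policy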

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, the size of the set for each state variable being S,

• N_U control variables, the size of the set for each control variable being A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
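As a purely hypothetical illustration of this growth, a model with N = 52 weekly stages, N_X = 5 state variables of size S = 10 and N_U = 2 control variables of size A = 5 already requires on the order of 52 · 10^10 · 5^2 ≈ 1.3 · 10^13 elementary operations, far beyond what the backward recursion can handle in practice.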

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if consumption is low, some generation units are stopped; this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions at offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximate methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for i ∈ Ω^X, µ(i) is an admissible control for the state i, µ(i) ∈ Ω^U(i).

The objective is to find the optimal µ^*: it should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no further costs are paid.

J^*(X_0) = min_µ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(X_{k+1}, µ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N-1

µ        Decision policy
J^*(i)   Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k then has the form α^k · C_{ij}(u).

As C_{ij}(u) is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression).

J^*(X_0) = min_µ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(X_{k+1}, µ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N-1

α        Discount factor

Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J^* = min_µ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(X_{k+1}, µ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N-1


6.2 Optimality Equations

The optimality equations are formulated using the transition probability function, written P_{ij}(u) = P(j, u, i) in this chapter.

The stationary policy µ^* that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J^*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_{ij}(u) · [ C_{ij}(u) + J^*(j) ]    ∀ i ∈ Ω^X

J_µ(i)    Cost-to-go function of policy µ starting from state i
J^*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J^*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P_{ij}(u) · [ C_{ij}(u) + α · J^*(j) ]    ∀ i ∈ Ω^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that the algorithm converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 - α).

For non-discounted models, the theoretical number of iterations needed is infinite and a relative stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ_0. It can then be described by the following steps:

Step 1: Policy Evaluation

If µ_{q+1} = µ_q, stop the algorithm. Else, J_{µ_q}(i) is calculated as the solution of the following linear system:

J_{µ_q}(i) = Σ_{j ∈ Ω^X} P(j, µ_q(i), i) · [ C(j, µ_q(i), i) + J_{µ_q}(j) ]    ∀ i ∈ Ω^X

q    Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system when using the policy µ_q.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

µ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{µ_q}(j) ]    ∀ i ∈ Ω^X

Go back to the policy evaluation step.

The process stops when µ_{q+1} = µ_q.

At each iteration the algorithm improves the policy. If the initial policy µ_0 is already good, then the algorithm will converge quickly to the optimal solution.
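The two steps can be sketched in Python for the discounted case, where the evaluation step reduces to a linear system. The matrix layout of the transition probabilities P[u] and costs C[u], the use of numpy, and the discount factor are illustrative assumptions rather than part of the algorithm's definition.

import numpy as np

def policy_iteration(P, C, alpha, n_states, n_actions):
    """Policy iteration for a discounted MDP (illustrative sketch).

    P[u] : (n_states x n_states) array, P[u][i, j] = P(j | i, u)
    C[u] : (n_states x n_states) array, C[u][i, j] = transition cost
    """
    policy = np.zeros(n_states, dtype=int)               # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = expected stage cost
        P_mu = np.array([P[policy[i]][i] for i in range(n_states)])
        c_mu = np.array([P[policy[i]][i] @ C[policy[i]][i] for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement (greedy step with respect to J)
        Q = np.array([[P[u][i] @ (C[u][i] + alpha * J) for u in range(n_actions)]
                      for i in range(n_states)])
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):            # policy solves its own improvement
            return J, policy
        policy = new_policy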

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in the policy evaluation step, the value iteration algorithm for a finite number of iterations M in order to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{µ_k}(i) that must be chosen higher than the real value J_{µ_k}(i).

While m ≥ 0 do
  J^m_{µ_k}(i) = Σ_{j ∈ Ω^X} P(j, µ_k(i), i) · [ C(j, µ_k(i), i) + J^{m+1}_{µ_k}(j) ]    ∀ i ∈ Ω^X
  m ← m - 1

m    Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{µ_k} is approximated by J^0_{µ_k}.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and a reference state X ∈ Ω^X, there is a unique λ_µ and a unique vector h_µ such that:

h_µ(X) = 0

λ_µ + h_µ(i) = Σ_{j ∈ Ω^X} P(j, µ(i), i) · [ C(j, µ(i), i) + h_µ(j) ]    ∀ i ∈ Ω^X

This λ_µ is the average cost-to-go of the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ^* + h^*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^*(j) ]    ∀ i ∈ Ω^X

µ^*(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^*(j) ]    ∀ i ∈ Ω^X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u ∈ Ω^U(X)} Σ_{j ∈ Ω^X} P(j, u, X) · [ C(j, u, X) + h^k(j) ]

h^{k+1}(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ] - H^k    ∀ i ∈ Ω^X

µ^{k+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h^k(j) ]    ∀ i ∈ Ω^X

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
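A compact sketch of this recursion with numpy is shown below. It reuses the illustrative P[u] and C[u] matrix layout of the earlier sketches, takes state 0 as the arbitrary reference state, and runs for a fixed number of iterations instead of using a formal convergence test.

import numpy as np

def relative_value_iteration(P, C, n_states, n_actions, n_iter=1000):
    """Relative value iteration for an average cost-to-go MDP (sketch)."""
    h = np.zeros(n_states)
    H = 0.0
    for _ in range(n_iter):
        # one-step lookahead: T(i) = min_u sum_j P(j, u, i) [C(j, u, i) + h(j)]
        T = np.array([[P[u][i] @ (C[u][i] + h) for u in range(n_actions)]
                      for i in range(n_states)]).min(axis=1)
        H = T[0]                 # value at the reference state, estimates the average cost
        h = T - H                # relative cost-to-go values
    policy = np.array([[P[u][i] @ (C[u][i] + h) for u in range(n_actions)]
                       for i in range(n_states)]).argmin(axis=1)
    return H, h, policy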

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀ i ∈ Ω^X, stop the algorithm.

Else, solve the system of equations:

h_q(X) = 0
λ_q + h_q(i) = Σ_{j ∈ Ω^X} P(j, µ_q(i), i) · [ C(j, µ_q(i), i) + h_q(j) ]    ∀ i ∈ Ω^X

Step 2: Policy improvement

µ_{q+1}(i) = argmin_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + h_q(j) ]    ∀ i ∈ Ω^X

q ← q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case the optimal cost-to-go function satisfies:

J^*(i) = min_{u ∈ Ω^U(i)} Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + α · J^*(j) ]    ∀ i ∈ Ω^X

J^*(i) is the solution of the following linear programming model:

Maximize  Σ_{i ∈ Ω^X} J(i)

subject to  J(i) ≤ Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + α · J(j) ]    ∀ i ∈ Ω^X, ∀ u ∈ Ω^U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
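As an illustration, the LP above can be handed to a generic solver. The sketch below uses scipy.optimize.linprog with the same illustrative P[u] and C[u] matrices as in the earlier sketches; the maximization is turned into the minimization expected by the solver by negating the objective.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha, n_states, n_actions):
    """Solve a discounted MDP through its LP formulation (illustrative sketch)."""
    c = -np.ones(n_states)                       # maximize sum_i J(i)
    A_ub, b_ub = [], []
    for i in range(n_states):
        for u in range(n_actions):
            # J(i) - alpha * sum_j P(j, u, i) J(j) <= sum_j P(j, u, i) C(j, u, i)
            row = -alpha * P[u][i].copy()
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(P[u][i] @ C[u][i])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    J = res.x
    # greedy policy with respect to the LP solution
    policy = np.array([[P[u][i] @ (C[u][i] + alpha * J) for u in range(n_actions)]
                       for i in range(n_states)]).argmin(axis=1)
    return J, policy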

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, i.e. a DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm converges quite fast if the initial policy µ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Processes

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not treated in this thesis. Puterman [36] explains how a SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Processes - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: assume that a trajectory (X_0, ..., X_N) has been generated according to the policy µ and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, µ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k)    Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)    Cost-to-go of the trajectory starting from state i at its m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) - J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) - J(X_k)]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]    ∀ k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l-k} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]    ∀ k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm updates only the current state:

J(X_l) = J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]
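A minimal TD(0) sketch is shown below. The simulation interface (env.reset and env.step) and the decreasing step size rule are hypothetical placeholders; they only show how the cost-to-go table is updated sample by sample while a fixed policy is followed.

from collections import defaultdict

def td0_policy_evaluation(env, policy, n_trajectories=1000):
    """TD(0) evaluation of a fixed policy on a stochastic shortest path problem.

    env.reset()  -> initial state
    env.step(u)  -> (next_state, cost, done)   (hypothetical interface)
    policy(x)    -> control to apply in state x
    """
    J = defaultdict(float)        # estimated cost-to-go, 0 by default
    visits = defaultdict(int)     # visit counters used for the step size
    for _ in range(n_trajectories):
        x = env.reset()
        done = False
        while not done:
            x_next, cost, done = env.step(policy(x))
            visits[x] += 1
            gamma = 1.0 / visits[x]                       # decreasing step size
            J[x] += gamma * (cost + J[x_next] - J[x])     # TD(0) update
            x = x_next
    return J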

Q-factors
Once J_{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by:

Q_{µ_k}(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J_{µ_k}(j) ]

Note that the transition probabilities and costs must be known for this improvement step.

The improved policy is:

µ_{k+1}(i) = argmin_{u ∈ Ω^U(i)} Q_{µ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_µ and Q_{µ_k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q^*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + J^*(j) ]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = min_{u ∈ Ω^U(i)} Q^*(i, u)    (7.2)

By combining the two equations, we obtain:

Q^*(i, u) = Σ_{j ∈ Ω^X} P(j, u, i) · [ C(j, u, i) + min_{v ∈ Ω^U(j)} Q^*(j, v) ]    (7.3)

Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u ∈ Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 - γ) · Q(X_k, U_k) + γ · [ C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω^U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.

The trade-off between exploration and exploitation: the convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
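The sketch below implements tabular Q-learning with an ε-greedy rule as one simple way of handling this trade-off. The environment interface, the constant step size and the value of ε are illustrative choices, not prescribed by the theory above.

import random
from collections import defaultdict

def q_learning(env, actions, n_episodes=5000, step=0.1, epsilon=0.1):
    """Tabular Q-learning for a stochastic shortest path problem (sketch).

    env.reset() / env.step(u) as in the TD(0) sketch; actions(x) returns the
    admissible controls in state x.
    """
    Q = defaultdict(float)                       # Q[(state, action)], 0 by default
    for _ in range(n_episodes):
        x = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:        # exploration phase
                u = random.choice(actions(x))
            else:                                # exploitation: greedy control
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost, done = env.step(u)
            target = cost
            if not done:
                target += min(Q[(x_next, a)] for a in actions(x_next))
            # sampled form of equation (7.3)
            Q[(x, u)] = (1 - step) * Q[(x, u)] + step * target
            x = x_next
    greedy = lambda x: min(actions(x), key=lambda a: Q[(x, a)])
    return Q, greedy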

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_µ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_µ. In the tabular representation investigated previously, J_µ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_µ(i) - J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
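As a small illustration of such an approximation structure, the sketch below fits a linear approximation J(i, r) = r · φ(i) to sampled cost-to-go values by least squares. The feature map φ and the sample pairs are hypothetical, and least squares is only one of the many supervised learning techniques mentioned above.

import numpy as np

def fit_linear_cost_to_go(features, samples):
    """Fit J(i, r) = r . phi(i) to sampled cost-to-go values (sketch).

    features(i) : feature vector phi(i) of state i (hypothetical feature map)
    samples     : list of (state, observed cost-to-go) pairs, e.g. from simulation
    """
    Phi = np.array([features(i) for i, _ in samples])     # design matrix
    V = np.array([v for _, v in samples])                  # observed targets
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)            # least-squares fit of r
    return lambda i: features(i) @ r                        # approximated cost-to-go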


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit that is not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Processes

Many condition based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition based maintenance problems by using SMDPs. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the time to the next inspection. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP has the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP is also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of a system with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.
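To illustrate the sample-based character of these methods, a minimal tabular Q-learning loop is sketched below in Python. This is a generic illustration, not an implementation from [24]; the environment object with its reset and step methods, the cost signal and the action set are hypothetical placeholders.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.95, eps=0.1):
    """Minimal tabular Q-learning for a cost-minimization problem.

    `env` is assumed to expose reset() -> state and
    step(state, action) -> (next_state, cost, done); these names are
    illustrative only.
    """
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: mostly pick the action with lowest estimated cost-to-go
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = min(actions, key=lambda a_: Q[(s, a_)])
            s_next, cost, done = env.step(s, a)
            best_next = min(Q[(s_next, a_)] for a_ in actions)
            # temporal-difference update towards cost + discounted best next value
            Q[(s, a)] += alpha * (cost + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```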

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
• Characteristics: the model can be non-stationary
• Possible application in maintenance optimization: short-term maintenance scheduling
• Method: Value Iteration
• Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
• Characteristics: stationary model; possible approaches are average cost-to-go, discounted, and shortest path
• Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted)
• Methods (classical methods for MDP): Value Iteration (VI), which can converge fast for a high discount factor; Policy Iteration (PI), faster in general; Linear Programming, which allows additional constraints but handles a more limited state space than VI and PI

Semi-Markov Decision Processes
• Characteristics: can optimize the inspection interval
• Possible application in maintenance optimization: optimization for inspection based maintenance
• Methods: same as MDP
• Disadvantage: more complex (average cost-to-go approach)

Approximate Dynamic Programming for MDP
• Characteristics: can handle larger state spaces than classical MDP methods
• Possible application in maintenance optimization: same as MDP, for larger systems
• Methods: TD-learning, Q-learning
• Advantage: can work without an explicit model


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro-power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE      Number of electricity scenarios
NW      Number of working states for the component
NPM     Number of preventive maintenance states for one component
NCM     Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario. Thus NX = 2.

The state of the system is represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x_k^1 \\ x_k^2 \end{pmatrix}, \qquad x_k^1 \in \Omega_{x^1}, \; x_k^2 \in \Omega_{x^2} \qquad (9.1)$$

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) > 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure 9.1: Example of the Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0, dashed lines: u = 1. From each working state Wq the component moves to CM1 with probability Ts·λ(q) and to the next working state with probability 1 − Ts·λ(q); the PM and CM chains return to W0 with probability 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}


Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. The plot shows the electricity prices (SEK/MWh, between 200 and 500) of Scenarios 1, 2 and 3 over the stages k−1, k, k+1.]


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

$$\Omega_U(i) = \begin{cases} \{0,1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{otherwise} \end{cases}$$

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$\begin{aligned} P(X_{k+1} = j \mid U_k = u, X_k = i) &= P(x_{k+1}^1 = j^1,\, x_{k+1}^2 = j^2 \mid u_k = u,\, x_k^1 = i^1,\, x_k^2 = i^2) \\ &= P(x_{k+1}^1 = j^1 \mid u_k = u,\, x_k^1 = i^1) \cdot P(x_{k+1}^2 = j^2 \mid x_k^2 = i^2) \\ &= P(j^1, u, i^1) \cdot P_k(j^2, i^2) \end{aligned}$$

Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state Pk(j2, i2) are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities (only nonzero entries)

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1
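As a sketch of how the table could be turned into data for an implementation, the following Python function enumerates the nonzero transition probabilities of Table 9.1. The state encoding with strings such as 'W3', 'PM1' or 'CM2', the use of None for the empty decision space, and the function name are illustrative assumptions, not part of the thesis.

```python
def component_transitions(lam, NW, NPM, NCM):
    """Nonzero transition probabilities P[(j, u, i)] of Table 9.1 (sketch).

    lam(q) is the failure probability of the component during one stage
    when it is in working state Wq; u = None stands for the empty
    decision space of the PM and CM states.  Assumes NPM >= 2 and
    NCM >= 2 (the degenerate cases NPM = 1 or NCM = 1 are not handled).
    """
    P = {}
    for q in range(NW + 1):                         # working states W0 ... WNW
        i = f"W{q}"
        j_ok = f"W{q + 1}" if q < NW else f"W{NW}"  # ageing stops at WNW
        P[(j_ok, 0, i)] = 1.0 - lam(q)              # survives the stage
        P[("CM1", 0, i)] = lam(q)                   # fails during the stage
        P[("PM1", 1, i)] = 1.0                      # preventive replacement decided
    for q in range(1, NPM - 1):                     # PM chain, no decision
        P[(f"PM{q + 1}", None, f"PM{q}")] = 1.0
    P[("W0", None, f"PM{NPM - 1}")] = 1.0           # replacement finished, as new
    for q in range(1, NCM - 1):                     # CM chain, no decision
        P[(f"CM{q + 1}", None, f"CM{q}")] = 1.0
    P[("W0", None, f"CM{NCM - 1}")] = 1.0           # repair finished, as new
    return P
```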

Table 9.2: Example of transition matrices for the electricity scenarios

$$P_E^1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
P_E^2 = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \qquad
P_E^3 = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
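If the scenarios are encoded as integers 0, 1, 2, the matrices of Table 9.2 and the schedule of Table 9.3 can be stored as ordinary Python lists; a minimal sketch (the variable names are illustrative):

```python
P1E = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3],
       [1/3, 1/3, 1/3],
       [1/3, 1/3, 1/3]]
P3E = [[0.6, 0.2, 0.2],
       [0.2, 0.6, 0.2],
       [0.2, 0.2, 0.6]]

# One transition matrix per stage, following Table 9.3 (12 stages).
P_elec = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

# Probability of moving from scenario i2 to scenario j2 at stage k:
# P_elec[k][i2][j2]   (rows: i2, columns: j2, as in Table 9.2)
```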

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
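To make the description above concrete, the following sketch shows how the value iteration (backward induction) of Chapter 5 could be applied to the one-component model in Python. It is only an illustration: the function and argument names are assumptions, P_comp is a transition dictionary in the spirit of the sketch after Table 9.1, P_elec is the list of stage matrices sketched after Table 9.3, and the generation reward is entered as a negative cost so that the recursion is a pure minimization.

```python
import itertools

def solve_one_component(N, comp_states, NE, P_comp, P_elec,
                        decisions, stage_cost, terminal_cost):
    """Backward induction J*_k(i) for the model of Section 9.1 (sketch).

    P_comp[(j1, u, i1)]    : component transition probabilities (Table 9.1);
                             u = None encodes the 'no decision' case.
    P_elec[k][i2][j2]      : electricity transition probability at stage k.
    decisions(i1)          : admissible decisions {0, 1}, or an empty set.
    stage_cost(i, u, j, k) : transition cost of Table 9.4 (rewards negative).
    terminal_cost(i)       : terminal penalty on the final state.
    """
    states = list(itertools.product(comp_states, range(NE)))
    J = {i: terminal_cost(i) for i in states}            # stage N
    policy = [dict() for _ in range(N)]
    for k in reversed(range(N)):                         # stages N-1, ..., 0
        J_new = {}
        for i in states:
            i1, i2 = i
            best_u, best_val = None, float("inf")
            for u in (decisions(i1) or [None]):          # empty set -> forced move
                val = 0.0
                for (j1, uu, ii1), p1 in P_comp.items():
                    if ii1 != i1 or uu != u or p1 == 0.0:
                        continue
                    for j2 in range(NE):
                        p2 = P_elec[k][i2][j2]
                        if p2 == 0.0:
                            continue
                        j = (j1, j2)
                        val += p1 * p2 * (stage_cost(i, u, j, k) + J[j])
                if val < best_val:
                    best_u, best_val = u, val
            J_new[i], policy[k][i] = best_val, best_u
        J = J_new
    return J, policy
```

The nested loops make the curse of dimensionality visible: the work per stage grows with the product of the component and electricity state space sizes.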

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$X_k = \begin{pmatrix} x_k^1 \\ \vdots \\ x_k^{N_C} \\ x_k^{N_C+1} \end{pmatrix} \qquad (9.2)$$

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space

Same as in Section 9.1.
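As an illustration of how quickly the joint state space grows (the curse of dimensionality discussed in Chapter 8), the following small computation uses purely hypothetical numbers of states per component:

```python
# Hypothetical example: 3 components with (NWc, NPMc, NCMc) as below,
# plus NE = 3 electricity scenarios.
components = [(10, 2, 3), (15, 2, 2), (8, 3, 4)]
NE = 3

# Per component: W0..WNWc, PM1..PM(NPMc-1), CM1..CM(NCMc-1)
sizes = [(NW + 1) + (NPM - 1) + (NCM - 1) for NW, NPM, NCM in components]
total_states = NE
for s in sizes:
    total_states *= s

print(sizes)          # [14, 18, 14]
print(total_states)   # 14 * 18 * 14 * 3 = 10584 joint states
```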

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

$$U_k = \begin{pmatrix} u_k^1 \\ u_k^2 \\ \vdots \\ u_k^{N_C} \end{pmatrix} \qquad (9.3)$$

The decision space for each decision variable can be defined by

$$\forall c \in \{1, \ldots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0,1\} & \text{if } i^c \in \{W_0, \ldots, W_{N_{W_c}}\} \\ \emptyset & \text{otherwise} \end{cases}$$
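A sketch of how the joint decision space could be enumerated in Python is given below; the helper names are illustrative assumptions. Filtering the resulting combinations is also where a manpower constraint (see Section 9.3) could later be enforced.

```python
import itertools

def joint_decisions(component_states, per_component_decisions):
    """Enumerate all decision vectors Uk = (u1, ..., uNC).

    per_component_decisions(ic) returns {0, 1} for working states and an
    empty set otherwise (encoded here as [None], meaning 'no decision').
    """
    options = []
    for ic in component_states:
        opts = per_component_decisions(ic)
        options.append(list(opts) if opts else [None])
    return list(itertools.product(*options))

# Example: component 1 working, component 2 in corrective maintenance:
# joint_decisions(["W3", "CM1"], lambda s: {0, 1} if s.startswith("W") else set())
# -> [(0, None), (1, None)]
```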

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)$$
$$\quad = P\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) \cdot P_k(j^{N_C+1}, i^{N_C+1}) \qquad (9.5)$$

The transition probabilities of the electricity state P(jNC+1, iNC+1) are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

$$P\big((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})\big) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$

Case 2

If one of the components is in maintenance, or preventive maintenance is decided, then

$$P\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \ldots, W_{N_{W_c}}\} \\ 1 & \text{if } i^c \notin \{W_0, \ldots, W_{N_{W_c}-1}\} \text{ and } i^c = j^c \\ 0 & \text{otherwise} \end{cases}$$

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

$$C\big((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})\big) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

$$C\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM_c} & \text{if } i^c \in \{CM_1, \ldots, CM_{N_{CM_c}}\} \text{ or } j^c = CM_1 \\ C_{PM_c} & \text{if } i^c \in \{PM_1, \ldots, PM_{N_{PM_c}}\} \text{ or } j^c = PM_1 \\ 0 & \text{otherwise} \end{cases}$$

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
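The calculation above can be verified with a few lines of Python. The cost table is transcribed from the terms used in the calculation, with C(k, i, j) the cost of going from state i at stage k to state j at stage k+1; the script is only a verification aid.

```python
# Arc costs C[(k, i, j)] transcribed from the calculation above.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

J = {(4, 0): 0}                      # terminal value at stage 4
for k in range(3, -1, -1):           # stages 3, 2, 1, 0
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(C[(kk, ii, j)] + J[(k + 1, j)]
                        for (kk, ii, j) in C if kk == k and ii == i)

print(J[(0, 0)])                     # prints 8, the optimal cost from node A
```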


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Dynamic and Cost functionsCk(i u) Cost functionCk(i u j) Cost functionCij(u) = C(i u j) Cost function if the system is stationaryCN (i) Terminal cost for state ifk(i u) Dynamic functionfk(i u ω) Stochastic dynamic functionJlowastk (i) Optimal cost-to-go from stage k to N starting from state iωk(i u) Probabilistic function of a disturbances Pk(j u i) Transition probability functionP (j u i) Transition probability function for stationary systemsV (Xk) Cost-to-go resulting of a trajectory starting from state Xk

Sets

IX

ΩUk (i) Decision Space at stage k for state iΩXk State space at stage k

Contents

Contents XI

1 Introduction 1

11 Background 1

12 Objective 2

13 Approach 2

14 Outline 2

2 Maintenance 5

21 Types of Maintenance 5

22 Maintenance Optimization Models 6

3 Introduction to the Power System 11

31 Power System Presentation 11

32 Costs 13

33 Main Constraints 13

4 Introduction to Dynamic Programming 15

41 Introduction 15

42 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23

51 Problem Formulation 23

52 Optimality Equation 25

53 Value Iteration Method 25

54 The Curse of Dimensionality 26

55 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29

61 Problem Formulation 29

62 Optimality Equations 31

63 Value Iteration 31

64 The Policy Iteration Algorithm 31

65 Modified Policy Iteration 32

66 Average Cost-to-go Problems 33

XI

67 Linear Programming 3468 Efficiency of the Algorithms 3569 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 3771 Introduction 3772 Direct Learning 3873 Indirect Learning 4174 Supervised Learning 42

8 Review of Models for Maintenance Optimization 4381 Finite Horizon Dynamic Programming 4382 Infinite Horizon Stochastic Models 4483 Reinforcement Learning 4584 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 4791 One-Component Model 4792 Multi-Component model 5593 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human

1

errors and construction errors) Stochastic Dynamic Programming (SDP) modelsare optimization models that integrate explicitely stochastic behaviors This featuremakes the models interesting and was the starting idea of this work

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature was reviewed Conclusions was madeabout the applicability of each approach for maintenance optimization problemsMoreover future avenue for research was identified

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 solves an overview of the maintenance field The most important methodsand some optimization models are reviewed

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapter 4-7 focus on different Dynamic Programming (DP) approaches and al-gorithms to solve them The assumption of the models and practical limitationsare discussed The basic of DP models is investigated in deterministic models inChapter 4 Chapter 5 and 6 focus on Stochastic Dynamic Programming methods

2

respectively for finite and infinite horizons Chapter 7 is an introduction to Approx-imate Dynamic Programming (ADP) also known as Reinforcement Learning (RL)which is an approach to solving Dynamic Programming infinite horizon problemsusing approximate methods

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conlusions of the work and discuss possible avenues forresearch

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

Maintenance

Preventive Maintenance

Time-Based Maintenance (TBM) Condition Based Maintenance (CBM)

Continuous Schedulled Inspection Based

Corrective Maintenance

Figure 21 Maintenance Tree based on [1]

2 Condition Based Maintenance is preventive maintenance based on perfor-mance andor parameter monitoring and the subsequent actions [1] PMcorresponds to all the maintenance methods using diagnostic or inspectionsto decide of the maintenance actions Diagnostic methods include the use ofhuman senses (noise visual etc) measurements or tests They can be un-dertaken continuously or during schedulled or requested inspections CBM isoften used for non-age related failures

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy a component is replace at failure or at the end ofa specified interval whichever occurs first [17] This policy makes sens if preventivereplacement is less expensive than a corrective replacement and the failure rateincrease with time Barlow et al [7] describes a basic age replacement model

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An ageblock replacement model with failures resulting from shocks is described in[38] The shocks follows a non-homogeneous Poisson distribution (Poisson processwith a rate that is not stationnary) Two types of failures can result from the shocksminor failure removed by minor repair and major failure removed by replacement

222 Block Replacement Policies

In blocks replacement policies the components of a system are replaced at failureor at fixed times kT (k = 1 2 ) whichever occurs first Barlow et al [7] describesa basic block replacement model To avoid that a component that has just beenreplaced is replaced again a modified block replacement model is proposed in [10]A component is not replaced at a schedulled replacement time if its age is less thanT

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables above which it is nec-essary to perform maintenance The optimal wear-limit for preventive replacement

7

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection at each decision epoch one must decide ifmaintenance should be performed and when the next inspection should occur In[2] the inspection occur at fixed time and the decision of preventive replacementof the component depend on its condition at inspection In [9] a Semi-MarkovDecision Process (SMDP see Chapter 4) is proposed to optimize at each inspectionthe maintenance decision and the time to next inspection

An age replacement policies model that takes into account the information fromcondition based monitoring devices is proposed in [25] A proportional hazardmodel is used to model the effect of the monitored variables The assumption ofa hazard model is that the hazard function is the product of a two functions onedepending on the time and one on the parameters (monitored variables)

224 Opportunistic Maintenance Models

Opportunistics maintenance considers unexpected opportunities of performing pre-ventive maintenance With the failure of a component it is possible to perform PMon other components This could be interesting for offshore wind farms for exampleThe deplacement to the wind farm by boat or helicopter is necessary and can bevery expensive By grouping maintenance actions money could be saved

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criterias can be used to classify maintenance optimization models The num-ber of components in consideration is important eg multi-components modelsare more interesting in power system The time horizon considered in the model

8

is important Many articles consider infinite time horizon More focus should bedone on finite horizon since they are more practical Another characteristic of themodel is the time representation if discrete or continuous time is considered Onedistinction can be done between models with deterministic and stochastic lifetime ofcomponents Among stochastic approaches it can be interesting to consider whichkind of lifetime distribution can be used

The method used for solving the problem has an influence on the solution A modelthat can not be solved is of no interest For some model exact solution are possibleFor complex models it is either necessary to simplify the model or to use heuristicmethods to find approximate solutions

9

Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system include the following main parts

1 Generation That are the generation units that produce the power It canbe eg hydro-power units nuclear power plants wind farms etc The totalpower consumed is always equal to the power generated

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units

11

3 Distribution The distibution system is a voltage level below transmissionwhich is connected to customers It connects distribution system with con-sumers Distribution system are in general operated radial (One connectionpoint to the transmission system)

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through differentspecific markets in the world The rules and organization are different for eachmarket place The bids of electricity trades are declared in advance to the systemoperator This is necessary to check that the power system can withstand theoperationnal condition

The power system is controlled in real-time both automatically (automatic controland protection devices) and manually (with the help of the system operator tocoordinate the necessary action to avoid dangerous situations) Each component ofthe system influence the other If a component has a functional failure it can inducefailures of others component Cascading failures can have drastic consequences suchas black-outs

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies(See [47] for an example in hydropower) RCM is an structured approach to finda balance between corrective and preventive maintenance Research on ReliabilityCentered Asset Maintenance (RCAM) a quantitative approach to RCM is beingcarried out in the RCAM group at KTH School of electrical engineering Bertlinget al [12] defined in details the approach and its different steps An importantstep is the maintenance optimization In Hilber et al [20] a method based ona monetary importance index is proposed to define the importance of individualcomponents in a network Ongoing research focus for example on wind power (See[39] [32])

Research about power generation is typically focusing on predictive maintenanceusing condition based monitoring systems (See for example [18] or [44]) The prob-lem of maintenance for transmission and distribution systems has received more

12

attention since the deregulation of the electricity market (See for example [12][27] for distribution systems [22] [30] for transmission systems)

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: cost for the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: if special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/Interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/Monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff are limited.

• Maintenance Equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of the Spare Parts: if the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance Contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of Condition Monitoring Information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible input for an optimization model.

• Statistical Data: available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems. It can be applied to any problem that observes the principle of optimality:

"An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision." [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have any influence on the actual evolution of the system and possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are of interest.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of the system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that the decision can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities are briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_{X_k}. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_{U_k}(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

Subject to: X_{k+1} = f_k(X_k, U_k), k = 0, ..., N−1

N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision action at stage k
C_k(i, u): Cost function
C_N(i): Terminal cost for state i
f_k(i, u): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }      (41)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i)   ∀i ∈ Ω_{X_N}

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_{X_k}

U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }   ∀i ∈ Ω_{X_k}

u: Decision variable
U*_k(i): Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: staged shortest path network with nodes A (Stage 0); B, C, D (Stage 1); E, F, G (Stage 2); H, I, J (Stage 3); K (Stage 4); each arc is labelled with its cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω_{X_0} = {A} = {0}
Ω_{X_1} = {B, C, D} = {0, 1, 2}
Ω_{X_2} = {E, F, G} = {0, 1, 2}
Ω_{X_3} = {H, I, J} = {0, 1, 2}
Ω_{X_4} = {K} = {0}


Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to go to the next stage. The following notations are used:

Ω_{U_k}(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2,   for k = 1, 2, 3

Ω_{U_0}(0) = {0, 1, 2}   for k = 0

For example, Ω_{U_1}(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E and U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_{U_1}(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F and u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{U_k ∈ Ω_{U_k}(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

Subject to: X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., 3

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example μ_1(1) = 2, μ_1(2) = 2).
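To make the backward recursion concrete, the following minimal Python sketch applies the same finite horizon value iteration to a staged graph of the shape used in this example. The arc costs in the dictionary are hypothetical placeholders (the full set of costs of the figure is not reproduced here), so the resulting numbers need not match J*_0(0) = 8.

# Finite horizon deterministic value iteration on a staged shortest path graph.
# arcs[(i, j)] is the cost of going from node i to node j (hypothetical values).
stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I', 'J'], ['K']]
arcs = {('A', 'B'): 2, ('A', 'C'): 4, ('A', 'D'): 3,
        ('B', 'E'): 4, ('B', 'F'): 6, ('C', 'E'): 2, ('C', 'F'): 1, ('C', 'G'): 3,
        ('D', 'F'): 5, ('D', 'G'): 2,
        ('E', 'H'): 2, ('E', 'I'): 5, ('F', 'H'): 7, ('F', 'I'): 3, ('F', 'J'): 2,
        ('G', 'I'): 2, ('G', 'J'): 4,
        ('H', 'K'): 2, ('I', 'K'): 1, ('J', 'K'): 7}

J = {'K': 0.0}          # terminal cost C_N(X_N) = 0
policy = {}
for k in range(len(stages) - 2, -1, -1):          # backward recursion, k = N-1 ... 0
    for i in stages[k]:
        options = [(cost + J[j], j) for (a, j), cost in arcs.items() if a == i and j in J]
        J[i], policy[i] = min(options)            # optimality equation (41)

print('J*(A) =', J['A'])
node, path = 'A', ['A']                           # rebuild the optimal path forwards
while node != 'K':
    node = policy[node]
    path.append(node)
print('optimal path:', ' -> '.join(path))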


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω),  k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is paid. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω_{U_k}(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to: X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)),  k = 0, 1, ..., N−1

N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
X_k: State at stage k
U_k: Decision action at stage k
ω_k(i, u): Probabilistic function of the disturbance
C_k(i, u, j): Cost function
C_N(i): Terminal cost for state i
f_k(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]      (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(i, u, j) + J*_{k+1}(j) ]      (52)

Ω_{X_k}: State space at stage k
Ω_{U_k}(i): Decision space at stage k for state i
P_k(j, u, i): Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀i ∈ Ω_{X_N}   (Initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(i, u, j) + J*_{k+1}(j) ]   ∀i ∈ Ω_{X_k}

U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(i, u, j) + J*_{k+1}(j) ]   ∀i ∈ Ω_{X_k}

k ← k − 1

u: Decision variable
U*_k(i): Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
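As an illustration, a minimal Python sketch of this backward recursion is given below. The model is assumed to be given as arrays P[k][i][u][j] (transition probabilities), C[k][i][u][j] (transition costs) and C_terminal[i] (terminal costs); these names and the array layout are assumptions made for the sketch, not notation from the thesis.

import numpy as np

def finite_horizon_vi(P, C, C_terminal):
    """P[k][i][u][j]: transition probabilities, C[k][i][u][j]: costs, C_terminal[i]: terminal cost.
    Returns the optimal cost-to-go J[k][i] and the optimal decisions U[k][i]."""
    N = len(P)                                   # number of stages
    J = [None] * (N + 1)
    U = [None] * N
    J[N] = np.asarray(C_terminal, dtype=float)   # initialisation J*_N = C_N
    for k in range(N - 1, -1, -1):               # backward recursion
        Pk = np.asarray(P[k], dtype=float)
        Ck = np.asarray(C[k], dtype=float)
        # Q[i, u] = sum_j P_k(j, u, i) * (C_k(i, u, j) + J*_{k+1}(j))
        Q = np.einsum('iuj,iuj->iu', Pk, Ck + J[k + 1][None, None, :])
        J[k] = Q.min(axis=1)                     # optimality equation (52)
        U[k] = Q.argmin(axis=1)
    return J, U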

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages;

• N_X state variables, where the size of the set for each state variable is S;

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the value iteration algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
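For illustration (the figures are hypothetical and chosen only to show the orders of magnitude): a model with N_X = 3 state variables of S = 10 values each and N_U = 2 control variables of A = 3 actions each, over N = 52 stages, requires on the order of 52 · 10^(2·3) · 3^2 ≈ 5 · 10^8 elementary operations; adding two more state variables of the same size multiplies this figure by 10^4.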

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable can not reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep the preceding states that can be visited in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))),  k = 0, 1, ..., N−1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k in a discounted IHSDP has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))),  k = 0, 1, ..., N−1

α: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to: X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))),  k = 0, 1, ..., N−1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [ C_ij(u) + J*(j) ]   ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [ C_ij(u) + α · J*(j) ]   ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, J_{μ^q}(i), the solution of the following linear system, is calculated:

J_{μ^q}(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [ C(j, μ^q(i), i) + J_{μ^q}(j) ]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J_{μ^q}(j) ]

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.

At each iteration the algorithm always improves the policy. If the initial policy μ^0 is already good, the algorithm converges quickly to the optimal solution.
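A compact Python sketch of this two-step procedure is given below, written for the discounted variant so that the policy evaluation system is always well-conditioned (using the discounted problem here is an assumption of the sketch). The arrays P[u][i][j] and C[u][i][j] and the discount factor alpha are assumed given.

import numpy as np

def policy_iteration(P, C, alpha):
    """P[u][i][j]: transition probabilities, C[u][i][j]: transition costs, alpha: discount factor."""
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                    # initial policy mu^0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        rows = np.arange(n_states)
        P_mu = P[mu, rows, :]
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, rows, :])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                    # mu^{q+1} = mu^q: stop
            return J, mu
        mu = mu_new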

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).

While m ≥ 0 do:

J^m_{μ_k}(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [ C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j) ]   ∀i ∈ Ω_X

m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.

66 Average Cost-to-go Problems

The methods presented above can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and a vector h_μ such that:

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [ C(j, μ(i), i) + h_μ(j) ]   ∀i ∈ Ω_X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h*(j) ]   ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h*(j) ]   ∀i ∈ Ω_X

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily:

H^k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [ C(j, u, X) + h^k(j) ]

h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^k(j) ] − H^k   ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^k(j) ]   ∀i ∈ Ω_X

The sequence h^k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations:

h^q(X) = 0

λ^q + h^q(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [ C(j, μ^q(i), i) + h^q(j) ]   ∀i ∈ Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^q(j) ]   ∀i ∈ Ω_X

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ]   ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize: Σ_{i ∈ Ω_X} J(i)

Subject to: J(i) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + α · J(j) ]   ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
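A minimal sketch of how this LP could be solved numerically is given below, under the assumption that the standard primal formulation written above is used; the choice of scipy's linprog is a convenience for the sketch and not a method prescribed by [36].

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """P[u][i][j]: transition probabilities, C[u][i][j]: costs, 0 < alpha < 1.
    Maximizes sum_i J(i) subject to J(i) <= sum_j P(j,u,i) * (C(j,u,i) + alpha * J(j))."""
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = -alpha * P[u, i, :]
            row[i] += 1.0                                  # J(i) - alpha * sum_j P * J(j)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i, :], C[u, i, :]))    # expected one-step cost
    # linprog minimizes, so the maximization of sum_i J(i) becomes minimizing -sum_i J(i)
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                           # optimal cost-to-go J*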

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and are not part of this thesis. Puterman [36] explains how one can transform a SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [ V(i_m) − J(i) ],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [ V(X_k) − J(X_k) ]

where γ_{X_k} corresponds to 1/m, m being the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) has to be calculated from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]   ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm only updates the current state:

J(X_l) = J(X_l) + γ_{X_l} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ_k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J_{μ_k}(j) ]

Note that P(j, u, i) and C(j, u, i) must be known for this step. The improved policy is:

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated from the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J*(j) ]      (71)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)      (72)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v) ]      (73)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [ C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.

The exploration/exploitation trade-off: the convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
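The sketch below illustrates the Q-learning update combined with an ε-greedy rule for this trade-off. The environment is assumed to be available through a function step(x, u) returning (x_next, cost); this interface, as well as the helper actions(x) listing admissible controls, are hypothetical and only serve the illustration.

import random
from collections import defaultdict

def q_learning(step, actions, x0, n_samples, epsilon=0.1):
    """step(x, u) -> (x_next, cost): simulator of the system (hypothetical interface).
    actions(x): list of admissible controls in state x. Returns the estimated Q-factors."""
    Q = defaultdict(float)
    visits = defaultdict(int)
    x = x0
    for _ in range(n_samples):
        # epsilon-greedy: explore with probability epsilon, otherwise exploit the greedy policy
        if random.random() < epsilon:
            u = random.choice(actions(x))
        else:
            u = min(actions(x), key=lambda a: Q[(x, a)])
        x_next, cost = step(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]
        best_next = min((Q[(x_next, a)] for a in actions(x_next)), default=0.0)
        Q[(x, u)] += gamma * (cost + best_next - Q[(x, u)])
        x = x_next            # restarts at terminal states are not handled in this sketch
    return Q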

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building on-line the model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for all values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
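As a very small illustration of the approximation step, the sketch below fits a linear architecture J̃(i, r) = φ(i)ᵀ·r to sampled cost-to-go values by least squares. The feature map phi and the sample format are assumptions made for the sketch; in practice the features must be chosen from insight about the problem, as discussed above.

import numpy as np

def fit_linear_cost_to_go(phi, samples):
    """phi(i): feature vector of state i (chosen by the modeller).
    samples: list of (state, observed cost-to-go) pairs obtained from simulation.
    Returns the weight vector r of the approximation J(i, r) = phi(i) . r"""
    Phi = np.array([phi(i) for i, _ in samples], dtype=float)
    V = np.array([v for _, v in samples], dtype=float)
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)   # least squares fit of the samples
    return r

# Usage sketch: for an age state, one could try phi = lambda age: np.array([1.0, age, age**2])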


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is revealed. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 81: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization and scheduling
- Method: value iteration
- Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models; classical methods for MDP)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI), can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI), faster in general
- Shortest path: linear programming, possible additional constraints, but state space more limited than for VI and PI

Approximate Dynamic Programming
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/Disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)
- Advantages/Disadvantages: complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro-power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of a preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1,  x2k ∈ Ωx2        (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) exceeds a given threshold (e.g. 50%), λ(t) being kept constant for t > Tmax. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0. Dashed lines: u = 1. (The figure shows the states W0–W4, PM1, CM1 and CM2, with failure probabilities Ts·λ(q) from each working state Wq.)

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
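The set Ωx1 can be enumerated directly from NW, NPM and NCM. The sketch below shows one possible encoding of the component states in Python; Tmax, Ts and the maintenance durations are assumed example values, and the merging of PM(NPM) and CM(NCM) with W0 follows the description above.

Ts = 168.0                       # stage length in hours (assumption)
T_max = 5 * 8760.0               # age limit in hours, lambda(t) kept constant beyond it (assumption)
N_W = int(round(T_max / Ts))     # NW = Tmax / Ts, rounded to the closest integer
N_PM, N_CM = 2, 3                # maintenance durations in stages (assumption)

# Omega_x1 = {W0, ..., WNW, PM1, ..., PM(NPM-1), CM1, ..., CM(NCM-1)}
omega_x1 = ([("W", q) for q in range(N_W + 1)]
            + [("PM", q) for q in range(1, N_PM)]
            + [("CM", q) for q in range(1, N_CM)])

def state_failure_rate(q, lam):
    """lambda(Wq) = lambda(q * Ts): the failure rate of a component of age q stages."""
    return lam(min(q * Ts, T_max))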

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. (The figure shows the electricity price in SEK/MWh, roughly between 200 and 500 SEK/MWh, for scenarios 1–3 as a function of the stage k.)
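A price function CE(s, k) of the kind shown in Figure 9.2 can be written, for instance, as a scenario-dependent level plus a yearly periodic variation. The levels and the period in the sketch below are assumptions used only to illustrate the shape of the curves.

import math

def electricity_price(s, k, stages_per_year=52):
    """Illustrative CE(s, k) in SEK/MWh for scenario s (0 = dry, 1 = normal, 2 = wet year)."""
    level = (450.0, 350.0, 250.0)[s]                                  # assumed scenario levels
    seasonal = 50.0 * math.cos(2.0 * math.pi * k / stages_per_year)   # assumed yearly variation
    return level + seasonal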

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0   no preventive maintenance
Uk = 1   preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary.

They can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PM(NPM−1)                    ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CM(NCM−1)                    ∅   W0       1
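Table 9.1 translates directly into a transition function for the component state. The sketch below is one possible encoding, using the ('W'/'PM'/'CM', q) state tuples of the earlier sketch; p_fail(q) stands for the per-stage failure probability λ(Wq), and the handling of NPM = 1 or NCM = 1 follows the note above.

def component_transition(state, u, p_fail, N_W, N_PM, N_CM):
    """Return {next_state: probability} for one component, following Table 9.1.
    state is ('W', q), ('PM', q) or ('CM', q); u = 1 means preventive replacement."""
    kind, q = state
    after_pm = ("PM", 1) if N_PM > 1 else ("W", 0)     # PM1, or directly W0 if NPM = 1
    after_fail = ("CM", 1) if N_CM > 1 else ("W", 0)   # CM1, or directly W0 if NCM = 1
    if kind == "W":
        if u == 1:                                     # preventive replacement decided
            return {after_pm: 1.0}
        p = p_fail(q)                                  # failure probability lambda(Wq) during the stage
        ageing = ("W", min(q + 1, N_W))                # WNW stays in WNW while no failure occurs
        return {after_fail: p, ageing: 1.0 - p}
    if kind == "PM":                                   # preventive maintenance progresses deterministically
        return {("PM", q + 1) if q < N_PM - 1 else ("W", 0): 1.0}
    if kind == "CM":                                   # corrective maintenance progresses deterministically
        return {("CM", q + 1) if q < N_CM - 1 else ("W", 0): 1.0}
    raise ValueError("unknown component state: %r" % (state,))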

Table 9.2: Example of transition matrices for the electricity scenarios

        ( 1    0    0   )
P1E =   ( 0    1    0   )
        ( 0    0    1   )

        ( 1/3  1/3  1/3 )
P2E =   ( 1/3  1/3  1/3 )
        ( 1/3  1/3  1/3 )

        ( 0.6  0.2  0.2 )
P3E =   ( 0.2  0.6  0.2 )
        ( 0.2  0.2  0.6 )

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)     P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
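The example of Tables 9.2 and 9.3 can be stored as a list of matrices indexed by the stage. The sketch below reproduces those matrices and the 12-stage schedule; rows correspond to the current scenario i2 and columns to the next scenario j2.

P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]

# Which matrix applies at each stage k = 0, ..., 11 (Table 9.3).
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(j2, i2, k):
    """P_k(j2, i2): probability that the electricity scenario moves from i2 to j2 during stage k."""
    return schedule[k][i2][j2]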

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) would be defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G · Ts · CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PM(NPM−1)                    ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CM(NCM−1)                    ∅   W0       CI + CCM
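Given the state space, the transition probabilities and the transition costs above, the optimal policy is obtained with the finite horizon value iteration (backward induction) described in Chapter 5. The sketch below shows the structure of that recursion in a generic form; the callables states, decisions, transitions and cost are assumed to wrap the definitions of this section (for instance the component and electricity sketches given earlier), and states with an empty decision space are given a single dummy decision so that the forced maintenance transitions are still applied. In such a sketch, the generation reward of Table 9.4 would be entered as a negative cost, so that minimizing total cost maximizes net profit.

def backward_induction(states, decisions, transitions, cost, terminal_cost, N):
    """J*_k(i) = min_u sum_j P_k(j | i, u) * (C_k(i, u, j) + J*_{k+1}(j)), solved backwards.

    decisions(i) must return at least one action (use a dummy action, e.g. None,
    for states whose decision space is empty in the model above).
    transitions(i, u, k) returns {j: probability}; cost(i, u, j, k) is the transition cost."""
    J = {i: terminal_cost(i) for i in states}          # J*_N
    policy = [None] * N
    for k in reversed(range(N)):
        J_k, mu_k = {}, {}
        for i in states:
            best_u, best_val = None, float("inf")
            for u in decisions(i):
                val = sum(p * (cost(i, u, j, k) + J[j])
                          for j, p in transitions(i, u, k).items())
                if val < best_val:
                    best_u, best_val = u, val
            J_k[i], mu_k[i] = best_val, best_u
        J, policy[k] = J_k, mu_k
    return J, policy                                   # J*_0 and the optimal policy mu*_k(i)

For the one-component model, a state i would be a pair (component state, electricity scenario), and the transition probability would be the product P(j1, u, i1) · Pk(j2, i2) derived above.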

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed to be perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)        (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space
The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

uck = 0   no preventive maintenance on component c
uck = 1   preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)        (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:   Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                     Ωuc(ic) = ∅       otherwise
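The admissible decision vectors Uk can be enumerated as the Cartesian product of the per-component decision spaces. In the sketch below, component states are the ('W'/'PM'/'CM', q) tuples of the earlier sketches, and a dummy action None is used where the model defines the decision space as empty, so that the product is well defined.

from itertools import product

def component_decision_space(state):
    """Omega_uc(ic): {0, 1} for a working component, a single dummy action otherwise."""
    kind, _ = state
    return (0, 1) if kind == "W" else (None,)

def decision_vectors(component_states):
    """All admissible decision vectors U_k = (u1k, ..., uNCk) for the given component states."""
    return list(product(*(component_decision_space(s) for s in component_states)))

For example, with two components in states ('W', 3) and ('CM', 1), the admissible vectors are (0, None) and (1, None).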

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)                                                  (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)       (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2


If one of the components is in maintenance, or the decision to do preventive maintenance is taken for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∉ {W0, ..., WNWc−1} and ic = jc
P^c = 0              otherwise
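The two cases above can be implemented as a product over the components. The sketch below uses the interpretation that, when at least one component is already in maintenance or is sent to preventive maintenance, a component that is neither maintained nor replaced keeps its state (it does not age while the system is down), while the others follow their individual transitions; this is a reading of the compact formula above rather than a verbatim transcription. The helper p_component(j, u, i) is assumed to return the one-component transition probability P(j, u, i), which can be derived from the transition function sketched earlier.

def joint_transition_probability(i_vec, u_vec, j_vec, p_component, is_working):
    """P((j1..jNC) | (u1..uNC), (i1..iNC)) for the component part of the multi-component model."""
    case_2 = any(u == 1 or not is_working(i) for i, u in zip(i_vec, u_vec))
    prob = 1.0
    for i, u, j in zip(i_vec, u_vec, j_vec):
        if not case_2:
            prob *= p_component(j, 0, i)                    # Case 1: independent ageing
        elif u == 1 or not is_working(i):
            prob *= p_component(j, 1 if u == 1 else 0, i)   # component replaced or already in maintenance
        else:
            prob *= 1.0 if j == i else 0.0                  # working component does not age
    return prob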

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions.

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑c=1..NC Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
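The two cost cases can be written in the same style. The sketch below reuses the illustrative parameters and the state encoding of the earlier sketches and, for brevity, takes the per-component costs CPMc and CCMc equal for all components; as before, the generation income is entered with a negative sign so that the model can be solved as a cost minimization. These simplifications are choices of this sketch, not of the thesis.

def joint_stage_cost(i_vec, u_vec, j_vec, i_elec, k):
    """Transition cost for the multi-component model (component states as ('W'|'PM'|'CM', q) tuples)."""
    all_working = all(kind == "W" for kind, _ in i_vec) and all(kind == "W" for kind, _ in j_vec)
    if all_working and all(u in (0, None) for u in u_vec):
        # Case 1: the system produces during the whole stage.
        return -G * Ts * electricity_price(i_elec, k) / 1000.0   # CE given per MWh in the earlier sketch
    # Case 2: interruption cost plus the cost of every ongoing or starting maintenance action.
    total = C_I
    for (ik, _), (jk, jq) in zip(i_vec, j_vec):
        if ik == "CM" or (jk == "CM" and jq == 1):
            total += C_CM
        elif ik == "PM" or (jk == "PM" and jq == 1):
            total += C_PM
    return total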

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas and issues that could have an impact on the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm proves empirically to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = J*(K) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,   u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,   u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,   u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

The optimal cost-to-go from A is thus J*0(0) = 8, obtained along the path A → D → G → I → K.
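The backward recursion above is easy to reproduce programmatically. The sketch below encodes the arc costs C_k(i, u) of the shortest path example (with u equal to the state reached at the next stage, as in Chapter 4) and recomputes the optimal cost-to-go; it returns 8 for node A, in agreement with the solution above.

# Arc costs C_k(i, u) of the shortest path example; u is also the state reached at stage k + 1.
costs = {
    0: {0: {0: 2, 1: 4, 2: 3}},                                    # A -> B, C, D
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J -> K
}

J = {0: 0}                                                         # J*_4(K) = 0
for k in (3, 2, 1, 0):
    J = {i: min(c + J[j] for j, c in arcs.items()) for i, arcs in costs[k].items()}

print(J[0])                                                        # 8, the shortest path cost from A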


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I Bagai and K Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C Berenguer, C Chu, and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L Bertling, R Allan, and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R Dekker, RE Wildeman, and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng, and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y Jiang, Z Zhong, J McCalley, and TV Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L P Kaelbling, M L Littman, and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D Kalles, A Stathaki, and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L Wang, J Chu, W Mao, and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R Wildeman, R Dekker, and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] RE Wildeman, R Dekker, and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Dynamic and Cost functionsCk(i u) Cost functionCk(i u j) Cost functionCij(u) = C(i u j) Cost function if the system is stationaryCN (i) Terminal cost for state ifk(i u) Dynamic functionfk(i u ω) Stochastic dynamic functionJlowastk (i) Optimal cost-to-go from stage k to N starting from state iωk(i u) Probabilistic function of a disturbances Pk(j u i) Transition probability functionP (j u i) Transition probability function for stationary systemsV (Xk) Cost-to-go resulting of a trajectory starting from state Xk

Sets

IX

ΩUk (i) Decision Space at stage k for state iΩXk State space at stage k

Contents

Contents XI

1 Introduction 1

11 Background 1

12 Objective 2

13 Approach 2

14 Outline 2

2 Maintenance 5

21 Types of Maintenance 5

22 Maintenance Optimization Models 6

3 Introduction to the Power System 11

31 Power System Presentation 11

32 Costs 13

33 Main Constraints 13

4 Introduction to Dynamic Programming 15

41 Introduction 15

42 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23

51 Problem Formulation 23

52 Optimality Equation 25

53 Value Iteration Method 25

54 The Curse of Dimensionality 26

55 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29

61 Problem Formulation 29

62 Optimality Equations 31

63 Value Iteration 31

64 The Policy Iteration Algorithm 31

65 Modified Policy Iteration 32

66 Average Cost-to-go Problems 33

XI

67 Linear Programming 3468 Efficiency of the Algorithms 3569 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 3771 Introduction 3772 Direct Learning 3873 Indirect Learning 4174 Supervised Learning 42

8 Review of Models for Maintenance Optimization 4381 Finite Horizon Dynamic Programming 4382 Infinite Horizon Stochastic Models 4483 Reinforcement Learning 4584 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 4791 One-Component Model 4792 Multi-Component model 5593 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human

1

errors and construction errors) Stochastic Dynamic Programming (SDP) modelsare optimization models that integrate explicitely stochastic behaviors This featuremakes the models interesting and was the starting idea of this work

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature was reviewed Conclusions was madeabout the applicability of each approach for maintenance optimization problemsMoreover future avenue for research was identified

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 solves an overview of the maintenance field The most important methodsand some optimization models are reviewed

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapter 4-7 focus on different Dynamic Programming (DP) approaches and al-gorithms to solve them The assumption of the models and practical limitationsare discussed The basic of DP models is investigated in deterministic models inChapter 4 Chapter 5 and 6 focus on Stochastic Dynamic Programming methods

2

respectively for finite and infinite horizons Chapter 7 is an introduction to Approx-imate Dynamic Programming (ADP) also known as Reinforcement Learning (RL)which is an approach to solving Dynamic Programming infinite horizon problemsusing approximate methods

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conlusions of the work and discuss possible avenues forresearch

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

Maintenance

Preventive Maintenance

Time-Based Maintenance (TBM) Condition Based Maintenance (CBM)

Continuous Schedulled Inspection Based

Corrective Maintenance

Figure 21 Maintenance Tree based on [1]

2 Condition Based Maintenance is preventive maintenance based on perfor-mance andor parameter monitoring and the subsequent actions [1] PMcorresponds to all the maintenance methods using diagnostic or inspectionsto decide of the maintenance actions Diagnostic methods include the use ofhuman senses (noise visual etc) measurements or tests They can be un-dertaken continuously or during schedulled or requested inspections CBM isoften used for non-age related failures

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy a component is replace at failure or at the end ofa specified interval whichever occurs first [17] This policy makes sens if preventivereplacement is less expensive than a corrective replacement and the failure rateincrease with time Barlow et al [7] describes a basic age replacement model

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An ageblock replacement model with failures resulting from shocks is described in[38] The shocks follows a non-homogeneous Poisson distribution (Poisson processwith a rate that is not stationnary) Two types of failures can result from the shocksminor failure removed by minor repair and major failure removed by replacement

222 Block Replacement Policies

In blocks replacement policies the components of a system are replaced at failureor at fixed times kT (k = 1 2 ) whichever occurs first Barlow et al [7] describesa basic block replacement model To avoid that a component that has just beenreplaced is replaced again a modified block replacement model is proposed in [10]A component is not replaced at a schedulled replacement time if its age is less thanT

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables above which it is nec-essary to perform maintenance The optimal wear-limit for preventive replacement

7

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection at each decision epoch one must decide ifmaintenance should be performed and when the next inspection should occur In[2] the inspection occur at fixed time and the decision of preventive replacementof the component depend on its condition at inspection In [9] a Semi-MarkovDecision Process (SMDP see Chapter 4) is proposed to optimize at each inspectionthe maintenance decision and the time to next inspection

An age replacement policies model that takes into account the information fromcondition based monitoring devices is proposed in [25] A proportional hazardmodel is used to model the effect of the monitored variables The assumption ofa hazard model is that the hazard function is the product of a two functions onedepending on the time and one on the parameters (monitored variables)

224 Opportunistic Maintenance Models

Opportunistics maintenance considers unexpected opportunities of performing pre-ventive maintenance With the failure of a component it is possible to perform PMon other components This could be interesting for offshore wind farms for exampleThe deplacement to the wind farm by boat or helicopter is necessary and can bevery expensive By grouping maintenance actions money could be saved

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criterias can be used to classify maintenance optimization models The num-ber of components in consideration is important eg multi-components modelsare more interesting in power system The time horizon considered in the model

8

is important Many articles consider infinite time horizon More focus should bedone on finite horizon since they are more practical Another characteristic of themodel is the time representation if discrete or continuous time is considered Onedistinction can be done between models with deterministic and stochastic lifetime ofcomponents Among stochastic approaches it can be interesting to consider whichkind of lifetime distribution can be used

The method used for solving the problem has an influence on the solution A modelthat can not be solved is of no interest For some model exact solution are possibleFor complex models it is either necessary to simplify the model or to use heuristicmethods to find approximate solutions

9

Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system include the following main parts

1 Generation That are the generation units that produce the power It canbe eg hydro-power units nuclear power plants wind farms etc The totalpower consumed is always equal to the power generated

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units

11

3 Distribution The distibution system is a voltage level below transmissionwhich is connected to customers It connects distribution system with con-sumers Distribution system are in general operated radial (One connectionpoint to the transmission system)

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through differentspecific markets in the world The rules and organization are different for eachmarket place The bids of electricity trades are declared in advance to the systemoperator This is necessary to check that the power system can withstand theoperationnal condition

The power system is controlled in real-time both automatically (automatic controland protection devices) and manually (with the help of the system operator tocoordinate the necessary action to avoid dangerous situations) Each component ofthe system influence the other If a component has a functional failure it can inducefailures of others component Cascading failures can have drastic consequences suchas black-outs

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies(See [47] for an example in hydropower) RCM is an structured approach to finda balance between corrective and preventive maintenance Research on ReliabilityCentered Asset Maintenance (RCAM) a quantitative approach to RCM is beingcarried out in the RCAM group at KTH School of electrical engineering Bertlinget al [12] defined in details the approach and its different steps An importantstep is the maintenance optimization In Hilber et al [20] a method based ona monetary importance index is proposed to define the importance of individualcomponents in a network Ongoing research focus for example on wind power (See[39] [32])

Research about power generation is typically focusing on predictive maintenanceusing condition based monitoring systems (See for example [18] or [44]) The prob-lem of maintenance for transmission and distribution systems has received more

12

attention since the deregulation of the electricity market (See for example [12][27] for distribution systems [22] [30] for transmission systems)

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

32 Costs

Possible costsincomes related to maintenance in power systems have been identified(non-inclusively) as follows

bull Manpower cost Cost for the maintenance team that performs maintenanceactions

bull Spare part cost The cost of a new component is an important part of themaintenance cost

bull Maintenance equipment cost If special equipment is needed for undertakingthe maintenance An helicopter can sometime be necessary for the mainte-nance of some parts of an off-shore wind turbine

bull Energy production The electricity produce is sold to consumers on the elec-tricity market The price of electricity can fluctuate At the same time thepower produce by a generating power unit can fluctuate depending on factorslike the weather (for renewable energy) The condition of the unit can alsoinfluence its efficiency

bull Unserved energyInterruption cost If there is an agreement to producedeliverenergy to a consumer at some specific time unserved energy must be paidThe cost depends on the contract and the cost per unit time depends on theduration of the failure

bull InspectionMonitoring cost Inspection or monitoring systems have a costthat must be considered The cost can be an initial investment (for continuousmonitoring systems) or discret costs (each time an inspection measurementor test is done on an asset)

33 Main Constraints

Possibles constraints for the maintenance of power system have been identified asfollows

13

bull Manpower The size and availability of the maintenance staff is limited

bull Maintenance Equipment The equipment needed for undertaking the mainte-nance must be available

bull Weather The weather can make certain maintenance actions postponed egin very windy conditions it is not possible to realize maintenance on offshorewind farms

bull Availability of the Spare Part If the needed spare parts are not availablemaintenance can not be done It can also happen that a spare part is availablebut far away from the location where it is needed The transportation has aprice and time

bull Maintenance Contracts Power companies can subscribe for maintenance ser-vices from the manufacturer of a system This is a typical option for windturbines [33] The time span of a contract can be a constraint for an opti-mization model

bull Availability of Condition Monitoring Information If condition monitoring sys-tems are installed on a system the information gathered by the monitoringdevices are not always available to non-manufacturer companies The avail-ability of monitoring information has an important impact is on the possibleinput for an optimization model

bull Statistical Data Available monitoring information have a value only if con-clusions about the deterioration or failure state in a system can be drawn fromthem Statistical data are necessary to create a probabilistic model

14

Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and somefeature of possible DP models Deterministic DP is used to introduce the basic ofDP formulation and the value iteration method a classical method for solving DPmodels

41 Introduction

Dynamic Programming deals with multi-stage or sequential decisions problems Ateach decision epoch the decision maker (also called agent or controller in differentcontexts) observes the state of a system (It is assumed in this thesis that the systemis perfectly observable) An action is decided based on this state This action willresult in an immediate cost (or reward) and influence the evolution of the system

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality

15

An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solution of the subproblems are themselves solution of the general problemThe principle implies that at each stage the decision are based only on the currentstate of the system The previous decisions should not have influence on the actualevolution of the system and possible actions

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimizationOne distinguishs between finite and infinite time horizons

Chapter 4 focus on finite horizon stochastic dynamic programming In the contextof maintenance the objective would be for example to minimize the maintenancecosts during the time horizon considered

Chapter 5 and 6 focus on models that assume an infinite time horizon This as-sumption implies that a system is stationary that it evolves in the same manner allthe time Moreover an infinite horizon optimization assumes implicitely that thesystem is used for a infinite time It can be an good approximation if indeed thelifetime of a system is very long

16

414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) withdiscrete sets of decision epochs (Chapter 3 4 and 6) Decisions are made at eachdecision epoch The time is devided into stages or periods between these epochs Itis clear that the interval time between 2 stages will have an influence on the result

Short intervals are more realistitic and precise but the models can become heavyif the time horizon is large In practice long intervals can be used for long-termplanning while short-term planning consider shorter intervals

Continum set of decision epochs implies that the decision can be made either contin-uously at some points decided by the decision maker or when an event occur Thetwo last possibilities will be shortly investigated in Chapter 5 Continuous decisionrefers to optimal control theory and will not be discussed here

415 Exact and Approximation Methods

Dynamic Programming suffers of a complexity problem the curse of dimensionality(discussed in Section 42)

Methods for solving the dynamic programming models exactly exist and are pre-sented in Chapters 5 and 6 However large models are untractable with thesemethods

Chapter 6 provide an introduction to the field of Reinforcement Learning (RL) thatfocus on approximations for DP solutions Approximate algorithms are obtainedby combining DP and supervised learning algorithms RL is also known as neuro-dynamic programming when DP is combined with neural networks [13]

17

42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision SpacesAt each stage k the system is in a state Xk = i that belongs to a state space ΩXk Depending on the state of the system the decision maker decide of an action to dou = Uk isin ΩUk (i)

Dynamic and Cost FunctionsAs a result of this action the system state at next stage will be Xk+1 = fk(i u)Moreover the action has a cost that the decision maker has to pay Ck(i u) A pos-sible terminal cost is associated to the terminal state (state at stage N) (CN (XN )

Objective FunctionThe objective is to determine the sequence of decision that will mimimize the cu-mulative cost (also called cost-to-go function) subject to the dynamic of the system

Jlowast0 (X0) = minUk

Nminus1sumk=0Ck(Xk Uk) + CN (XN )

Subject to Xk+1 = fk(Xk Uk) k = 0 N minus 1

N Number of stagesk Stagei State at the current stagej State at the next stageXk State at stage kUk Decision action at stage kCk(i u) Cost functionCN (i) Terminal cost for state ifk(i u) Dynamic functionJlowast0 (i) Optimal cost-to-go starting from state i

18

422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad (41)$$

J*k(i) Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

$$J_N^*(i) = C_N(i) \qquad \forall i \in \Omega_{X_N}$$

$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad \forall i \in \Omega_{X_k}$$

$$U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right] \qquad \forall i \in \Omega_{X_k}$$

u Decision variable
U*k(i) Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
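As an illustration, the backward recursion above can be sketched in a few lines of code. The sketch below is generic and assumes the problem is supplied through hypothetical callables states(k), actions(k, i), f(k, i, u), cost(k, i, u) and terminal_cost(i); these names are placeholders and not part of the thesis notation.

def value_iteration_finite(N, states, actions, f, cost, terminal_cost):
    # J[k][i] = optimal cost-to-go from state i at stage k
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i in states(N):
        J[N][i] = terminal_cost(i)          # initialisation at the last stage
    for k in range(N - 1, -1, -1):          # backward recursion down to k = 0
        for i in states(k):
            best_u, best_val = None, float("inf")
            for u in actions(k, i):
                val = cost(k, i, u) + J[k + 1][f(k, i, u)]
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i] = best_val
            policy[k][i] = best_u
    return J, policy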


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered.

[Figure: shortest path network with node A at stage 0, nodes B, C, D at stage 1, nodes E, F, G at stage 2, nodes H, I, J at stage 3 and node K at stage 4; each arc between consecutive stages is labelled with its cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating the cost of every path, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable space, in which case Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$

$$\Omega_{U_0}(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*0, µ*1, ..., µ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, N-1$

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ0, µ1, µ2, µ3, µ4} with µk(i) = u*k(i) (for example, µ1(1) = 2 and µ1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory needed for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ωk(i, u):

$$X_{k+1} = f_k(X_k, U_k, \omega), \qquad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$


N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
ωk(i, u) Probabilistic function of the disturbance
Ck(i, u, j) Cost function
CN(i) Terminal cost for state i
fk(i, u, ω) Dynamic function
J*0(i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \qquad (51)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad (52)$$

ΩXk State space at stage k
ΩUk(i) Decision space at stage k for state i
Pk(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

$$J_N^*(i) = C_N(i) \qquad \forall i \in \Omega_{X_N} \qquad \text{(Initialisation)}$$

While k ≥ 0 do

$$J_k^*(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad \forall i \in \Omega_{X_k}$$

$$U_k^*(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J_{k+1}^*(j) \right] \qquad \forall i \in \Omega_{X_k}$$

k ← k − 1


u Decision variable
U*k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
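A compact sketch of this stochastic backward recursion is given below. It assumes the model is available through hypothetical callables states(k), actions(k, i), transitions(k, i, u) (returning pairs of next state and probability) and cost(k, j, u, i), plus a terminal cost; these names are placeholders and not part of the thesis notation.

def stochastic_value_iteration(N, states, actions, transitions, cost, terminal_cost):
    # J[k][i] = optimal expected cost-to-go from state i at stage k
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i in states(N):
        J[N][i] = terminal_cost(i)
    for k in range(N - 1, -1, -1):
        for i in states(k):
            best_u, best_val = None, float("inf")
            for u in actions(k, i):
                # expectation over all possible next states j
                val = sum(p * (cost(k, j, u, i) + J[k + 1][j])
                          for j, p in transitions(k, i, u))
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i] = best_val
            policy[k][i] = best_u
    return J, policy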

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem thus increases exponentially with the number of state and decision variables. This characteristic of SDP is called the curse of dimensionality.
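As an illustrative order of magnitude (the numbers below are chosen only for this example and do not come from the thesis), a problem with N = 52 stages, NX = 3 state variables of size S = 10 and NU = 2 control variables of size A = 3 requires on the order of 52 · 10^6 · 3^2 ≈ 5 · 10^8 elementary operations; adding a fourth state variable of the same size multiplies this by another factor S^2 = 100.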

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for a maintenance model based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for i ∈ ΩX, µ(i) is an admissible control for the state i, µ(i) ∈ ΩU(i).

The objective is to find the optimal µ*, that is, the policy that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

µ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (as a decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy µ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \qquad \forall i \in \Omega_X$$

Jµ(i) Cost-to-go function of policy µ starting from state i
J*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed inSection 67

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
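For illustration, a minimal sketch of value iteration for a discounted, stationary MDP is given below. It assumes the model is given as numpy arrays P and C of shape (number of actions, number of states, number of states), with P[u][i][j] the transition probability and C[u][i][j] the transition cost; these data and the stopping tolerance are illustrative assumptions, not part of the thesis model.

import numpy as np

def value_iteration(P, C, alpha, tol=1e-6):
    # Iterate the discounted optimality equation until the update is below tol
    n_actions, n_states, _ = P.shape
    J = np.zeros(n_states)
    while True:
        # Q[u, i] = expected one-stage cost plus discounted cost-to-go
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)   # cost-to-go and a greedy policy
        J = J_new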

64 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ0. It can then be described by the following steps:

Step 1: Policy Evaluation

If µq+1 = µq, stop the algorithm. Else, Jµq(i), the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right] \qquad \forall i \in \Omega_X$$

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µq.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right] \qquad \forall i \in \Omega_X$$

Go back to the policy evaluation step.

The process stops when µq+1 = µq.

At each iteration the algorithm always improves the policy. If the initial policy µ0 is already good, then the algorithm will converge quickly to the optimal solution.
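A compact sketch of the two-step procedure for a discounted, stationary MDP is shown below (the discounted variant is used here so that the evaluation step is a well-posed linear system); the arrays P and C of shape (actions, states, states) and the factor alpha are placeholder model data, not part of the thesis notation.

import numpy as np

def policy_iteration(P, C, alpha):
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                      # initial policy mu0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_states), :]                # rows of P chosen by mu
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_states), :])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                      # policy improves itself
            return mu, J
        mu = mu_new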

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_µk(i) that must be chosen higher than the real value Jµk(i).


While m ≥ 0 do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \qquad \forall i \in \Omega_X$$

m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jµk is approximated by J0µk.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and a state X ∈ ΩX, there is a unique λµ and vector hµ such that

$$h_\mu(X) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \qquad \forall i \in \Omega_X$$

This λµ is the average cost-to-go of the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega_X$$

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h0(i) is chosen arbitrarily:

$$H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \qquad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \qquad \forall i \in \Omega_X$$

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Policy Evaluation
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm. Else, solve the system of equations:

$$h^q(X) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \qquad \forall i \in \Omega_X$$

Step 2: Policy Improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \qquad \forall i \in \Omega_X$$

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case:

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega_X$$

J*(i) is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega_X} J(i)$

Subject to $J(i) - \alpha \sum_{j \in \Omega_X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i) \qquad \forall u, \forall i$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy µ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how an SMDP model can be transformed into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates after a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X0, ..., XN) has been generated according to the policy µ and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

$$V(X_k) = \sum_{n=k}^{N} C(X_n, X_{n+1})$$

V(Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

V(im) Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

$$J(i) = J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \quad \text{with } \gamma = 1/m,$$

where m is the number of the trajectory. From a trajectory point of view:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

where γXk corresponds to 1/m, with m the number of times Xk has already been visited by trajectories.


With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k) \right]$$
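A minimal sketch of TD(0) policy evaluation is given below; it assumes a simulator simulate_step(i, u) returning the pair (next state, cost) and a policy mu given as a dictionary, both of which are placeholder names, not part of the thesis model.

from collections import defaultdict

def td0_policy_evaluation(mu, simulate_step, start_state, terminal_state, n_trajectories):
    J = defaultdict(float)          # estimated cost-to-go of policy mu
    visits = defaultdict(int)
    for _ in range(n_trajectories):
        i = start_state
        while i != terminal_state:
            j, cost = simulate_step(i, mu[i])      # one observed transition
            visits[i] += 1
            gamma = 1.0 / visits[i]                # decreasing step size
            J[i] += gamma * (cost + J[j] - J[i])   # TD(0) update
            i = j
    return J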

Q-factors
Once Jµk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q_{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]$$

Note that P(j, u, i) and C(j, u, i) must be known here. The improved policy is

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu^k}(i, u)$$

It is in fact an approximate version of the policy iteration algorithm, since Jµk and Qµk have been estimated from the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (71)$$


The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (72)$$

By combining the two equations, we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \qquad (73)$$

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck) do

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with γ defined as for TD.

The trade-off between exploration and exploitation: The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
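The sketch below illustrates Q-learning with a simple ε-greedy rule as one possible way to handle this trade-off; simulate_step, actions and the parameter values are hypothetical placeholders, not part of the thesis model.

import random
from collections import defaultdict

def q_learning(actions, simulate_step, start_state, terminal_state,
               n_trajectories, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(i, u)], initialised arbitrarily to 0
    visits = defaultdict(int)
    for _ in range(n_trajectories):
        i = start_state
        while i != terminal_state:
            if random.random() < epsilon:        # exploration phase
                u = random.choice(actions(i))
            else:                                # exploitation of the greedy policy
                u = min(actions(i), key=lambda a: Q[(i, a)])
            j, cost = simulate_step(i, u)        # one observed sample (i, j, u, cost)
            visits[(i, u)] += 1
            gamma = 1.0 / visits[(i, u)]
            future = 0.0 if j == terminal_state else min(Q[(j, v)] for v in actions(j))
            Q[(i, u)] = (1 - gamma) * Q[(i, u)] + gamma * (cost + future)
            i = j
    return Q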

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the table representation investigated previously, Jµ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failures and deterioration failures. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before the algorithm is applied. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 81: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted and shortest path
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted)
  Method: classical methods for MDP - value iteration (VI), policy iteration (PI), linear programming (LP)
  Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP allows additional constraints but is limited to smaller state spaces than VI and PI

Approximate Dynamic Programming
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for systems larger than classical MDP methods can handle
  Method: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space


x1k Component state at stage k
x2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario. NX = 2.

The state of the system is thus represented by a vector as in (91):

$$X_k = \begin{pmatrix} x_k^1 \\ x_k^2 \end{pmatrix}, \qquad x_k^1 \in \Omega_{x^1},\ x_k^2 \in \Omega_{x^2} \qquad (91)$$

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) > 50 % for t > Tmax. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
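As an illustration of this discretization, the short sketch below computes the number of working states NW and the per-stage failure rates λ(Wq) = λ(q · Ts) from a given failure rate function; the increasing, Weibull-like rate used in the usage example is purely hypothetical and not data from the thesis.

def discretize_failure_rate(failure_rate, Ts, T_max):
    # Return NW and the list [lambda(W0), ..., lambda(W_NW)] for stages of length Ts
    NW = round(T_max / Ts)
    return NW, [failure_rate(q * Ts) for q in range(NW + 1)]

# Hypothetical increasing failure rate (Weibull-like), for illustration only;
# t and Ts are in hours, T_max corresponds to five years of operation.
rate = lambda t: min(1.0, 1e-3 * (1 + (t / 8760.0) ** 2))
NW, lam = discretize_failure_rate(rate, Ts=168.0, T_max=5 * 8760.0)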


[Figure: Markov decision process graph for one component with working states W0-W4, preventive maintenance state PM1 and corrective maintenance states CM1, CM2; for u = 0 (solid lines), each Wq leads to the next working state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); for u = 1 (dashed lines), Wq leads to PM1; PM and CM states lead back towards W0 with probability 1.]

Figure 91: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0. Dashed lines: u = 1.

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price in SEK/MWh as a function of the stage for three scenarios (Scenario 1, 2 and 3), with prices roughly between 200 and 500 SEK/MWh.]

Figure 92: Example of electricity scenarios, NE = 3.


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{else} \end{cases}$$

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x_{k+1}^1 = j^1, x_{k+1}^2 = j^2 \mid u_k = u, x_k^1 = i^1, x_k^2 = i^2)$$
$$= P(x_{k+1}^1 = j^1 \mid u_k = u, x_k^1 = i^1) \cdot P(x_{k+1}^2 = j^2 \mid x_k^2 = i^2)$$
$$= P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 91: Transition probabilities

i1                            u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0    CM1      λ(Wq)
WNW                           0    WNW      1 − λ(WNW)
WNW                           0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}          1    PM1      1
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    1
PMNPM−1                       ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    1
CMNCM−1                       ∅    W0       1
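To make the structure of Table 91 concrete, the sketch below builds the component state transition probabilities as nested dictionaries from given NW, NPM, NCM and per-stage failure probabilities lam[q] = λ(Wq). It is only an illustrative reading of the table; the parameter values in the usage line are hypothetical.

def component_transitions(NW, NPM, NCM, lam):
    # P[u][i] = {j: probability}, states labelled 'Wq', 'PMq', 'CMq' as in Table 91
    P = {0: {}, 1: {}}
    for q in range(NW + 1):
        j_ok = f"W{min(q + 1, NW)}"                 # the component stays in W_NW once reached
        P[0][f"W{q}"] = {j_ok: 1 - lam[q], "CM1": lam[q]}
        P[1][f"W{q}"] = {"PM1": 1.0}                # preventive replacement decided
    for q in range(1, NPM):                         # maintenance states have no decision;
        nxt = f"PM{q + 1}" if q < NPM - 1 else "W0" # the same transition is stored under both keys
        P[0][f"PM{q}"] = P[1][f"PM{q}"] = {nxt: 1.0}
    for q in range(1, NCM):
        nxt = f"CM{q + 1}" if q < NCM - 1 else "W0"
        P[0][f"CM{q}"] = P[1][f"CM{q}"] = {nxt: 1.0}
    return P

# Example with NW = 4, NPM = 2, NCM = 3 as in Figure 91 and hypothetical lam values
P = component_transitions(4, 2, 3, lam=[0.01, 0.02, 0.04, 0.08, 0.15])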

Table 92: Example of transition matrices for the electricity scenarios

$$P_E^1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad P_E^2 = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} \qquad P_E^3 = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 93: Example of transition probabilities over a 12-stage horizon

Stage (k):      0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):     P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 94: Transition costs

i1                            u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}        0    CM1      CI + CCM
WNW                           0    WNW      G · Ts · CE(i2, k)
WNW                           0    CM1      CI + CCM
Wq                            1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    CI + CPM
PMNPM−1                       ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    CI + CCM
CMNCM−1                       ∅    W0       CI + CCM

92 Multi-Component model

In this section, the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c


Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c at the next stage
jNC+1 State of the electricity at the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$
X_k = \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{N_C,k} \\ x_{N_C+1,k} \end{pmatrix} \qquad (9.2)
$$

xck, c ∈ {1, ..., NC}, represents the state of component c.

xNC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c are NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space
Same as in Section 8.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

$$
U_k = \begin{pmatrix} u_{1,k} \\ u_{2,k} \\ \vdots \\ u_{N_C,k} \end{pmatrix} \qquad (9.3)
$$

The decision space for each decision variable can be defined by

$$
\forall c \in \{1, \ldots, N_C\}: \quad \Omega^{u_c}(i_c) =
\begin{cases}
\{0, 1\} & \text{if } i_c \in \{W_0, \ldots, W_{N_{W_c}}\} \\
\emptyset & \text{otherwise}
\end{cases}
$$
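To make these definitions concrete, the sketch below builds the state space and the admissible decision set of one component in Python. The string encoding of the states and the function names are choices made here for illustration; they are not part of the proposed model.

```python
def component_state_space(n_w, n_pm, n_cm):
    """Omega_xc = {W0..W(n_w), PM1..PM(n_pm-1), CM1..CM(n_cm-1)} as strings."""
    return (["W%d" % q for q in range(n_w + 1)]
            + ["PM%d" % q for q in range(1, n_pm)]
            + ["CM%d" % q for q in range(1, n_cm)])

def decision_space(ic):
    """Omega_uc(ic): preventive replacement can only be decided in a working state."""
    return [0, 1] if ic.startswith("W") else []

# Example: a component with NWc = 3 (working states W0..W3), NPMc = 3 and NCMc = 4.
states = component_state_space(3, 3, 4)
print(states)  # ['W0', 'W1', 'W2', 'W3', 'PM1', 'PM2', 'CM1', 'CM2', 'CM3']
print(decision_space("W2"), decision_space("CM1"))  # [0, 1] []
```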

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

$$
P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)
$$
$$
= P((j_1, \ldots, j_{N_C}), (u_1, \ldots, u_{N_C}), (i_1, \ldots, i_{N_C})) \cdot P(j_{N_C+1}, i_{N_C+1}) \qquad (9.5)
$$

The transition probabilities of the electricity states, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix as in the example of Section 8.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

$$
P((j_1, \ldots, j_{N_C}), 0, (i_1, \ldots, i_{N_C})) = \prod_{c=1}^{N_C} P(j_c, 0, i_c)
$$

Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for some component, then

$$
P((j_1, \ldots, j_{N_C}), (u_1, \ldots, u_{N_C}), (i_1, \ldots, i_{N_C})) = \prod_{c=1}^{N_C} P^c
$$

with

$$
P^c =
\begin{cases}
P(j_c, 1, i_c) & \text{if } u_c = 1 \text{ or } i_c \notin \{W_1, \ldots, W_{N_{W_c}}\} \\
1 & \text{if } u_c = 0,\ i_c \in \{W_1, \ldots, W_{N_{W_c}}\} \text{ and } j_c = i_c \\
0 & \text{otherwise}
\end{cases}
$$
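Under the assumptions above, the joint transition probability of the component states could be computed as in the following sketch, where P1(j, u, i) denotes the one-component transition probability function and working(ic) tests whether a state is one of W1, ..., WNWc; both are assumed given and the names are illustrative.

```python
def joint_transition_probability(j, u, i, P1, working):
    """P((j_1..j_NC), (u_1..u_NC), (i_1..i_NC)) following Cases 1 and 2."""
    system_up = all(working(ic) for ic in i) and not any(u)
    p = 1.0
    for jc, uc, ic in zip(j, u, i):
        if system_up:                       # Case 1: every component ages normally
            p *= P1(jc, 0, ic)
        elif uc == 1 or not working(ic):    # Case 2: maintained components evolve
            p *= P1(jc, 1, ic)
        else:                               # Case 2: idle working components do not age
            p *= 1.0 if jc == ic else 0.0
    return p
```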

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

$$
C((j_1, \ldots, j_{N_C}), 0, (i_1, \ldots, i_{N_C})) = G \cdot T_s \cdot C_E(i_{N_C+1}, k)
$$

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

$$
C((j_1, \ldots, j_{N_C}), (u_1, \ldots, u_{N_C}), (i_1, \ldots, i_{N_C})) = C_I + \sum_{c=1}^{N_C} C^c
$$

with

$$
C^c =
\begin{cases}
C_{CM_c} & \text{if } i_c \in \{CM_1, \ldots, CM_{N_{CM_c}}\} \text{ or } j_c = CM_1 \\
C_{PM_c} & \text{if } i_c \in \{PM_1, \ldots, PM_{N_{PM_c}}\} \text{ or } j_c = PM_1 \\
0 & \text{otherwise}
\end{cases}
$$
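The corresponding cost computation can be sketched in the same way (again an illustrative encoding; CE, G, Ts, the per-component cost vectors and the working() test are assumptions, not part of the thesis).

```python
def joint_cost(j, u, i, i_el, k, G, Ts, CE, C_I, C_CM, C_PM, working):
    """C((j_1..j_NC), (u_1..u_NC), (i_1..i_NC)), electricity state i_el at stage k."""
    system_up = (all(working(ic) for ic in i) and not any(u)
                 and all(working(jc) for jc in j))
    if system_up:                                   # Case 1: reward for produced energy
        return G * Ts * CE(i_el, k)
    cost = C_I                                      # Case 2: interruption cost ...
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":      # ... plus corrective maintenance costs
            cost += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":    # ... plus preventive maintenance costs
            cost += C_PM[c]
    return cost
```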

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance using Markov decision processes Reliability and Maintainability Symposium 2006 RAMS'06 Annual pages 464–469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers, Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacement under age-replacement with minimal repair IEEE Transactions on Reliability 43(1)156–162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systems: a review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effect analysis of condition monitoring systems Master's thesis Royal Institute of Technology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft Master's thesis Royal Institute of Technology (KTH) May 2005



Notations

NumbersM Number of iteration for the evaluation step of modified policy iterationN Number of stages

Constantα Discount factor ll

Variablesi State at the current stagej State at the next stagek Stagem Number of iteration left for the evaluation step of modified policy iterationq Iteration number for the policy iteration algorithmu Decision variable

State and Control Spacemicrok Function mapping the states with a decisionmicrolowastk(i) Optimal decision at state k for state imicro Decision policy for stationary systemsmicrolowast Optimal decision policy for stationary systemsπ Policyπlowast Optimal policyUk Decision action at stage kUlowastk (i) Optimal decision action at stage k for state iXk State at stage k

Dynamic and Cost Functions
Ck(i, u)              Cost function
Ck(i, u, j)           Cost function
Cij(u) = C(i, u, j)   Cost function if the system is stationary
CN(i)                 Terminal cost for state i
fk(i, u)              Dynamic function
fk(i, u, ω)           Stochastic dynamic function
J*k(i)                Optimal cost-to-go from stage k to N starting from state i
ωk(i, u)              Probabilistic function of a disturbance
Pk(j, u, i)           Transition probability function
P(j, u, i)            Transition probability function for stationary systems
V(Xk)                 Cost-to-go resulting from a trajectory starting from state Xk

Sets

ΩUk(i)    Decision space at stage k for state i
ΩXk       State space at stage k

Contents

Contents XI

1 Introduction 1
1.1 Background 1
1.2 Objective 2
1.3 Approach 2
1.4 Outline 2

2 Maintenance 5
2.1 Types of Maintenance 5
2.2 Maintenance Optimization Models 6

3 Introduction to the Power System 11
3.1 Power System Presentation 11
3.2 Costs 13
3.3 Main Constraints 13

4 Introduction to Dynamic Programming 15
4.1 Introduction 15
4.2 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23
5.1 Problem Formulation 23
5.2 Optimality Equation 25
5.3 Value Iteration Method 25
5.4 The Curse of Dimensionality 26
5.5 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29
6.1 Problem Formulation 29
6.2 Optimality Equations 31
6.3 Value Iteration 31
6.4 The Policy Iteration Algorithm 31
6.5 Modified Policy Iteration 32
6.6 Average Cost-to-go Problems 33
6.7 Linear Programming 34
6.8 Efficiency of the Algorithms 35
6.9 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
7.1 Introduction 37
7.2 Direct Learning 38
7.3 Indirect Learning 41
7.4 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
8.1 Finite Horizon Dynamic Programming 43
8.2 Infinite Horizon Stochastic Models 44
8.3 Reinforcement Learning 45
8.4 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
9.1 One-Component Model 47
9.2 Multi-Component model 55
9.3 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

1.1 Background

Market and competition rules have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruption, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.

Efficient maintenance should balance corrective and preventive maintenance to minimize the total cost of maintenance.

The probability of a functional failure of a component is stochastic. The probability depends on the state of the component, resulting from its history (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that integrate stochastic behavior explicitly. This feature makes the models interesting and was the starting idea of this work.

1.2 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.

1.3 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.

1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving infinite horizon Dynamic Programming problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.

Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed in case there is no way, or it is not worth, detecting or preventing a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, to, e.g., avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.


[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, it can however also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].


2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to model that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the trip to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered, and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The model can be used for many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is important; many articles consider an infinite time horizon, while more focus should be put on finite horizons, since they are more practical. Another characteristic of a model is the time representation, i.e., whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that can not be solved is of no interest. For some models, exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power. They can be, e.g., hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3. Distribution: the distribution system is at a voltage level below transmission and is connected to customers. It connects the distribution systems with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories, e.g., industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can cause certain maintenance actions to be postponed; e.g., in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs of an optimization model.

• Statistical data: available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if indeed the lifetime of a system is very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

$$
J^*_0(X_0) = \min_{U_k} \left\{ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right\}
$$
$$
\text{subject to } X_{k+1} = f_k(X_k, U_k), \quad k = 0, \ldots, N-1
$$

N           Number of stages
k           Stage
i           State at the current stage
j           State at the next stage
Xk          State at stage k
Uk          Decision action at stage k
Ck(i, u)    Cost function
CN(i)       Terminal cost for state i
fk(i, u)    Dynamic function
J*0(i)      Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

$$
J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \qquad (4.1)
$$

J*k(i)   Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

$$
J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N
$$
$$
J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \qquad \forall i \in \Omega^X_k
$$
$$
U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \qquad \forall i \in \Omega^X_k
$$

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path graph. Nodes per stage: A (stage 0); B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4). Arc costs: A-B 2, A-C 4, A-D 3; B-E 4, B-F 6; C-E 2, C-F 1, C-G 3; D-F 5, D-G 2; E-H 2, E-I 5; F-H 7, F-I 3, F-J 2; G-I 1, G-J 2; H-K 4, I-K 2, J-K 7.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2 + 6 + 2 + 7 = 17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}

Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to go to the next stage. The following notations are used:

$$
\Omega^U_k(i) =
\begin{cases}
\{0, 1\} & \text{for } i = 0 \\
\{0, 1, 2\} & \text{for } i = 1 \\
\{1, 2\} & \text{for } i = 2
\end{cases}
\quad \text{for } k = 1, 2, 3
$$

$$
\Omega^U_0(0) = \{0, 1, 2\} \quad \text{for } k = 0
$$

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*0, µ*1, ..., µ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

$$
J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left\{ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right\}
$$
$$
\text{subject to } X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, N-1
$$

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ0, µ1, µ2, µ3, µ4} with µk(i) = u*k(i) (for example, µ1(1) = 2 and µ1(2) = 2).
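The backward recursion of Appendix A can be reproduced with a few lines of code. The following Python sketch is not part of the thesis; it encodes the arc costs of the example, applies the value iteration algorithm of Section 4.2.2 (the decision is the index of the next node, as above), and returns the optimal cost-to-go 8 and the path A ⇒ D ⇒ G ⇒ I ⇒ K.

```python
# cost[k][(i, j)] is the cost of going from state i at stage k to state j at stage k+1.
cost = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                   # A -> B, C, D
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                                   # H, I, J -> K
]

J = {0: 0}                                      # terminal cost J*_4(K) = 0
policy = []
for k in range(3, -1, -1):                      # backward recursion, k = 3, ..., 0
    Jk, Uk = {}, {}
    for i in {s for (s, _) in cost[k]}:
        candidates = {j: c + J[j] for (s, j), c in cost[k].items() if s == i}
        Uk[i] = min(candidates, key=candidates.get)
        Jk[i] = candidates[Uk[i]]
    J, policy = Jk, [Uk] + policy

print(J[0])                                     # optimal cost-to-go from A: 8
i, path = 0, [0]
for k in range(4):
    i = policy[k][i]
    path.append(i)
print(path)                                     # [0, 2, 2, 1, 0], i.e. A -> D -> G -> I -> K
```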


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

$$
X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1
$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:

$$
P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)
$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$
P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)
$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$
C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)
$$

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(i, u, j).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

$$
J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left\{ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right\}
$$
$$
\text{subject to } X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1
$$

N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
Xk            State at stage k
Uk            Decision action at stage k
ωk(i, u)      Probabilistic function of the disturbance
Ck(i, u, j)   Cost function
CN(i)         Terminal cost for state i
fk(i, u, ω)   Dynamic function
J*0(i)        Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$
J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right\} \qquad (5.1)
$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$
J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad (5.2)
$$

ΩXk           State space at stage k
ΩUk(i)        Decision space at stage k for state i
Pk(j, u, i)   Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursions, it determines at each stage the optimal decision for each state of the system.

$$
J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N \quad \text{(initialisation)}
$$

While k ≥ 0 do

$$
J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k
$$
$$
U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k
$$
$$
k \leftarrow k - 1
$$


u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
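A minimal implementation of this backward recursion, assuming the transition probabilities and costs are given as per-stage arrays, could look as follows (an illustrative sketch, not from the thesis; the data layout is an assumption made here).

```python
import numpy as np

def value_iteration_finite(P, C, C_N):
    """Finite horizon stochastic value iteration.

    P[k][u] is an (S x S) matrix with P[k][u][i, j] = P_k(j, u, i),
    C[k][u] is an (S x S) matrix with C[k][u][i, j] = C_k(j, u, i),
    C_N is the terminal cost vector. Actions that are not admissible in a
    state can be handled by assigning them a prohibitively large cost.
    """
    N = len(P)                        # number of stages
    J = np.asarray(C_N, dtype=float)  # J*_N = terminal cost
    policy = []
    for k in range(N - 1, -1, -1):    # backward recursion
        Q = np.stack([(P[k][u] * (C[k][u] + J[None, :])).sum(axis=1)
                      for u in range(len(P[k]))])   # Q[u, i] = expected cost of u in i
        policy.insert(0, Q.argmin(axis=0))          # U*_k(i)
        J = Q.min(axis=0)                           # J*_k(i)
    return J, policy
```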

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
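As an illustration (the numbers here are chosen only for the sake of the example, they are not taken from a particular model), a problem with N = 52 stages, NX = 3 state variables of size S = 10 and NU = 3 binary decision variables already requires on the order of

$$
52 \cdot 10^{2 \cdot 3} \cdot 2^{3} \approx 4 \cdot 10^{8}
$$

elementary operations; adding two more state variables of the same size multiplies this number by 10^4.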

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few previous states) to overcome this assumption: variables are added to the DP model to keep in memory the previous states that may have been visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as in the finite horizon case, for the stationary setting. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For each i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

μ: Decision policy
J*(i): Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost incurred at stage k has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

α: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X

J_μ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models the theoretical number of iterations needed is infinite, and a stopping criterion must be defined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps.

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)], ∀i ∈ Ω_X

q: Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)], ∀i ∈ Ω_X

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge quickly to the optimal solution.
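A minimal sketch of policy iteration for a discounted stationary MDP is given below. The array layout (P[u] and C[u] as NumPy matrices) and the small random example are assumptions made for the sketch.

import numpy as np

def policy_iteration(P, C, alpha=0.9, max_iter=1000):
    """Policy iteration for a discounted MDP with cost minimization.

    P[u] : (S x S) transition matrix under decision u
    C[u] : (S x S) transition costs under decision u
    alpha : discount factor (0 < alpha < 1)
    """
    A, S, _ = P.shape
    mu = np.zeros(S, dtype=int)                               # initial policy mu_0
    for _ in range(max_iter):
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(S)]                             # rows P[mu(i), i, :]
        c_mu = np.sum(P_mu * C[mu, np.arange(S)], axis=1)      # expected stage cost
        J = np.linalg.solve(np.eye(S) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('aij,aij->ai', P, C) + alpha * P @ J     # Q[u, i]
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):                         # mu is a solution of its own improvement
            return J, mu
        mu = mu_new
    return J, mu

# Hypothetical 3-state, 2-decision example
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(2, 3))                     # shape (2, 3, 3)
C = rng.uniform(0, 5, size=(2, 3, 3))
J, mu = policy_iteration(P, C, alpha=0.8)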

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy iteration step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).


m ← M − 1
While m ≥ 0 do
   J^m_{μ_k}(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)], ∀i ∈ Ω_X
   m ← m − 1

m: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
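A sketch of this approximate evaluation step (M sweeps of the fixed-policy recursion instead of solving the linear system) could look as follows; P, C and the policy mu are assumed to be in the same hypothetical array format as in the policy iteration sketch above.

import numpy as np

def approximate_policy_evaluation(P, C, mu, J_init, M, alpha=1.0):
    """Evaluate policy mu with M sweeps instead of solving the linear system.

    J_init should be chosen higher than the true cost-to-go of mu.
    """
    S = len(mu)
    P_mu = P[mu, np.arange(S)]
    c_mu = np.sum(P_mu * C[mu, np.arange(S)], axis=1)
    J = J_init.copy()
    for _ in range(M):                     # m = M-1, ..., 0
        J = c_mu + alpha * P_mu @ J        # one sweep of the fixed-policy recursion
    return J                               # approximation of J_mu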

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state, and h_0(i) is chosen arbitrarily.

H_k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ Ω_X

The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.
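A sketch of relative value iteration for the unichain average-cost case, under the same hypothetical array conventions as the earlier sketches, could be:

import numpy as np

def relative_value_iteration(P, C, ref_state=0, n_iter=500, tol=1e-8):
    """Relative value iteration for an average cost-to-go MDP (unichain assumed)."""
    A, S, _ = P.shape
    h = np.zeros(S)
    H = 0.0
    for _ in range(n_iter):
        Q = np.einsum('aij,aij->ai', P, C) + P @ h     # Q[u, i]
        H = np.min(Q[:, ref_state])                    # offset computed at the reference state
        h_new = np.min(Q, axis=0) - H
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    mu = np.argmin(np.einsum('aij,aij->ai', P, C) + P @ h, axis=0)
    return H, h, mu      # H approximates the optimal average cost per stage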

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h_q(X) = 0

λ_q + h_q(i) = Σ_{j ∈ Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ Ω_X

q ← q + 1, and go back to Step 1.

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case, the optimal cost-to-go satisfies

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)

Subject to J(i) − Σ_{j ∈ Ω_X} α · P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · C(j, u, i), ∀u, ∀i

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
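A small sketch of this LP formulation using SciPy is given below; the array conventions and the use of scipy.optimize.linprog are assumptions made for the illustration, not a prescription from the thesis.

import numpy as np
from scipy.optimize import linprog

def lp_discounted_mdp(P, C, alpha=0.9):
    """Solve a discounted cost MDP by linear programming (a sketch).

    The optimal J is the largest vector satisfying, for all i and u,
    J(i) <= sum_j P(j,u,i) * [C(j,u,i) + alpha * J(j)],
    so we maximize sum_i J(i) subject to these constraints.
    """
    A, S, _ = P.shape
    c_exp = np.einsum('aij,aij->ai', P, C)            # expected immediate cost, shape (A, S)
    A_ub = np.zeros((A * S, S))
    b_ub = np.zeros(A * S)
    row = 0
    for u in range(A):
        for i in range(S):
            A_ub[row] = -alpha * P[u, i]              # -alpha * P(.|i,u)
            A_ub[row, i] += 1.0                       # + J(i)
            b_ub[row] = c_exp[u, i]
            row += 1
    res = linprog(c=-np.ones(S), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * S, method="highs")
    J = res.x
    mu = np.argmin(c_exp + alpha * P @ J, axis=0)     # greedy policy with respect to J
    return J, mu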

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Processes

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates after a finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the visit.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.


With the preceding algorithm it is necessary that V(X_k) is calculated from the whole trajectory, so the update can only be made when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{μ_k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.
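A minimal sketch of TD(0) policy evaluation from sampled transitions, with the per-state step size γ_X = 1/(number of visits) used above, might look like this; the sample_episode function is a hypothetical stand-in for simulation or real-life experience under a fixed policy.

import numpy as np

def td0_policy_evaluation(sample_episode, n_states, n_episodes=1000):
    """Estimate the cost-to-go J of a fixed policy with TD(0).

    sample_episode() must return a list of transitions (x, x_next, cost)
    generated under the policy, ending at a terminal state.
    """
    J = np.zeros(n_states)
    visits = np.zeros(n_states, dtype=int)
    for _ in range(n_episodes):
        for x, x_next, cost in sample_episode():
            visits[x] += 1
            gamma = 1.0 / visits[x]                      # step size 1/m
            td_error = cost + J[x_next] - J[x]           # temporal difference
            J[x] += gamma * td_error
    return J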

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)


The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
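A minimal sketch of tabular Q-learning with an ε-greedy exploration/exploitation trade-off is given below; the step function is a hypothetical environment interface standing in for simulation or real-time samples.

import numpy as np

def q_learning(step, n_states, n_actions, n_steps=100000,
               gamma_fn=lambda m: 1.0 / m, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    step(x, u) is a hypothetical environment function returning (x_next, cost).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)
    x = 0                                          # arbitrary initial state
    for _ in range(n_steps):
        if rng.random() < epsilon:                 # exploration: try a new control
            u = int(rng.integers(n_actions))
        else:                                      # exploitation: follow the greedy policy
            u = int(np.argmin(Q[x]))
        x_next, cost = step(x, u)
        visits[x, u] += 1
        g = gamma_fn(visits[x, u])                 # step size, e.g. 1/m as for TD
        Q[x, u] = (1 - g) * Q[x, u] + g * (cost + np.min(Q[x_next]))
        x = x_next
    return Q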

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system, through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation previously investigated, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
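As an illustration of the approximation structure J̃(i, r) discussed above, the sketch below uses a linear approximator over hand-picked features and a semi-gradient TD(0) update. The feature function and the episode interface are hypothetical examples, not part of the thesis.

import numpy as np

def features(state):
    """Hypothetical feature vector characterizing a state (e.g. age and deterioration level)."""
    age, deterioration = state
    return np.array([1.0, age, deterioration, age * deterioration])

def semi_gradient_td0(sample_episode, n_features=4, n_episodes=1000, step_size=0.01):
    """Approximate the cost-to-go of a fixed policy as J(i, r) = r . phi(i)."""
    r = np.zeros(n_features)
    for _ in range(n_episodes):
        # sample_episode() yields transitions (x, x_next, cost, terminal) under the policy
        for x, x_next, cost, terminal in sample_episode():
            target = cost + (0.0 if terminal else features(x_next) @ r)
            td_error = target - features(x) @ r
            r += step_size * td_error * features(x)   # semi-gradient update of the parameter vector
    return r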


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lags (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: Value Iteration
  Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models, classical methods for MDP)
  Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
  Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
  Shortest path: Linear Programming allows additional constraints, but the state space is more limited than for VI and PI

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/Disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach)
  Possible application in maintenance optimization: optimization for inspection based maintenance
  Methods: same as MDP
  Advantages/Disadvantages: more complex


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this section, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydropower. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i^1: Component state at the current stage
i^2: Electricity state at the current stage
j^1: Possible component state for the next stage
j^2: Possible electricity state for the next stage

State and Control Space


x^1_k: Component state at stage k
x^2_k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s, such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k),  x^1_k ∈ Ω_x1, x^2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component, and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when the age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50%. This second approach was implemented. The corresponding number of W states is N_W = T_max/T_s, or the closest integer, in both cases.



Figure 9.1: Example of a Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0. Dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x^1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}


Electricity scenario state

Electricity scenarios are associated with one state variable, x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh) per stage for Scenario 1, Scenario 2 and Scenario 3]

Figure 9.2: Example of electricity scenarios, N_E = 3


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_NW}, and Ω_U(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E. i^2 is represented by the rows of the matrices and j^2 by the columns.


Table 9.1: Transition probabilities

i^1                         | u | j^1      | P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W−1}    | 0 | W_{q+1}  | 1 − λ(W_q)
W_q, q ∈ {0, ..., N_W−1}    | 0 | CM1      | λ(W_q)
W_NW                        | 0 | W_NW     | 1 − λ(W_NW)
W_NW                        | 0 | CM1      | λ(W_NW)
W_q, q ∈ {0, ..., N_W}      | 1 | PM1      | 1
PM_q, q ∈ {1, ..., N_PM−2}  | ∅ | PM_{q+1} | 1
PM_{N_PM−1}                 | ∅ | W0       | 1
CM_q, q ∈ {1, ..., N_CM−2}  | ∅ | CM_{q+1} | 1
CM_{N_CM−1}                 | ∅ | W0       | 1

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E = [ 1    0    0
          0    1    0
          0    0    1 ]

P^2_E = [ 1/3  1/3  1/3
          1/3  1/3  1/3
          1/3  1/3  1/3 ]

P^3_E = [ 0.6  0.2  0.2
          0.2  0.6  0.2
          0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      | 0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2)  | P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i^1                         | u | j^1      | C_k(j, u, i)
W_q, q ∈ {0, ..., N_W−1}    | 0 | W_{q+1}  | G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W−1}    | 0 | CM1      | C_I + C_CM
W_NW                        | 0 | W_NW     | G · T_s · C_E(i^2, k)
W_NW                        | 0 | CM1      | C_I + C_CM
W_q                         | 1 | PM1      | C_I + C_PM
PM_q, q ∈ {1, ..., N_PM−2}  | ∅ | PM_{q+1} | C_I + C_PM
PM_{N_PM−1}                 | ∅ | W0       | C_I + C_PM
CM_q, q ∈ {1, ..., N_CM−2}  | ∅ | CM_{q+1} | C_I + C_CM
CM_{N_CM−1}                 | ∅ | W0       | C_I + C_CM
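To make the structure of Table 9.1 concrete, the sketch below assembles the component transition probabilities into two matrices (for u = 0 and u = 1) that could be fed to the backward value iteration sketch of Chapter 5. The numerical failure rates are hypothetical illustration values, not results from the thesis.

import numpy as np

def component_transition_matrices(failure_rate, n_w, n_pm, n_cm):
    """Build P(j, u, i) for one component as two (S x S) matrices (u = 0 and u = 1).

    States are ordered W0..W_{n_w}, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1};
    failure_rate[q] is the per-stage failure probability lambda(W_q).
    """
    W = list(range(n_w + 1))                                   # W0..W_{n_w}
    PM = list(range(n_w + 1, n_w + n_pm))                      # PM1..PM_{n_pm-1}
    CM = list(range(n_w + n_pm, n_w + n_pm + n_cm - 1))        # CM1..CM_{n_cm-1}
    S = n_w + n_pm + n_cm - 1
    P0 = np.zeros((S, S))
    P1 = np.zeros((S, S))
    for q in W:
        nxt = W[q + 1] if q < n_w else W[n_w]                  # age or stay at the age limit
        P0[q, nxt] = 1.0 - failure_rate[q]
        P0[q, CM[0] if CM else W[0]] = failure_rate[q]         # failure -> CM1 (or W0 if N_CM = 1)
        P1[q, PM[0] if PM else W[0]] = 1.0                     # preventive replacement -> PM1
    for idx, s in enumerate(PM):                               # maintenance progresses deterministically
        P0[s, PM[idx + 1] if idx + 1 < len(PM) else W[0]] = 1.0
        P1[s] = P0[s]
    for idx, s in enumerate(CM):
        P0[s, CM[idx + 1] if idx + 1 < len(CM) else W[0]] = 1.0
        P1[s] = P0[s]
    return P0, P1

# Hypothetical numbers: N_W = 4, N_PM = 2, N_CM = 3, increasing failure rate
lam = np.array([0.01, 0.02, 0.05, 0.10, 0.20])
P0, P1 = component_transition_matrices(lam, n_w=4, n_pm=2, n_cm=3)
assert np.allclose(P0.sum(axis=1), 1.0) and np.allclose(P1.sum(axis=1), 1.0)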

9.2 Multi-Component Model

In this section the model presented in Section 91 is extended to multi-componentssystems

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat may be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C: Number of components
N_Wc: Number of working states for component c
N_PMc: Number of preventive maintenance states for component c
N_CMc: Number of corrective maintenance states for component c


Costs

C_PMc: Cost per stage of preventive maintenance for component c
C_CMc: Cost per stage of corrective maintenance for component c
C_Nc(i): Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}: State of component c at the current stage
i^{NC+1}: Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}: State of component c for the next stage
j^{NC+1}: Electricity state for the next stage
u^c, c ∈ {1, ..., N_C}: Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}: State of component c at stage k
x^c: A component state
x^{NC+1}_k: Electricity state at stage k
u^c_k: Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_xc: State space for component c
Ω_x{NC+1}: Electricity state space
Ω_uc(i^c): Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{NC}_k, x^{NC+1}_k)   (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{NC+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

x^c_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector

U_k = (u^1_k, u^2_k, ..., u^{NC}_k)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., N_C}: Ω_uc(i^c) = {0, 1} if i^c ∈ {W0, ..., W_NWc}, and Ω_uc(i^c) = ∅ otherwise.

9.2.4.3 Transition Probabilities

The component state variables x^c are independent of the electricity state x^{NC+1}. Consequently:

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) · P_k(j^{NC+1}, i^{NC+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{NC+1}, i^{NC+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component states transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1
If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}, i^c ∈ {W1, ..., W_NWc}:

P((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = Π_{c=1}^{NC} P(j^c, 0, i^c)

Case 2


If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component:

P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = Π_{c=1}^{NC} P^c

with P^c =
   P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W1, ..., W_NWc}
   1                if u^c = 0, i^c ∈ {W1, ..., W_NWc} and j^c = i^c
   0                else

9.2.4.4 Cost Function

As for the transition probabilities there are 2 cases

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}, i^c ∈ {W1, ..., W_NWc}:

C((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = G · T_s · C_E(i^{NC+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = C_I + Σ_{c=1}^{NC} C^c

with C^c =
   C_CMc   if i^c ∈ {CM1, ..., CM_NCMc} or j^c = CM1
   C_PMc   if i^c ∈ {PM1, ..., PM_NPMc} or j^c = PM1
   0       else
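A small sketch of how these two cases could be composed into a joint transition probability for the component part of the state vector is given below. The per-component matrices P0[c] and P1[c] are assumed to be built as in the one-component sketch of Section 9.1; the function and its arguments are illustrative, not part of the thesis.

import numpy as np

def system_transition_prob(i, j, u, P0, P1, working_states):
    """Joint transition probability for the component part of the multi-component model.

    i, j : tuples of component state indices; u : tuple of decisions (0/1);
    P0[c], P1[c] : per-component transition matrices; working_states[c] : set of W states.
    Sketch of Cases 1 and 2 of Section 9.2.4.3.
    """
    all_working = all(i[c] in working_states[c] for c in range(len(i)))
    if all_working and not any(u):
        # Case 1: every working component ages independently
        return float(np.prod([P0[c][i[c], j[c]] for c in range(len(i))]))
    # Case 2: the system is down (maintenance decided or ongoing); working
    # components that are left alone do not age
    prob = 1.0
    for c in range(len(i)):
        if u[c] == 1:
            prob *= P1[c][i[c], j[c]]
        elif i[c] not in working_states[c]:
            prob *= P0[c][i[c], j[c]]
        else:
            prob *= 1.0 if i[c] == j[c] else 0.0
    return prob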

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.


• Other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair: The time to repair is not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states, to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


ΩUk(i) Decision space at stage k for state i
ΩXk State space at stage k

Contents

Contents XI

1 Introduction 1

11 Background 1

12 Objective 2

13 Approach 2

14 Outline 2

2 Maintenance 5

21 Types of Maintenance 5

22 Maintenance Optimization Models 6

3 Introduction to the Power System 11

31 Power System Presentation 11

32 Costs 13

33 Main Constraints 13

4 Introduction to Dynamic Programming 15

41 Introduction 15

42 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23

51 Problem Formulation 23

52 Optimality Equation 25

53 Value Iteration Method 25

54 The Curse of Dimensionality 26

55 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29

61 Problem Formulation 29

62 Optimality Equations 31

63 Value Iteration 31

64 The Policy Iteration Algorithm 31

65 Modified Policy Iteration 32

66 Average Cost-to-go Problems 33


67 Linear Programming 34
68 Efficiency of the Algorithms 35
69 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
71 Introduction 37
72 Direct Learning 38
73 Indirect Learning 41
74 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
81 Finite Horizon Dynamic Programming 43
82 Infinite Horizon Stochastic Models 44
83 Reinforcement Learning 45
84 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
91 One-Component Model 47
92 Multi-Component model 55
93 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

11 Background

Market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies as well as transmission and distribution system operators aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 21).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruptions, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.

An efficient maintenance strategy should balance corrective and preventive maintenance to minimize the total costs of maintenance.

The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress such as weather, maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that integrate stochastic behaviors explicitly. This feature makes the models interesting and was the starting idea of this work.

12 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.

13 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were drawn about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.

14 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are drawn about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.


Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 21. Some maintenance optimization models are reviewed in Section 22.

21 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 21 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1 Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.


[Figure 21: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]

2 Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual inspection, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, it can however also result in a very high cost.

The aim of the maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influencing factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].


221 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures removed by minor repair and major failures removed by replacement.

222 Block Replacement Policies

In block replacement policies the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, one must decide at each decision epoch if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The approach can be used with many maintenance optimization models.

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting for power systems. The time horizon considered in the model is also important. Many articles consider an infinite time horizon; more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that can not be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

31 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

311 Power System Description

A simple description of the power system includes the following main parts:

1 Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2 Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3 Distribution: the distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4 Consumption: consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

312 Maintenance in Power System

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: cost for the maintenance team that performs maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time the power produced by a generating unit can fluctuate, depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a cost and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have only an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

J*_0(X0) = min_{Uk} [ sum_{k=0}^{N-1} Ck(Xk, Uk) + CN(XN) ]

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*_0(i)    Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }     (41)

J*_k(i)    Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = CN(i)   ∀ i ∈ ΩXN

J*_k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }   ∀ i ∈ ΩXk

U*_k(i) = argmin_{u ∈ ΩUk(i)} { Ck(i, u) + J*_{k+1}(fk(i, u)) }   ∀ i ∈ ΩXk

u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The algorithm goes backwards starting from the last stage It stops when k=0


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network with node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; and node K at stage 4. Each arc is labelled with its cost; the arc costs are the values Ck(i, u) used in the solution in Appendix A.]

The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

ΩUk(i) = {0, 1}     for i = 0
ΩUk(i) = {0, 1, 2}  for i = 1
ΩUk(i) = {1, 2}     for i = 2        for k = 1, 2, 3

ΩU0(0) = {0, 1, 2}                   for k = 0

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E or U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F or u1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{Uk ∈ ΩUk(Xk)} [ sum_{k=0}^{3} Ck(Xk, Uk) + C4(X4) ]

subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., 3

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*_k(i) (for example μ1(1) = 2, μ1(2) = 2).
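To make the backward recursion concrete, the short script below reproduces the computation of Appendix A. It is a minimal sketch written for this text (the data layout and the variable names are choices made here, not part of the thesis); the stage costs are the values Ck(i, u) listed in Appendix A.

# Value iteration for the deterministic shortest path example.
# C[k][(i, u)] is the cost of decision u in state i at stage k; the dynamic is f_k(i, u) = u.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                              # stage 0: A -> B, C, D
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},  # stage 1: B, C, D -> E, F, G
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},  # stage 2: E, F, G -> H, I, J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                                              # stage 3: H, I, J -> K
]
N = len(C)                      # number of decision stages; stage N is the terminal stage
J = {0: 0}                      # terminal cost: J*_4(K) = 0
policy = []
for k in reversed(range(N)):    # backward recursion, k = N-1, ..., 0
    Jk, uk = {}, {}
    for i in {s for (s, _) in C[k]}:
        candidates = {u: C[k][(s, u)] + J[u] for (s, u) in C[k] if s == i}
        uk[i] = min(candidates, key=candidates.get)
        Jk[i] = candidates[uk[i]]
    J, policy = Jk, [uk] + policy
print("J*_0(0) =", J[0])        # expected: 8
names = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"], ["K"]]
state, path = 0, ["A"]
for k in range(N):              # follow the optimal decisions forward from node A
    state = policy[k][state]
    path.append(names[k + 1][state])
print(" -> ".join(path))        # expected: A -> D -> G -> I -> K

Storing the costs in a dictionary keyed by (state, decision) keeps the admissible decision sets ΩUk(i) implicit: a decision is admissible exactly when a cost is defined for it.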


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
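To illustrate this last remark (the numbers and the component states below are hypothetical and only serve as an example; they are not taken from the thesis), fixing a decision rule in a small three-state deterioration model reduces the Markov decision process to an ordinary Markov chain whose transition matrix can be written down directly.

import numpy as np

# Hypothetical three-state component: 0 = new, 1 = worn, 2 = failed.
# P[u][i][j] = P(next state j | current state i, decision u),
# with u = 0 meaning "do nothing" and u = 1 meaning "replace".
P = {
    0: np.array([[0.8, 0.2, 0.0],
                 [0.0, 0.7, 0.3],
                 [0.0, 0.0, 1.0]]),
    1: np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]]),
}
# Fixed stationary decision rule mu: replace only when the component has failed.
mu = {0: 0, 1: 0, 2: 1}
# Transition matrix of the resulting Markov chain: row i is taken from P[mu[i]].
P_mu = np.vstack([P[mu[i]][i] for i in sorted(mu)])
print(P_mu)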

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk ∈ ΩUk(Xk)} E[ CN(XN) + sum_{k=0}^{N-1} Ck(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
Xk           State at stage k
Uk           Decision action at stage k
ωk(i, u)     Probabilistic function of the disturbance
Ck(j, u, i)  Cost function
CN(i)        Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u ∈ ΩUk(i)} E[ Ck(i, u) + J*_{k+1}(fk(i, u, ω)) ]     (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J*_k(i) = min_{u ∈ ΩUk(i)} sum_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]     (52)

ΩXk          State space at stage k
ΩUk(i)       Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation 52. The algorithm starts from the last stage. By backward recursion it determines, at each stage, the optimal decision for each state of the system.

J*_N(i) = CN(i)   ∀ i ∈ ΩXN   (initialisation)

While k ≥ 0 do
    J*_k(i) = min_{u ∈ ΩUk(i)} sum_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]   ∀ i ∈ ΩXk
    U*_k(i) = argmin_{u ∈ ΩUk(i)} sum_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*_{k+1}(j)]   ∀ i ∈ ΩXk
    k ← k − 1


u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
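The backward recursion above can be implemented in a few lines for tabular problems. The sketch below is a generic illustration written for this text; the data layout (transition probabilities and costs stored per stage, state and decision) and the function name are assumptions, not notation from the thesis or from any library.

def finite_horizon_value_iteration(P, C, C_terminal):
    """Backward value iteration for a finite horizon SDP (sketch).

    P[k][i][u][j]: transition probability P_k(j, u, i)
    C[k][i][u][j]: transition cost C_k(j, u, i)
    C_terminal[i]: terminal cost C_N(i)
    Returns the cost-to-go tables J[k][i] and the optimal decisions U[k][i].
    """
    N = len(P)                                     # number of decision stages
    J = [dict() for _ in range(N)] + [dict(C_terminal)]
    U = [dict() for _ in range(N)]
    for k in reversed(range(N)):                   # k = N-1, ..., 0
        for i, actions in P[k].items():
            best_u, best_value = None, float("inf")
            for u, transitions in actions.items():
                # Expected cost of decision u: sum_j P_k(j,u,i) * [C_k(j,u,i) + J*_{k+1}(j)]
                value = sum(p * (C[k][i][u][j] + J[k + 1][j])
                            for j, p in transitions.items())
                if value < best_value:
                    best_u, best_value = u, value
            J[k][i], U[k][i] = best_value, best_u
    return J, U

For a stationary problem, the same per-stage tables can simply be repeated N times.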

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the value iteration algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
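To see what this bound means in practice, consider some illustrative numbers (chosen here only for the example, not taken from the thesis): N = 10 stages, NX = 5 state variables with S = 10 values each, and NU = 2 control variables with A = 3 values each. Then

N · S^(2·NX) · A^(NU) = 10 · 10^10 · 3^2 = 9 · 10^11

operations. Even this modest model is thus out of reach of a direct tabular solution, and adding a single state variable multiplies the count by another factor S^2 = 100.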

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable can not reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. It will reduce the uncertainties but in return increase the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the actual state of the system (and possibly on the time if the system dynamic is not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption. Variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP, in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} sum_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

μ        Decision policy
J*(i)    Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost at stage k for a discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} sum_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

α        Discount factor
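The convergence argument can be made explicit with one line of algebra (the bound C_max is introduced here only for the illustration): if |C(j, u, i)| ≤ C_max for all transitions, then for any policy

| J(X0) | ≤ sum_{k=0}^{∞} α^k · C_max = C_max / (1 − α)

so the cost-to-go of every policy is finite. The factor 1/(1 − α) also reappears in the complexity estimate of the value iteration method in Section 63.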

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · sum_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

Jμ(i) = min_{μ(i) ∈ ΩU(i)} sum_{j ∈ ΩX} Pij(μ(i)) · [Cij(μ(i)) + Jμ(j)]   ∀ i ∈ ΩX

Jμ(i)    Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is:

Jμ(i) = min_{μ(i) ∈ ΩU(i)} sum_{j ∈ ΩX} Pij(μ(i)) · [Cij(μ(i)) + α · Jμ(j)]   ∀ i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy. It can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps:

Step 1 Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μq}(i), the solution of the following linear system, is calculated:

J_{μq}(i) = sum_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + J_{μq}(j)]

q        Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u ∈ ΩU(i)} sum_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + J_{μq}(j)]

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
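The two steps can be sketched compactly for a discounted problem, where the evaluation step reduces to solving a linear system that is always solvable. This is an illustrative sketch written for this text (the matrix layout P[u][i, j] = P(j, u, i), C[u][i, j] = C(j, u, i) and the function name are assumptions), not the thesis implementation.

import numpy as np

def policy_iteration(P, C, alpha, mu0):
    """Policy iteration for a discounted MDP (sketch)."""
    n, m = len(mu0), len(P)
    mu = np.array(mu0)
    while True:
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu,
        # where c_mu[i] is the expected one-step cost under the current policy.
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        c_mu = np.array([P[mu[i]][i] @ C[mu[i]][i] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement -- greedy decision with respect to J.
        Q = np.array([[P[u][i] @ (C[u][i] + alpha * J) for u in range(m)]
                      for i in range(n)])
        mu_next = Q.argmin(axis=1)
        if np.array_equal(mu_next, mu):   # the policy is a solution of its own improvement
            return mu, J
        mu = mu_next

The modified policy iteration of Section 65 would replace the call to np.linalg.solve by a fixed number M of value-iteration sweeps under the current policy.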

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the real value J_{μk}(i).


While m ≥ 0 do
    J^m_{μk}(i) = sum_{j ∈ ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_{μk}(j)]   ∀ i ∈ ΩX
    m ← m − 1

m        Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.

66 Average Cost-to-go Problems

The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there is a unique λμ and vector hμ such that

hμ(X) = 0

λμ + hμ(i) = sum_{j ∈ ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]   ∀ i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X

\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot [C(j, u, X) + h^k(j)]

h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)] - H^k, \quad \forall i \in \Omega_X

\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)], \quad \forall i \in \Omega_X

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
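
A minimal sketch of relative value iteration under the same illustrative array representation (P[u][i, j], C[u][i, j]); ref_state plays the role of the arbitrary state X:

import numpy as np

def relative_value_iteration(P, C, ref_state=0, tol=1e-9, max_iter=100_000):
    # Relative value iteration for a unichain average-cost MDP.
    n_actions, n_states = len(P), P[0].shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.array([(P[u] * (C[u] + h)).sum(axis=1) for u in range(n_actions)])
        H = Q[:, ref_state].min()          # offset evaluated at the reference state X
        h_new = Q.min(axis=0) - H          # subtracting H keeps the iterates bounded
        if np.abs(h_new - h).max() < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=0)
    return h, policy, H                    # H approximates the optimal average cost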

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy. If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm. Otherwise, solve the system of equations

h^q(X) = 0

\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot [C(j, \mu^q(i), i) + h^q(j)], \quad \forall i \in \Omega_X

Step 2: Policy improvement.

\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^q(j)], \quad \forall i \in \Omega_X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP,

J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J^*(j)], \quad \forall i \in \Omega_X

and J^*(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega_X} J(i)

Subject to J(i) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J(j)], \quad \forall i \in \Omega_X, \ \forall u \in \Omega_U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
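
As an illustrative sketch (not from the thesis), the LP above can be assembled and passed to an off-the-shelf solver; here scipy's linprog is assumed to be available, and P[u][i, j], C[u][i, j] and alpha are the same illustrative model arrays used earlier:

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    # Maximize sum_i J(i) subject to
    # J(i) <= sum_j P(j,u,i) * (C(j,u,i) + alpha * J(j)) for every state i and action u.
    n_actions, n_states = len(P), P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = -alpha * P[u][i]          # coefficients of the J(j) terms
            row[i] += 1.0                   # plus the J(i) term
            A_ub.append(row)
            b_ub.append((P[u][i] * C[u][i]).sum())   # expected one-stage cost
    # linprog minimizes, so maximize sum(J) by minimizing -sum(J); J is free in sign
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states, method="highs")
    return res.x                            # optimal cost-to-go J*(i)

Additional maintenance-specific constraints could be appended to A_ub in the same way, which is the main motivation for the LP formulation.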

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach to machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning, or Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods presented in Section 7.4 are extensions of the methods of Section 7.2: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

where V(Xk) is the cost-to-go of a trajectory starting from state Xk.

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i after the mth visit.

A recursive form of the method can be formulated:

J(i) = J(i) + \gamma \cdot [V(i_m) - J(i)], \quad \gamma = 1/m

with m the number of the trajectory. From a trajectory point of view,

J(X_k) = J(X_k) + \gamma_{X_k} \cdot [V(X_k) - J(X_k)]

where γ_{Xk} corresponds to 1/m, with m the number of times Xk has already been visited by trajectories.

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + \gamma_{X_k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l

TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(X_l) = J(X_l) + \gamma_{X_l} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]

Q-factors. Once J_{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu^k}(j)]

Note that C(j, u, i) must be known. The improved policy is

\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ^k} and Q_{μ^k} have been estimated from the samples.
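
A minimal sketch of TD(0) evaluation of a fixed policy from simulated episodes; sample_trajectory, the integer state encoding and the step-size rule γ = 1/m are illustrative assumptions, not notation from the thesis:

import numpy as np

def td0_policy_evaluation(sample_trajectory, n_states, n_episodes=1000):
    # sample_trajectory() is assumed to return one episode generated under the
    # fixed policy, as a list of (X_k, X_k+1, cost) transitions ending in the
    # terminal state (whose value stays 0 because it is never updated).
    J = np.zeros(n_states)
    visits = np.zeros(n_states)                 # used for the step size gamma = 1/m
    for _ in range(n_episodes):
        for (x, x_next, cost) in sample_trajectory():
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # temporal-difference update towards the one-step target
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J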

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)] \quad (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \quad (7.2)

By combining the two equations, we obtain

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)] \quad (7.3)

Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot [C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
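
The following sketch combines the update above with an ε-greedy exploration rule; env_step, the epsilon parameter and the restart logic are illustrative assumptions about how the samples are produced, not part of the thesis's model:

import numpy as np

def q_learning(env_step, n_states, n_actions, n_steps=100_000,
               epsilon=0.1, terminal_state=None, seed=0):
    # Tabular Q-learning for a shortest-path-type problem.
    # env_step(x, u) is assumed to return (next_state, cost).
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    x = 0                                              # some initial state
    for _ in range(n_steps):
        # epsilon-greedy: exploit the greedy action, sometimes explore
        u = int(Q[x].argmin()) if rng.random() > epsilon else int(rng.integers(n_actions))
        x_next, cost = env_step(x, u)
        visits[x, u] += 1
        gamma = 1.0 / visits[x, u]                     # decreasing step size, as for TD
        target = cost + Q[x_next].min()
        Q[x, u] = (1 - gamma) * Q[x, u] + gamma * target
        x = 0 if x_next == terminal_state else x_next  # restart after termination
    return Q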

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) - J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods; possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
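
As a small illustration of the idea (not a method prescribed by the thesis), a linear approximation J(i, r) = phi(i)^T r can be fitted by least squares to Monte Carlo cost-to-go samples; the names features and samples are assumptions:

import numpy as np

def fit_value_function(features, samples):
    # features(i) returns the feature vector phi(i) for state i;
    # samples is a list of (state, observed_cost_to_go) pairs, e.g. returns V(i_m).
    Phi = np.array([features(i) for i, _ in samples])      # design matrix
    v = np.array([v_obs for _, v_obs in samples])          # sampled returns
    r, *_ = np.linalg.lstsq(Phi, v, rcond=None)            # minimize ||Phi r - v||^2
    return r

def approx_cost_to_go(features, r, i):
    return features(i) @ r      # J(i, r), usable even for states never visited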


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches/methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance previously calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Model | Characteristics | Possible application in maintenance optimization | Method | Advantages / Disadvantages
Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance optimization/scheduling | Value Iteration | Limited state space (number of components)
Markov Decision Processes | Stationary model | | Classical methods (possible approaches for MDP listed below) |
- Average cost-to-go | | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for high discount factor
- Discounted | | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general
- Shortest path | | | Linear Programming | Possible additional constraints; state space more limited than for VI & PI
Approximate Dynamic Programming for MDP | Can handle large state spaces | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model
Semi-Markov Decision Processes | Can optimize inspection interval | Optimization for inspection-based maintenance | Same as MDP (average cost-to-go approach) | More complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE   Number of electricity scenarios
NW   Number of working states for the component
NPM  Number of preventive maintenance states for one component
NCM  Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1}, \ x^2_k \in \Omega_{x^2} \quad (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case, Tmax can for example correspond to the time when λ(t) > 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure 9.1: Example of the Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0, dashed lines: u = 1. Each working state Wi moves to the next working state with probability 1 - Ts·λ(i) and to CM1 with probability Ts·λ(i); the PM and CM states move deterministically (probability 1) towards W0.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

\Omega_{x^1} = \{W_0, \ldots, W_{N_W}, PM_1, \ldots, PM_{N_{PM}-1}, CM_1, \ldots, CM_{N_{CM}-1}\}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity prices (in SEK/MWh, roughly between 200 and 500) are shown for Scenarios 1, 2 and 3 as a function of the stage (k-1, k, k+1).]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{otherwise} \end{cases}

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j \mid U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2)
= P(j^1, u, i^1) \cdot P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u  j1     P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}      0  Wq+1   1 - λ(Wq)
Wq, q ∈ {0, ..., NW-1}      0  CM1    λ(Wq)
WNW                         0  WNW    1 - λ(WNW)
WNW                         0  CM1    λ(WNW)
Wq, q ∈ {0, ..., NW}        1  PM1    1
PMq, q ∈ {1, ..., NPM-2}    ∅  PMq+1  1
PMNPM-1                     ∅  W0     1
CMq, q ∈ {1, ..., NCM-2}    ∅  CMq+1  1
CMNCM-1                     ∅  W0     1

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \quad
P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}

Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
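
For later experimentation, the two tables can be written directly as numpy arrays; this is only a transcription of Tables 9.2 and 9.3, with variable names chosen here for illustration:

import numpy as np

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2)
P1_E = np.eye(3)                            # scenarios are frozen
P2_E = np.full((3, 3), 1.0 / 3.0)           # complete mixing between scenarios
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])          # scenarios are "sticky"

# P_k(j2, i2) for k = 0, ..., 11 as in Table 9.3
electricity_transitions = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
                           P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]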

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• reward for electricity generation, G·Ts·CE(i2, k) (it depends on the electricity scenario state i2 and the stage k);

• cost for maintenance, CCM or CPM;

• cost for interruption, CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                          u  j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}      0  Wq+1   G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW-1}      0  CM1    CI + CCM
WNW                         0  WNW    G·Ts·CE(i2, k)
WNW                         0  CM1    CI + CCM
Wq                          1  PM1    CI + CPM
PMq, q ∈ {1, ..., NPM-2}    ∅  PMq+1  CI + CPM
PMNPM-1                     ∅  W0     CI + CPM
CMq, q ∈ {1, ..., NCM-2}    ∅  CMq+1  CI + CCM
CMNCM-1                     ∅  W0     CI + CCM
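
A backward value iteration over the stages, as in Chapter 5, can then solve the model. The following is only a sketch under stated assumptions: the state list, the admissible decisions, the stage-dependent transition function (built from Tables 9.1-9.3) and the transition costs (Table 9.4, with the generation reward counted negatively if the objective is cost minimization) are supplied by the user as Python callables, and the maintenance states are given a dummy "no decision" action so that admissible(x) is never empty; none of these names come from the thesis.

def finite_horizon_value_iteration(N, states, admissible, transition, cost, terminal_cost):
    # states: list of states X = (component_state, electricity_state)
    # admissible(x): non-empty list of decisions in state x
    # transition(k, x, u): list of (x_next, probability) pairs
    # cost(k, x, u, x_next): transition cost; terminal_cost(x): C_N(x)
    J = {x: terminal_cost(x) for x in states}          # stage N
    policy = {}
    for k in range(N - 1, -1, -1):                     # backward over stages N-1, ..., 0
        J_k = {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in admissible(x):
                val = sum(p * (cost(k, x, u, x_next) + J[x_next])
                          for x_next, p in transition(k, x, u))
                if val < best_val:
                    best_u, best_val = u, val
            J_k[x] = best_val
            policy[(k, x)] = best_u
        J = J_k
    return J, policy          # optimal expected cost-to-go at stage 0 and the decisions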

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  State of the electricity for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc                      A component state
xNC+1_k                 Electricity state at stage k
uc_k                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \quad (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

x^c_k \in \Omega_{x^c} = \{W_0, \ldots, W_{N_{W_c}}, PM_1, \ldots, PM_{N_{PM_c}-1}, CM_1, \ldots, CM_{N_{CM_c}-1}\}

Electricity space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \quad (9.3)

The decision space for each decision variable can be defined by

\forall c \in \{1, \ldots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \ldots, W_{N_{W_c}}\} \\ \emptyset & \text{otherwise} \end{cases}

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(X_{k+1} = j \mid U_k = U, X_k = i) \quad (9.4)
= P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) \cdot P(j^{N_C+1}, i^{N_C+1}) \quad (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

P((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for some component,

P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P^c

with

P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \ldots, W_{N_{W_c}}\} \\ 1 & \text{if } u^c = 0, \ i^c \in \{W_1, \ldots, W_{N_{W_c}}\} \text{ and } j^c = i^c \\ 0 & \text{otherwise} \end{cases}

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

C((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = C_I + \sum_{c=1}^{N_C} C^c

with

C^c = \begin{cases} C_{CM_c} & \text{if } i^c \in \{CM_1, \ldots, CM_{N_{CM_c}-1}\} \text{ or } j^c = CM_1 \\ C_{PM_c} & \text{if } i^c \in \{PM_1, \ldots, PM_{N_{PM_c}-1}\} \text{ or } j^c = PM_1 \\ 0 & \text{otherwise} \end{cases}

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of several parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = Φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 2
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A.-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Page 8: Models

Contents

1 Introduction 1
  1.1 Background 1
  1.2 Objective 2
  1.3 Approach 2
  1.4 Outline 2

2 Maintenance 5
  2.1 Types of Maintenance 5
  2.2 Maintenance Optimization Models 6

3 Introduction to the Power System 11
  3.1 Power System Presentation 11
  3.2 Costs 13
  3.3 Main Constraints 13

4 Introduction to Dynamic Programming 15
  4.1 Introduction 15
  4.2 Deterministic Dynamic Programming 18

5 Finite Horizon Models 23
  5.1 Problem Formulation 23
  5.2 Optimality Equation 25
  5.3 Value Iteration Method 25
  5.4 The Curse of Dimensionality 26
  5.5 Ideas for a Maintenance Optimization Model 26

6 Infinite Horizon Models - Markov Decision Processes 29
  6.1 Problem Formulation 29
  6.2 Optimality Equations 31
  6.3 Value Iteration 31
  6.4 The Policy Iteration Algorithm 31
  6.5 Modified Policy Iteration 32
  6.6 Average Cost-to-go Problems 33
  6.7 Linear Programming 34
  6.8 Efficiency of the Algorithms 35
  6.9 Semi-Markov Decision Process 35

7 Approximate Methods for Markov Decision Process - Reinforcement Learning 37
  7.1 Introduction 37
  7.2 Direct Learning 38
  7.3 Indirect Learning 41
  7.4 Supervised Learning 42

8 Review of Models for Maintenance Optimization 43
  8.1 Finite Horizon Dynamic Programming 43
  8.2 Infinite Horizon Stochastic Models 44
  8.3 Reinforcement Learning 45
  8.4 Conclusions 45

9 A Proposed Finite Horizon Replacement Model 47
  9.1 One-Component Model 47
  9.2 Multi-Component model 55
  9.3 Possible Extensions 59

10 Conclusions and Future Work 61

A Solution of the Shortest Path Example 63

Reference List 65

Chapter 1

Introduction

1.1 Background

The market and competition laws have been introduced among power system companies due to the restructuring and deregulation of modern power systems. The generating companies, as well as the transmission and distribution system operators, aim to minimize their costs. Maintenance costs can be a significant part of the total costs. The pressure to reduce the maintenance budget leads to a need for efficient maintenance.

Maintenance can be divided into Corrective Maintenance (CM) and Preventive Maintenance (PM) (see Section 2.1).

CM means that an asset is maintained once an unscheduled functional failure occurs. CM can imply high costs for unsupplied energy, interruption, possible deterioration of the system, human risks or environmental consequences, etc.

PM is employed to reduce the risk of unexpected failure. Time Based Maintenance (TBM) is used for the most critical components, and Condition Based Maintenance (CBM) for the components that are worth monitoring and not too expensive to monitor. These maintenance actions have a cost for unsupplied energy, inspection, repair, replacement, etc.

An efficient maintenance strategy should balance corrective and preventive maintenance to minimize the total costs of maintenance.

The probability of a functional failure for a component is stochastic. The probability depends on the state of the component, resulting from the history of the component (age, intensity of use, external stress (such as weather), maintenance actions, human errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behavior. This feature makes the models interesting and was the starting idea of this work.

1.2 Objective

The main objective of this work is to investigate the use of stochastic dynamic programming models for maintenance optimization and to identify possible future applications in power systems.

1.3 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use of SDP for power system maintenance.

1.4 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 briefly discusses power systems. Some costs and constraints for optimization models are proposed.

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and the algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated with deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods, respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.


Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item, intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.


[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance — Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter continuous, scheduled or inspection based — and Corrective Maintenance.]

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the literature, and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]


2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures removed by minor repair, and major failures removed by replacement.
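To make the basic age replacement trade-off concrete, the following minimal Python sketch computes the long-run cost per unit time of a policy that replaces a component preventively at age T or correctively at failure, and searches for the best T. The Weibull lifetime, the cost figures and the brute-force search are assumptions made here for illustration; they are not part of the reviewed models.

import numpy as np

# Assumed illustrative data: Weibull lifetime with increasing failure rate.
beta, eta = 2.5, 10.0           # shape and scale (years)
c_pm, c_cm = 1.0, 5.0           # preventive and corrective replacement costs

def survival(t):
    """Probability that the component survives beyond age t."""
    return np.exp(-(t / eta) ** beta)

def cost_rate(T, n_grid=2000):
    """Expected cost per unit time under an age replacement policy with limit T."""
    t = np.linspace(0.0, T, n_grid)
    expected_cycle_length = np.trapz(survival(t), t)          # E[min(lifetime, T)]
    expected_cycle_cost = c_pm * survival(T) + c_cm * (1.0 - survival(T))
    return expected_cycle_cost / expected_cycle_length

# Brute-force search for the best preventive replacement age.
candidates = np.linspace(0.5, 20.0, 200)
best_T = min(candidates, key=cost_rate)
print(f"best replacement age: {best_T:.2f}, cost rate: {cost_rate(best_T):.3f}")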

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Section 6.9) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system include the following main parts

1. Generation. These are the generation units that produce the power. They can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units


3. Distribution. The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as blackouts.

3.1.2 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

3.2 Costs

Possible costsincomes related to maintenance in power systems have been identified(non-inclusively) as follows

• Manpower cost: cost for the maintenance team that performs maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possibles constraints for the maintenance of power system have been identified asfollows


• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that the decision can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision Spaces
At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega^X_k$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega^U_k(i)$.

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J_0^*(X_0) = \min_{U_k} \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N)$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \dots, N-1$.

$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$C_k(i, u)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:

$$J_k^*(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right\} \quad (4.1)$$

$J_k^*(i)$: optimal cost-to-go from stage $k$ to $N$, starting from state $i$.

The value iteration algorithm is a direct consequence of the optimality equation:

$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega^X_N$$

$$J_k^*(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right\} \quad \forall i \in \Omega^X_k$$

$$U_k^*(i) = \arg\min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J_{k+1}^*(f_k(i, u)) \right\} \quad \forall i \in \Omega^X_k$$

$u$: decision variable
$U_k^*(i)$: optimal decision action at stage $k$ for state $i$

The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
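To make the backward recursion concrete, the following minimal Python sketch implements the value iteration algorithm above for a generic deterministic finite horizon problem. The state sets, admissible actions, dynamics and costs are placeholders to be supplied by the user; they are not the thesis example.

# Minimal sketch of backward value iteration for a deterministic finite horizon DP.
# states[k] is the state space at stage k, actions(k, i) the admissible controls,
# dynamics(k, i, u) the next state and cost(k, i, u) the transition cost;
# all of these are assumed to be supplied by the user.

def value_iteration(N, states, actions, dynamics, cost, terminal_cost):
    J = {(N, i): terminal_cost(i) for i in states[N]}   # J*_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):                      # backward recursion
        for i in states[k]:
            best_u, best_value = None, float("inf")
            for u in actions(k, i):
                value = cost(k, i, u) + J[(k + 1, dynamics(k, i, u))]
                if value < best_value:
                    best_u, best_value = u, value
            J[(k, i)] = best_value                       # optimal cost-to-go
            policy[(k, i)] = best_u                      # optimal decision
    return J, policy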


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network with five stages. Stage 0 contains node A, stage 1 nodes B, C, D, stage 2 nodes E, F, G, stage 3 nodes H, I, J, and stage 4 the terminal node K. Each arc between consecutive stages carries a cost (distance).]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem Insteadof calculating all the path cost the problem will be divided in subproblems thatwill be solved recursively to determine the shortest path from each possible node tothe terminal node K

4.2.3.1 Problem Formulation

The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.

State Space
The state space is defined for each stage:

$\Omega^X_0 = \{A\} = \{0\}$, $\Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}$, $\Omega^X_2 = \{E, F, G\} = \{0, 1, 2\}$, $\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}$, $\Omega^X_4 = \{K\} = \{0\}$


Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:

$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, $\Omega^U_1(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition B $\Rightarrow$ E or $U_1(0) = 1$ for the transition B $\Rightarrow$ F.

Another example: $\Omega^U_1(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition D $\Rightarrow$ F or $u_1(2) = 2$ for the transition D $\Rightarrow$ G.

A sequence $\pi = \{\mu_0, \mu_1, \dots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \dots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: $f_k(i, u) = u$.

The transition costs are defined equal to the distance from one state to the resultingstate of the decision For example C1(0 0) = C(B rArr E) = 4 The cost function isdefined in the same way for the others stages and states

Objective Function

$$J_0^*(0) = \min_{U_k \in \Omega^U_k(X_k)} \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N)$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \dots, N-1$.

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm are given in Appendix A

The optimal cost-to-go is $J_0^*(0) = 8$. It corresponds to the following path: A $\Rightarrow$ D $\Rightarrow$ G $\Rightarrow$ I $\Rightarrow$ K. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable $k \in \{0, \dots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega^X_k$.

Decision Space

At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega^U_k(i)$.

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \dots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$, given that the state and control are $i$ and $u$ at stage $k$. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
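As a small illustration of this last point, fixing a control for every state turns the transition probabilities $P(j, u, i)$ into an ordinary Markov chain transition matrix. The three-state deterioration model, the numbers and the two controls below are hypothetical choices made only for this sketch; they are not taken from the thesis.

import numpy as np

# Hypothetical 3-state deterioration model: 0 = good, 1 = degraded, 2 = failed.
# P[u][i][j] = probability of going from state i to state j when control u is used
# (u = 0: do nothing, u = 1: replace).
P = {
    0: np.array([[0.8, 0.15, 0.05],
                 [0.0, 0.7,  0.3 ],
                 [0.0, 0.0,  1.0 ]]),
    1: np.array([[1.0, 0.0,  0.0 ],
                 [1.0, 0.0,  0.0 ],
                 [1.0, 0.0,  0.0 ]]),
}

# Fixing a control for each state (a stationary policy) yields a Markov chain.
policy = {0: 0, 1: 0, 2: 1}            # replace only when failed
chain = np.array([P[policy[i]][i] for i in range(3)])
print(chain)                            # row-stochastic transition matrix of the chain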

Cost Function

A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(i, u, j)$.

A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \dots, N-1$.


$N$: number of stages
$k$: stage
$i$: state at the current stage
$j$: state at the next stage
$X_k$: state at stage $k$
$U_k$: decision action at stage $k$
$\omega_k(i, u)$: probabilistic function of the disturbance
$C_k(i, u, j)$: cost function
$C_N(i)$: terminal cost for state $i$
$f_k(i, u, \omega)$: dynamic function
$J_0^*(i)$: optimal cost-to-go starting from state $i$

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J_k^*(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J_{k+1}^*(f_k(i, u, \omega)) \right] \quad (5.1)$$

This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:

$$J_k^*(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right] \quad (5.2)$$

$\Omega^X_k$: state space at stage $k$
$\Omega^U_k(i)$: decision space at stage $k$ for state $i$
$P_k(j, u, i)$: transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

$$J_N^*(i) = C_N(i) \quad \forall i \in \Omega^X_N \quad \text{(initialization)}$$

While $k \ge 0$ do

$$J_k^*(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega^X_k$$

$$U_k^*(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J_{k+1}^*(j) \right] \quad \forall i \in \Omega^X_k$$

$k \leftarrow k - 1$

$u$: decision variable
$U_k^*(i)$: optimal decision action at stage $k$ for state $i$

The recursion finishes when the first stage is reached
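The following minimal Python sketch shows how this backward recursion can be implemented for a stochastic finite horizon problem. The state spaces, admissible actions, transition probabilities and costs are placeholders to be supplied by the user; they are not part of the thesis.

# Minimal sketch of stochastic finite horizon value iteration.
# states[k]: iterable of states at stage k; actions(k, i): admissible controls;
# P(k, i, u): dict mapping next state j -> probability; C(k, i, u, j): transition cost;
# terminal_cost(i): C_N(i).

def stochastic_value_iteration(N, states, actions, P, C, terminal_cost):
    J = {(N, i): terminal_cost(i) for i in states[N]}
    policy = {}
    for k in range(N - 1, -1, -1):
        for i in states[k]:
            best_u, best_value = None, float("inf")
            for u in actions(k, i):
                # expected one-stage cost plus expected optimal cost-to-go
                value = sum(p * (C(k, i, u, j) + J[(k + 1, j)])
                            for j, p in P(k, i, u).items())
                if value < best_value:
                    best_u, best_value = u, value
            J[(k, i)] = best_value
            policy[(k, i)] = best_u
    return J, policy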

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• $N$ stages,

• $N_X$ state variables, where the size of the set for each state variable is $S$,

• $N_U$ control variables, where the size of the set for each control variable is $A$.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
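As a rough illustration (the figures are chosen here only to make the growth visible, they are not from the thesis), a model with $N = 100$ stages, $N_X = 3$ state variables with $S = 10$ levels each and $N_U = 2$ binary decisions ($A = 2$) requires on the order of

$$N \cdot S^{2 N_X} \cdot A^{N_U} = 100 \cdot 10^{6} \cdot 2^{2} = 4 \cdot 10^{8}$$

operations, and adding a single extra state variable multiplies this count by another factor $S^2 = 100$.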

5.5 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced. A small sketch of such a state space is given below.
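The following short Python sketch enumerates a combined age/deterioration state space of the kind discussed above. The number of age steps, the number of deterioration levels and the two failure states are hypothetical choices made for illustration, not values prescribed by the thesis.

from itertools import product

# Hypothetical discretization: age in stages 0..9, deterioration levels 0..3,
# plus two explicit failure states that can be entered from any operating state.
AGES = range(10)
DETERIORATION_LEVELS = range(4)
FAILURE_STATES = ["minor_failure", "major_failure"]

operating_states = [("operating", age, level)
                    for age, level in product(AGES, DETERIORATION_LEVELS)]
failure_states = [(f, None, None) for f in FAILURE_STATES]

state_space = operating_states + failure_states
print(len(state_space))   # 10 * 4 + 2 = 42 states for a single component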


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to their horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. It will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the consumption must always be in balance with the generation. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption. Variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space. For $i \in \Omega^X$, $\mu(i)$ is an admissible control for the state $i$: $\mu(i) \in \Omega^U(i)$.

The objective is to find the optimal microlowast It should minimize the cost-to-go function

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (cost-free termination) state that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$

$\mu$: decision policy
$J^*(i)$: optimal cost-to-go function for state $i$

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$). The cost at stage $k$ for discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$

$\alpha$: discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.


6.2 Optimality Equations

The optimality equations are formulated using the probability function $P(i, u, j)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J_{\mu}(i) = \min_{\mu(i) \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J_{\mu}(j) \right] \quad \forall i \in \Omega^X$$

$J_{\mu}(i)$: cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: optimal cost-to-go function for state $i$

For an IHSDP discounted problem, the optimality equation is

$$J_{\mu}(i) = \min_{\mu(i) \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J_{\mu}(j) \right] \quad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu_0$. Then it can be described by the following steps:

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]$$

$q$: iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration, the algorithm always improves the policy. If the initial policy $\mu_0$ is already good, then the algorithm will converge quickly to the optimal solution.
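The following minimal Python sketch illustrates the two steps above for the discounted variant of the problem. The transition matrices, costs, discount factor and initial policy are hypothetical inputs assumed here; the policy evaluation step solves the linear system with numpy.

import numpy as np

def policy_iteration(P, C, alpha, n_states, n_actions, max_iter=100):
    """P[u][i][j]: transition probabilities, C[u][i][j]: transition costs (assumed inputs)."""
    mu = np.zeros(n_states, dtype=int)            # initial policy: action 0 everywhere
    for _ in range(max_iter):
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        c_mu = np.array([np.dot(P[mu[i]][i], C[mu[i]][i]) for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([[np.dot(P[u][i], C[u][i] + alpha * J) for u in range(n_actions)]
                      for i in range(n_states)])
        new_mu = Q.argmin(axis=1)
        if np.array_equal(new_mu, mu):            # policy is a solution of its own improvement
            return mu, J
        mu = new_mu
    return mu, J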

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation can be computationally intensive.

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.


While $m \ge 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega^X$$

$m \leftarrow m - 1$

$m$: number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a fixed state $\bar{X} \in \Omega^X$, there is a unique scalar $\lambda_{\mu}$ and vector $h_{\mu}$ such that

$$h_{\mu}(\bar{X}) = 0$$

$$\lambda_{\mu} + h_{\mu}(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_{\mu}(j) \right] \quad \forall i \in \Omega^X$$

This $\lambda_{\mu}$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$

The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialization: $\bar{X}$ can be chosen arbitrarily.

Step 1: Policy Evaluation
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega^X$, stop the algorithm.

Else, solve the system of equations

$$h^q(\bar{X}) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega^X$$

Step 2: Policy Improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega^X$$

$q = q + 1$

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case,

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

$J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega^X} J(i)$

Subject to $J(i) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \quad \forall u, i$

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If $n$ and $m$ denote the number of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu_0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided from simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation. Assume a trajectory $(X_0, \dots, X_N)$ has been generated according to the policy $\mu$, and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory, starting from the state $X_k$, is

$$V(X_k) = \sum_{n=k}^{N} C(X_n, X_{n+1})$$

$V(X_k)$: cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories have been generated and the state $i$ has been visited $K$ times in these trajectories, $J(i)$ can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V_m(i)$$

$V_m(i)$: cost-to-go of the trajectory starting from state $i$ at its $m$th visit

A recursive form of the method can be formulated:

$$J(i) := J(i) + \gamma \cdot \left[ V_m(i) - J(i) \right], \quad \text{with } \gamma = 1/m, \text{ where } m \text{ is the number of the trajectory}$$

From a trajectory point of view:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

$\gamma_{X_k}$ corresponds to $1/m$, where $m$ is the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, it is necessary that $V(X_k)$ is calculated from the whole trajectory, and it can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1})$.

At each transition of the trajectory, the cost-to-go function of a state of the trajectory, $J(X_k)$, is updated. Assume that the $l$th transition is being generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \dots, l$$

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]
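As an illustration, the TD(0) update above could be implemented as in the following sketch, assuming trajectories of (state, next state, cost) transitions generated under the policy to be evaluated are available; all names are illustrative and not part of the thesis notation.

import collections

def td0_evaluate(trajectories, terminal_states):
    # Estimate the cost-to-go J of a fixed policy from sample trajectories.
    # Each trajectory is a list of (state, next_state, cost) transitions.
    J = collections.defaultdict(float)      # estimated cost-to-go per state
    visits = collections.defaultdict(int)   # update counts, used as 1/m step sizes
    for trajectory in trajectories:
        for state, next_state, cost in trajectory:
            visits[state] += 1
            gamma = 1.0 / visits[state]      # decreasing step size, as in the text
            td_error = cost + J[next_state] - J[state]
            J[state] += gamma * td_error
    for s in terminal_states:                # terminal states have zero cost-to-go
        J[s] = 0.0
    return dict(J)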

Q-factors
Once J^{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q^{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ_k} and Q^{μ_k} have been estimated from the samples.
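A minimal sketch of this improvement step is given below, assuming the transition probabilities and costs are stored in dictionaries keyed by (j, u, i); these data structures and names are assumptions made for the illustration.

def improve_policy(states, admissible, P, C, J_mu):
    # Return an improved policy mu'(i) = argmin_u Q_mu(i, u).
    new_policy = {}
    for i in states:
        best_u, best_q = None, float("inf")
        for u in admissible(i):                      # admissible controls in state i
            q = sum(P[(j, u, i)] * (C[(j, u, i)] + J_mu[j])
                    for j in states if (j, u, i) in P)
            if q < best_q:
                best_u, best_q = u, q
        new_policy[i] = best_u
    return new_policy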

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]     (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)     (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]     (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
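The update rule above could be implemented roughly as follows; the sample format and the admissible-control function are illustrative assumptions.

def q_learning_update(Q, visits, sample, admissible):
    # Apply one Q-learning update for a sample (x, x_next, u, cost).
    x, x_next, u, cost = sample
    visits[(x, u)] = visits.get((x, u), 0) + 1
    gamma = 1.0 / visits[(x, u)]                 # step size as for TD
    best_next = min(Q.get((x_next, v), 0.0) for v in admissible(x_next))
    Q[(x, u)] = (1.0 - gamma) * Q.get((x, u), 0.0) + gamma * (cost + best_next)
    return Q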

The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
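A common way to realize this trade-off, used here only as an illustration and not prescribed by the thesis, is an ε-greedy rule: with a small probability ε a random admissible control is explored, otherwise the greedy control of the current Q-factors is exploited.

import random

def epsilon_greedy(Q, x, admissible, epsilon=0.1):
    controls = list(admissible(x))
    if random.random() < epsilon:
        return random.choice(controls)                              # exploration
    return min(controls, key=lambda u: Q.get((x, u), 0.0))          # exploitation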

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they become too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be the following:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
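As a simple illustration of such a training step, a linear architecture J(i, r) = φ(i)ᵀr could be fitted to sampled cost-to-go values by least squares; the feature map φ and the sample format are assumptions made for this sketch.

import numpy as np

def fit_cost_to_go(samples, phi):
    # samples: list of (state, observed cost-to-go V); phi: feature map.
    Phi = np.array([phi(i) for i, _ in samples])     # training inputs
    V = np.array([v for _, v in samples])            # training targets
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)      # minimize ||Phi r - V||^2
    return r

def approx_cost_to_go(i, r, phi):
    return float(np.dot(phi(i), r))

# Example feature map: a quadratic function of the component age (illustrative).
phi = lambda age: np.array([1.0, float(age), float(age) ** 2])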


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: it means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: Value Iteration
  Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical methods for MDP; possible approaches:
    - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
    - Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
    - Shortest path: Linear Programming allows possible additional constraints, but the state space is more limited than with VI & PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization of inspection-based maintenance
  Method: same as MDP
  Advantages/Disadvantages: more complex (average cost-to-go approach)

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/Disadvantages: can work without an explicit model

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E     Number of electricity scenarios
N_W     Number of working states for the component
N_PM    Number of preventive maintenance states for one component
N_CM    Number of corrective maintenance states for one component

Costs

C_E(s, k)   Electricity cost at stage k for the electricity state s
C_I         Cost per stage for interruption
C_PM        Cost per stage of preventive maintenance
C_CM        Cost per stage of corrective maintenance
C_N(i)      Terminal cost if the component is in state i

Variables

i^1   Component state at the current stage
i^2   Electricity state at the current stage
j^1   Possible component state for the next stage
j^2   Possible electricity state for the next stage

State and Control Space

x^1_k   Component state at stage k
x^2_k   Electricity state at stage k

Probability function

λ(t)    Failure rate of the component at age t
λ(i)    Failure rate of the component in state W_i

Sets

Ω_{x^1}    Component state space
Ω_{x^2}    Electricity state space
Ω_U(i)     Decision space for state i

States notations

W     Working state
PM    Preventive maintenance state
CM    Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k),   x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}     (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age T_max is reached; in this case T_max can for example correspond to the age at which λ(t) exceeds 50%. The latter approach was implemented. The corresponding number of W states is N_W = T_max/T_s, or the closest integer, in both cases.

[Figure 9.1 shows the state-transition diagram of the one-component MDP: the working states W0–W4, the preventive maintenance state PM1 and the corrective maintenance states CM1, CM2, with transition probabilities T_s·λ(q) to CM1 and 1 − T_s·λ(q) to the next working state.]

Figure 9.1: Example of Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x^1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario:

x^2_k ∈ Ω_{x^2} = {S1, ..., S_{N_E}}

The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 shows three example electricity price curves (Scenario 1, 2 and 3), with the price in SEK/MWh (roughly between 200 and 500) plotted over the stages k−1, k, k+1.]

Figure 9.2: Example of electricity scenarios, N_E = 3.

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_{N_W}}
Ω_U(i) = ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the time of the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.

Table 9.1: Transition probabilities

i^1                           u    j^1        P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W − 1}    0    W_{q+1}    1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}    0    CM1        λ(W_q)
W_{N_W}                       0    W_{N_W}    1 − λ(W_{N_W})
W_{N_W}                       0    CM1        λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}        1    PM1        1
PM_q, q ∈ {1, ..., N_PM − 2}  ∅    PM_{q+1}   1
PM_{N_PM − 1}                 ∅    W0         1
CM_q, q ∈ {1, ..., N_CM − 2}  ∅    CM_{q+1}   1
CM_{N_CM − 1}                 ∅    W0         1
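As an illustration, the transition probabilities of Table 9.1 could be encoded as follows; the state names and the per-stage failure probabilities lam[q] are illustrative placeholders.

def component_transitions(N_W, N_PM, N_CM, lam):
    # Build a dict P[(j, u, i)] for the one-component model of Table 9.1.
    # u = 0: no PM, u = 1: PM, u = None: no decision (maintenance states).
    P = {}
    for q in range(N_W):                          # W0 .. W_{N_W - 1}
        P[(f"W{q+1}", 0, f"W{q}")] = 1 - lam[q]
        P[("CM1", 0, f"W{q}")] = lam[q]
    P[(f"W{N_W}", 0, f"W{N_W}")] = 1 - lam[N_W]   # oldest working state loops on itself
    P[("CM1", 0, f"W{N_W}")] = lam[N_W]
    for q in range(N_W + 1):                      # preventive replacement decision
        P[("PM1", 1, f"W{q}")] = 1.0
    for q in range(1, N_PM - 1):                  # maintenance sequences
        P[(f"PM{q+1}", None, f"PM{q}")] = 1.0
    P[("W0", None, f"PM{N_PM-1}")] = 1.0
    for q in range(1, N_CM - 1):
        P[(f"CM{q+1}", None, f"CM{q}")] = 1.0
    P[("W0", None, f"CM{N_CM-1}")] = 1.0
    return P

# Example corresponding to Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3), with made-up rates:
# P = component_transitions(4, 2, 3, lam=[0.02, 0.04, 0.08, 0.15, 0.30])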

Table 9.2: Example of transition matrices for the electricity scenarios

        | 1    0    0 |          | 1/3  1/3  1/3 |          | 0.6  0.2  0.2 |
P^1_E = | 0    1    0 |  P^2_E = | 1/3  1/3  1/3 |  P^3_E = | 0.2  0.6  0.2 |
        | 0    0    1 |          | 1/3  1/3  1/3 |          | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2)  P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost C_N(i^1) is defined for each possible terminal state i^1 of the component.

Table 9.4: Transition costs

i^1                           u    j^1        C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}    0    W_{q+1}    G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}    0    CM1        C_I + C_CM
W_{N_W}                       0    W_{N_W}    G · T_s · C_E(i^2, k)
W_{N_W}                       0    CM1        C_I + C_CM
W_q                           1    PM1        C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}  ∅    PM_{q+1}   C_I + C_PM
PM_{N_PM − 1}                 ∅    W0         C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}  ∅    CM_{q+1}   C_I + C_CM
CM_{N_CM − 1}                 ∅    W0         C_I + C_CM
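Putting the pieces together, the one-component model could be solved by backward value iteration as in the compact sketch below. All numerical values (failure probabilities, costs, number of scenarios, prices) are illustrative assumptions and not figures from the thesis; the recursion itself follows the finite horizon equations of Chapter 5, with the generation reward treated as a negative cost.

import numpy as np

# Small illustrative instance: N_W = 2, N_PM = 2, N_CM = 2, N_E = 2.
comp_states = ["W0", "W1", "W2", "PM1", "CM1"]
lam = {"W0": 0.05, "W1": 0.10, "W2": 0.20}            # per-stage failure probabilities
N, Ts, G = 12, 168.0, 1000.0                          # stages, hours per stage, kW
C_PM, C_CM, C_I = 5e4, 2e5, 1e4                       # per-stage costs
C_E = lambda s, k: [0.03, 0.06][s]                    # price per kWh for scenario s
P_E = np.array([[0.9, 0.1], [0.2, 0.8]])              # electricity scenario transitions

def comp_transitions(i, u):
    # Yield (j, prob) pairs for the component state, following Table 9.1.
    if i in ("W0", "W1", "W2"):
        if u == 1:
            yield "PM1", 1.0
        else:
            nxt = {"W0": "W1", "W1": "W2", "W2": "W2"}[i]
            yield nxt, 1.0 - lam[i]
            yield "CM1", lam[i]
    else:                                             # PM1 or CM1 -> new component
        yield "W0", 1.0

def stage_cost(i, u, j, s, k):
    if i in ("W0", "W1", "W2") and u == 0 and j != "CM1":
        return -G * Ts * C_E(s, k)                    # reward for produced energy
    if j == "PM1" or i == "PM1":
        return C_I + C_PM
    return C_I + C_CM

# Backward value iteration with zero terminal cost.
J = {(i, s): 0.0 for i in comp_states for s in range(2)}
policy = {}
for k in reversed(range(N)):
    J_new = {}
    for i in comp_states:
        for s in range(2):
            controls = [0, 1] if i in ("W1", "W2") else [0]
            best = None
            for u in controls:
                val = 0.0
                for j, p in comp_transitions(i, u):
                    for s2 in range(2):
                        val += p * P_E[s, s2] * (stage_cost(i, u, j, s, k) + J[(j, s2)])
                if best is None or val < best:
                    best, policy[(k, i, s)] = val, u
            J_new[(i, s)] = best
    J = J_new

print(policy[(0, "W2", 1)])   # e.g. the optimal decision for an aged component, scenario 2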

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C       Number of components
N_W^c     Number of working states for component c
N_PM^c    Number of preventive maintenance states for component c
N_CM^c    Number of corrective maintenance states for component c

Costs

C_PM^c    Cost per stage of preventive maintenance for component c
C_CM^c    Cost per stage of corrective maintenance for component c
C_N^c(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}    State of component c at the current stage
i^{N_C+1}                 State of the electricity at the current stage
j^c, c ∈ {1, ..., N_C}    State of component c at the next stage
j^{N_C+1}                 State of the electricity at the next stage
u^c, c ∈ {1, ..., N_C}    Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}  State of component c at stage k
x^c                       A component state
x^{N_C+1}_k               Electricity state at stage k
u^c_k                     Maintenance decision for component c at stage k

Probability functions

λ^c(i)    Failure probability function for component c

Sets

Ω_{x^c}        State space for component c
Ω_{x^{N_C+1}}  Electricity state space
Ω_{u^c}(i^c)   Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ^c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CM^c stages with a cost of C_CM^c per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PM^c stages with a cost of C_PM^c per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_N^c can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)     (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CM^c and N_PM^c. The number of W states for each component c, N_W^c, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W0, ..., W_{N_W^c}, PM1, ..., PM_{N_PM^c − 1}, CM1, ..., CM_{N_CM^c − 1}}

Electricity space
Same as in the one-component model (Section 9.1.4.1).

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)     (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}:  Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{N_W^c}},  ∅ otherwise

9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)     (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})     (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.4.3.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_W^c}} and u = 0, then

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P(j^c, 0, i^c)
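A minimal sketch of this Case 1 factorization is shown below; the per-component transition dictionaries are illustrative assumptions (for instance built as in the earlier sketch for Table 9.1).

def joint_probability_case1(j_states, i_states, P_components):
    # Probability of moving from i_states to j_states when every component is
    # working and u = 0 for all of them: the product of individual probabilities.
    prob = 1.0
    for c, (j_c, i_c) in enumerate(zip(j_states, i_states)):
        prob *= P_components[c].get((j_c, 0, i_c), 0.0)
    return prob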

Case 2

If one of the components is in maintenance, or if a decision of preventive maintenance is taken, then

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P^c

with
P^c = P(j^c, 1, i^c)  if u^c = 1 or i^c ∉ {W1, ..., W_{N_W^c}}
P^c = 1               if i^c ∉ {W0, ..., W_{N_W^c − 1}} and i^c = j^c
P^c = 0               otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_{N_W^c}} and u = 0, then

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with
C^c = C_CM^c  if i^c ∈ {CM1, ..., CM_{N_CM^c}} or j^c = CM1
C^c = C_PM^c  if i^c ∈ {PM1, ..., PM_{N_PM^c}} or j^c = PM1
C^c = 0       otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature; the methods have until now mainly been applied to optimal control, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
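For completeness, the backward recursion above can be checked numerically with the small script below; the arc costs are the ones appearing in the calculations, and the convention that the decision u is the index of the next state is an interpretation of the example.

costs = {  # C(stage, state, decision) as read from the calculations above
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
}

J = {(4, 0): 0.0}                      # terminal state
for k in (3, 2, 1, 0):
    for i in range(3 if k > 0 else 1):
        options = {u: J[(k + 1, u)] + c
                   for (kk, ii, u), c in costs.items() if kk == k and ii == i}
        J[(k, i)] = min(options.values())
print(J[(0, 0)])   # prints 8.0, the shortest-path cost J*_0(0) found above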


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] A-H. Mohamed. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L. P. Kaelbling, M. L. Littman and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human

1

errors and construction errors) Stochastic Dynamic Programming (SDP) modelsare optimization models that integrate explicitely stochastic behaviors This featuremakes the models interesting and was the starting idea of this work

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature was reviewed Conclusions was madeabout the applicability of each approach for maintenance optimization problemsMoreover future avenue for research was identified

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 solves an overview of the maintenance field The most important methodsand some optimization models are reviewed

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapter 4-7 focus on different Dynamic Programming (DP) approaches and al-gorithms to solve them The assumption of the models and practical limitationsare discussed The basic of DP models is investigated in deterministic models inChapter 4 Chapter 5 and 6 focus on Stochastic Dynamic Programming methods

2

respectively for finite and infinite horizons Chapter 7 is an introduction to Approx-imate Dynamic Programming (ADP) also known as Reinforcement Learning (RL)which is an approach to solving Dynamic Programming infinite horizon problemsusing approximate methods

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conlusions of the work and discuss possible avenues forresearch

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

Maintenance

Preventive Maintenance

Time-Based Maintenance (TBM) Condition Based Maintenance (CBM)

Continuous Schedulled Inspection Based

Corrective Maintenance

Figure 21 Maintenance Tree based on [1]

2 Condition Based Maintenance is preventive maintenance based on perfor-mance andor parameter monitoring and the subsequent actions [1] PMcorresponds to all the maintenance methods using diagnostic or inspectionsto decide of the maintenance actions Diagnostic methods include the use ofhuman senses (noise visual etc) measurements or tests They can be un-dertaken continuously or during schedulled or requested inspections CBM isoften used for non-age related failures

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy a component is replace at failure or at the end ofa specified interval whichever occurs first [17] This policy makes sens if preventivereplacement is less expensive than a corrective replacement and the failure rateincrease with time Barlow et al [7] describes a basic age replacement model

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An ageblock replacement model with failures resulting from shocks is described in[38] The shocks follows a non-homogeneous Poisson distribution (Poisson processwith a rate that is not stationnary) Two types of failures can result from the shocksminor failure removed by minor repair and major failure removed by replacement

222 Block Replacement Policies

In blocks replacement policies the components of a system are replaced at failureor at fixed times kT (k = 1 2 ) whichever occurs first Barlow et al [7] describesa basic block replacement model To avoid that a component that has just beenreplaced is replaced again a modified block replacement model is proposed in [10]A component is not replaced at a schedulled replacement time if its age is less thanT

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables above which it is nec-essary to perform maintenance The optimal wear-limit for preventive replacement

7

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection at each decision epoch one must decide ifmaintenance should be performed and when the next inspection should occur In[2] the inspection occur at fixed time and the decision of preventive replacementof the component depend on its condition at inspection In [9] a Semi-MarkovDecision Process (SMDP see Chapter 4) is proposed to optimize at each inspectionthe maintenance decision and the time to next inspection

An age replacement policies model that takes into account the information fromcondition based monitoring devices is proposed in [25] A proportional hazardmodel is used to model the effect of the monitored variables The assumption ofa hazard model is that the hazard function is the product of a two functions onedepending on the time and one on the parameters (monitored variables)

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. When a component fails it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transportation to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account short-term information. The model can be used for many maintenance optimization models.

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. For example, cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

31 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex with many components and subcomponents.

311 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power. These can be e.g. hydro-power units, nuclear power plants, wind farms etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3. Distribution: the distribution system is a voltage level below transmission and connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: consumers can be divided into different categories, such as industry, commercial, household, office, agriculture etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

312 Maintenance in Power System

The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: cost for the maintenance team that performs maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation then has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible input for an optimization model.

• Statistical data: available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that satisfies the principle of optimality:


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the evolution of the system and the possible actions.

Basically, in maintenance problems it means that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapters 4 and 5 focus on finite horizon dynamic programming. In the context of maintenance the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become computationally heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The last two possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k the system is in a state X_k = i that belongs to a state space \Omega^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k \in \Omega^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]

subject to X_{k+1} = f_k(X_k, U_k), \quad k = 0, ..., N-1

N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
C_k(i, u) Cost function
C_N(i) Terminal cost for state i
f_k(i, u) Dynamic function
J^*_0(i) Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right]    (4.1)

J^*_k(i) Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J^*_N(i) = C_N(i), \quad \forall i \in \Omega^X_N

J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right], \quad \forall i \in \Omega^X_k

U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right], \quad \forall i \in \Omega^X_k

u Decision variable
U^*_k(i) Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
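The backward recursion above can be turned into a few lines of code. The following is a minimal Python sketch (not part of the original thesis); the stage-wise state spaces, the admissible controls, the dynamics f_k, the costs C_k and the terminal cost C_N are hypothetical inputs supplied by the user as dictionaries.

```python
def value_iteration(states, controls, f, C, C_N, N):
    """Backward value iteration for a deterministic finite horizon DP (sketch).

    states[k]   : list of admissible states at stage k (k = 0..N)
    controls[k] : dict mapping a state i to its admissible controls at stage k
    f[k]        : dict mapping (i, u) to the next state f_k(i, u)
    C[k]        : dict mapping (i, u) to the transition cost C_k(i, u)
    C_N         : dict mapping a terminal state i to its terminal cost C_N(i)
    """
    J = {N: dict(C_N)}                      # initialisation: J*_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):          # backward recursion, stops when k = 0
        J[k], policy[k] = {}, {}
        for i in states[k]:
            # minimize C_k(i, u) + J*_{k+1}(f_k(i, u)) over the admissible controls
            best_u, best_cost = None, float("inf")
            for u in controls[k][i]:
                cost = C[k][(i, u)] + J[k + 1][f[k][(i, u)]]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i] = best_cost
            policy[k][i] = best_u
    return J, policy
```

The shortest path example of the next subsection could be solved with such a sketch by encoding each node as a state and each arc cost as C_k(i, u).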


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network. Node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; node K at stage 4. Each arc between nodes of consecutive stages is labeled with its cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

\Omega^X_0 = \{A\} = \{0\}, \quad \Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}, \quad \Omega^X_2 = \{E, F, G\} = \{0, 1, 2\},
\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}, \quad \Omega^X_4 = \{K\} = \{0\}


Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from a node to reach the next stage. The following notations are used:

\Omega^U_k(i) = \{0, 1\} for i = 0, \{0, 1, 2\} for i = 1, \{1, 2\} for i = 2, for k = 1, 2, 3

\Omega^U_0(0) = \{0, 1, 2\} for k = 0

For example, \Omega^U_1(0) = \Omega^U(B) = \{0, 1\}, with U_1(0) = 0 for the transition B \Rightarrow E or U_1(0) = 1 for the transition B \Rightarrow F.

Another example: \Omega^U_1(2) = \Omega^U(D) = \{1, 2\}, with u_1(2) = 1 for the transition D \Rightarrow F or u_1(2) = 2 for the transition D \Rightarrow G.

A sequence \pi = \{\mu_0, \mu_1, ..., \mu_N\}, where \mu_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, \pi^* = \{\mu^*_0, \mu^*_1, ..., \mu^*_N\}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B \Rightarrow E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]

subject to X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, ..., 3

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J^*_0(0) = 8. It corresponds to the path A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K. The optimal policy of the problem is \pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\} with \mu_k(i) = u^*_k(i) (for example \mu_1(1) = 2, \mu_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k \in \{0, ..., N\} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k \in \Omega^X_k.

Decision Space

At each decision epoch the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u \in \Omega^U_k(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance \omega = \omega_k(i, u):

X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J^*(X_0), where X_0 is the initial state of the system:

J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]

subject to X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, ..., N-1


N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
\omega_k(i, u) Probabilistic function of the disturbance
C_k(i, u, j) Cost function
C_N(i) Terminal cost for state i
f_k(i, u, \omega) Dynamic function
J^*_0(i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right]    (5.2)

\Omega^X_k State space at stage k
\Omega^U_k(i) Decision space at stage k for state i
P_k(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion it determines, at each stage, the optimal decision for each state of the system.

J^*_N(i) = C_N(i), \quad \forall i \in \Omega^X_N    (initialisation)

While k \ge 0 do

J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right], \quad \forall i \in \Omega^X_k

U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right], \quad \forall i \in \Omega^X_k

k \leftarrow k - 1

u Decision variable
U^*_k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
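The backward recursion of the VI algorithm can also be sketched in code. The snippet below is a minimal Python sketch (not from the thesis); it assumes, for simplicity, that the state space has the same size at every stage and that the transition probabilities and costs are given as hypothetical numpy arrays P[k][i, u, j] and C[k][i, u, j].

```python
import numpy as np

def stochastic_value_iteration(P, C, C_terminal):
    """Backward value iteration for a finite horizon SDP (sketch).

    P[k][i, u, j] : probability of moving from state i to j under control u at stage k
    C[k][i, u, j] : cost of that transition
    C_terminal[i] : terminal cost C_N(i)
    Returns the cost-to-go functions J[k] and the optimal decisions U[k].
    """
    N = len(P)                                   # number of stages
    J = [None] * (N + 1)
    U = [None] * N
    J[N] = np.asarray(C_terminal, dtype=float)   # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):
        # expected cost of every (state, control) pair: sum_j P * (C + J_{k+1}(j))
        Q = np.einsum('iuj,iuj->iu', P[k], C[k] + J[k + 1][None, None, :])
        J[k] = Q.min(axis=1)                     # optimal cost-to-go at stage k
        U[k] = Q.argmin(axis=1)                  # optimal decision at stage k
    return J, U
```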

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• N_X state variables, where the size of the set for each state variable is S

• N_U control variables, where the size of the set for each control variable is A

The time complexity of the algorithm is O(N \cdot S^{2 N_X} \cdot A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
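As a purely hypothetical illustration (the figures are not from the thesis): a weekly plan over one year (N = 52) for a system described by N_X = 5 state variables with S = 10 levels each and N_U = 5 binary control variables (A = 2) would already require on the order of 52 \cdot 10^{10} \cdot 2^5 \approx 1.7 \cdot 10^{13} elementary operations, which illustrates why only a small number of components can be handled exactly.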

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form \pi = \{\mu, \mu, \mu, ...\}, where \mu is a function mapping the state space to the control space. For i \in \Omega^X, \mu(i) is an admissible control for the state i: \mu(i) \in \Omega^U(i).

The objective is to find the optimal policy \mu^*. It should minimize the cost-to-go function.

To be able to compare different policies it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (or cost-free termination) state that cannot be avoided. When this state is reached, the system remains in this state and no further costs are paid.

J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, ..., N-1

\mu Decision policy
J^*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor \alpha (0 < \alpha < 1). The cost incurred at stage k in a discounted IHSDP has the form \alpha^k \cdot C_{ij}(u).

As C_{ij}(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k))), \quad k = 0, 1, ..., N-1

\alpha Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem where the aim is to minimize

J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]

subject to X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k))), \quad k = 0, 1, ..., N-1


62 Optimality Equations

The optimality equations are formulated using the transition probabilities P_{ij}(u).

The stationary policy \mu^*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right], \quad \forall i \in \Omega^X

J_\mu(i) Cost-to-go function of policy \mu starting from state i
J^*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right], \quad \forall i \in \Omega^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1-\alpha).

For non-discounted models the theoretical number of iterations needed is infinite and a stopping criterion must be defined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy \mu, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy \mu^0. It can then be described by the following steps:

Step 1: Policy Evaluation

If \mu^{q+1} = \mu^q, stop the algorithm. Else, J_{\mu^q}(i), the solution of the following linear system, is calculated:

J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy \mu^q.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]

Go back to the policy evaluation step.

The process stops when \mu^{q+1} = \mu^q.

At each iteration the algorithm always improves the policy. If the initial policy \mu^0 is already good, then the algorithm will converge fast to the optimal solution.
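The two steps can be sketched compactly in code. The following Python sketch (not from the thesis) solves the policy evaluation step as a linear system and assumes a discounted, stationary MDP given by hypothetical arrays P[i, u, j] and C[i, u, j]; for the undiscounted shortest path case the evaluation step would have to treat the terminal state separately.

```python
import numpy as np

def policy_iteration(P, C, alpha, mu0):
    """Policy iteration for a discounted infinite horizon MDP (minimal sketch).

    P[i, u, j] : stationary transition probabilities
    C[i, u, j] : transition costs
    alpha      : discount factor, 0 < alpha < 1
    mu0        : initial policy, one control index per state
    """
    n = P.shape[0]
    mu = np.asarray(mu0)
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[np.arange(n), mu]                           # P(j | i, mu(i))
        c_mu = np.sum(P_mu * C[np.arange(n), mu], axis=1)    # expected stage cost
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('iuj,iuj->iu', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):                       # mu^{q+1} = mu^q: stop
            return mu, J
        mu = mu_new
```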

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{\mu^k}(i) that must be chosen higher than the real value J_{\mu^k}(i).


While m \ge 0 do

J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right], \quad \forall i \in \Omega^X

m \leftarrow m - 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{\mu^k} is approximated by J^0_{\mu^k}.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy \mu and a state X \in \Omega^X, there are a unique \lambda_\mu and vector h_\mu such that

h_\mu(X) = 0

\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right], \quad \forall i \in \Omega^X

This \lambda_\mu is the average cost-to-go of the stationary policy \mu. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right], \quad \forall i \in \Omega^X

\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right], \quad \forall i \in \Omega^X

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily:

H^k = \min_{u \in \Omega^U(X)} \sum_{j \in \Omega^X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]

h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k, \quad \forall i \in \Omega^X

\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right], \quad \forall i \in \Omega^X

The sequence h^k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
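A minimal Python sketch of the relative value iteration recursion is given below (not from the thesis). It assumes a unichain model with hypothetical stationary arrays P[i, u, j] and C[i, u, j], and it simply runs a fixed number of iterations instead of testing convergence.

```python
import numpy as np

def relative_value_iteration(P, C, ref_state=0, n_iterations=500):
    """Relative value iteration for an average cost-to-go MDP (sketch).

    P[i, u, j] : stationary transition probabilities
    C[i, u, j] : transition costs
    ref_state  : the arbitrary reference state denoted X in the text
    """
    n = P.shape[0]
    h = np.zeros(n)
    for _ in range(n_iterations):
        # Q(i, u) = sum_j P(j, u, i) * [C(j, u, i) + h^k(j)]
        Q = np.einsum('iuj,iuj->iu', P, C + h[None, None, :])
        H = Q[ref_state].min()          # offset evaluated at the reference state
        h = Q.min(axis=1) - H           # h^{k+1}(i)
    policy = Q.argmin(axis=1)           # greedy policy from the last iteration
    return policy, H, h                 # H approximates the optimal average cost
```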

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If \lambda^{q+1} = \lambda^q and h^{q+1}(i) = h^q(i) \forall i \in \Omega^X, stop the algorithm.

Else, solve the following system of equations:

h^q(X) = 0
\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right], \quad \forall i \in \Omega^X

Step 2: Policy improvement

\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right], \quad \forall i \in \Omega^X

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case the optimal cost-to-go function satisfies

J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right], \quad \forall i \in \Omega^X

and J^*(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega^X} J(i)

Subject to J(i) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right], \quad \forall i \in \Omega^X, \forall u \in \Omega^U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
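Assuming the maximization form of the LP written above, the reformulation can be sketched with a generic LP solver. The Python sketch below (not from the thesis) uses scipy.optimize.linprog with one inequality constraint per (state, control) pair; P and C are hypothetical arrays with the same convention as before.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted cost MDP by linear programming (sketch).

    J* is the largest J satisfying J(i) <= sum_j P(j,u,i)*[C(j,u,i) + alpha*J(j)]
    for all i and u, so we maximize sum_i J(i), i.e. minimize -sum_i J(i).
    """
    n, m, _ = P.shape
    A_ub = np.zeros((n * m, n))
    b_ub = np.zeros(n * m)
    for i in range(n):
        for u in range(m):
            row = i * m + u
            # J(i) - alpha * sum_j P[i,u,j]*J(j) <= sum_j P[i,u,j]*C[i,u,j]
            A_ub[row] = -alpha * P[i, u]
            A_ub[row, i] += 1.0
            b_ub[row] = np.dot(P[i, u], C[i, u])
    res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    return res.x    # approximate optimal cost-to-go J*
```

Additional constraints on J (the motivation mentioned above) could be appended to A_ub and b_ub in the same way.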

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy \mu^0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Processes - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 72: they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy \mu using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: assume a trajectory (X_0, ..., X_N) has been generated according to the policy \mu and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k) Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

V(i_m) Cost-to-go of a trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + \gamma \cdot [ V(i_m) - J(i) ], with \gamma = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + \gamma_{X_k} \cdot [ V(X_k) - J(X_k) ]

where \gamma_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right], \quad \forall k = 0, ..., l

TD(\lambda)
A generalization of the preceding algorithm is TD(\lambda), where a constant \lambda < 1 is introduced:

J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right], \quad \forall k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is \lambda = 0. The TD(0) algorithm updates only the current state:

J(X_l) = J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]
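A minimal Python sketch of TD(0) policy evaluation is given below (not from the thesis). It assumes a hypothetical sampler simulate_transition(i, u) that returns the observed next state and transition cost, a fixed policy mu, and a set of terminal states; the step size \gamma_{X_k} = 1/m follows the text.

```python
import numpy as np

def td0_policy_evaluation(mu, simulate_transition, n_states, terminal,
                          n_trajectories=1000):
    """TD(0) evaluation of a fixed policy mu (minimal sketch).

    mu[i]                     : control chosen by the policy in state i
    simulate_transition(i, u) : hypothetical sampler returning (next_state, cost)
    terminal                  : set of terminal (absorbing, cost-free) states
    """
    J = np.zeros(n_states)
    visits = np.zeros(n_states)              # m, per state, for gamma = 1/m
    for _ in range(n_trajectories):
        i = int(np.random.randint(n_states)) # arbitrary starting state
        while i not in terminal:
            j, cost = simulate_transition(i, mu[i])
            visits[i] += 1
            gamma = 1.0 / visits[i]
            # TD(0) update: J(i) <- J(i) + gamma * [C(i, j) + J(j) - J(i)]
            J[i] += gamma * (cost + J[j] - J[i])
            i = j
    return J
```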

Q-factors
Once J_{\mu^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{\mu^k}(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} Q_{\mu^k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{\mu^k} and Q_{\mu^k} have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = \min_{u \in \Omega^U(i)} Q^*(i, u)    (7.2)

By combining the two equations we obtain

Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega^U(j)} Q^*(j, v) \right]    (7.3)

Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = \arg\min_{u \in \Omega^U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega^U(X_{k+1})} Q(X_{k+1}, u) \right]

with \gamma defined as for TD.

The exploration/exploitation trade-off: the convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
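A common way to implement this trade-off is an epsilon-greedy rule, as in the following Python sketch (not from the thesis). The sampler simulate_transition(i, u), the constant step size and the epsilon value are hypothetical choices made only for illustration.

```python
import numpy as np

def q_learning(simulate_transition, n_states, n_controls,
               n_samples=100000, gamma=0.1, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration (minimal sketch).

    simulate_transition(i, u) : hypothetical sampler returning (next_state, cost)
    gamma                     : step size (kept constant here for simplicity)
    epsilon                   : probability of trying a random control (exploration)
    """
    Q = np.zeros((n_states, n_controls))
    i = int(np.random.randint(n_states))
    for _ in range(n_samples):
        if np.random.rand() < epsilon:
            u = int(np.random.randint(n_controls))   # exploration
        else:
            u = int(np.argmin(Q[i]))                 # exploitation (greedy policy)
        j, cost = simulate_transition(i, u)
        # update based on equation (7.3)
        Q[i, u] = (1 - gamma) * Q[i, u] + gamma * (cost + Q[j].min())
        i = j
    greedy_policy = Q.argmin(axis=1)
    return Q, greedy_policy
```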

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system, through simulation, with direct learning.


74 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_\mu(i). It will be replaced by a suitable approximation \tilde{J}(i, r), where r is a vector that has to be optimized based on the available samples of J_\mu. In the tabular representation investigated previously, J_\mu(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_\mu(i) - \tilde{J}(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
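As a simple illustration of such an approximation structure, the sketch below (not from the thesis) fits a linear approximation \tilde{J}(i, r) = \phi(i)^T r by least squares, where the feature vectors \phi(i) and the sampled cost-to-go values are hypothetical training data obtained, for instance, from simulated trajectories.

```python
import numpy as np

def fit_linear_cost_to_go(features, targets):
    """Fit an approximation J~(i, r) = phi(i)^T r by least squares (sketch).

    features : array (n_samples, n_features), phi(i) for each sampled state
    targets  : array (n_samples,), sampled cost-to-go values V(i)
    Returns the parameter vector r minimizing the squared training error.
    """
    r, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return r

def approximate_cost_to_go(phi, r):
    """Evaluate the approximation J~(i, r) for one feature vector phi = phi(i)."""
    return float(np.dot(phi, r))
```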


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming — Characteristics: the model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Method: Value Iteration. Advantages/Disadvantages: limited state space (number of components).

Markov Decision Processes — Characteristics: stationary model; average cost-to-go, discounted or shortest path formulations. Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted). Methods: classical methods for MDP, i.e. Value Iteration (VI), Policy Iteration (PI) and Linear Programming. Advantages/Disadvantages: VI can converge fast for a high discount factor; PI is faster in general; Linear Programming allows possible additional constraints; the state space is limited for VI and PI.

Approximate Dynamic Programming for MDP — Characteristics: can handle large state spaces. Possible application in maintenance optimization: same as MDP, for larger systems. Methods: TD-learning, Q-learning. Advantages/Disadvantages: can work without an explicit model.

Semi-Markov Decision Processes — Characteristics: can optimize the inspection interval. Possible application in maintenance optimization: optimization for inspection based maintenance. Methods: same as MDP (average cost-to-go approach). Advantages/Disadvantages: more complex.

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
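
As an illustration, a minimal sketch of the backward value iteration recursion for a finite horizon model of this kind is given below. The function names and the representation of the model data (state list, decision space, transition probabilities and costs, all defined later in this chapter) are assumptions made for the example, not part of the proposed model itself.

    def value_iteration(states, decision_space, P, C, C_N, N):
        # states            : list of all states i
        # decision_space(i) : possible decisions in state i ([] if the transition is forced)
        # P(k, i, u)        : dict {j: probability} of moving from i to j under u at stage k
        # C(k, i, u, j)     : transition cost
        # C_N(i)            : terminal cost, N : number of stages
        J = [dict() for _ in range(N + 1)]   # cost-to-go J[k][i]
        U = [dict() for _ in range(N)]       # optimal decision U[k][i]
        for i in states:
            J[N][i] = C_N(i)                 # initialize with the terminal cost
        for k in range(N - 1, -1, -1):       # go backwards from stage N-1 to stage 0
            for i in states:
                options = decision_space(i) or [None]   # None stands for a forced transition
                costs = {u: sum(p * (C(k, i, u, j) + J[k + 1][j])
                                for j, p in P(k, i, u).items())
                         for u in options}
                U[k][i] = min(costs, key=costs.get)
                J[k][i] = costs[U[k][i]]
        return J, U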

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro-power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro-storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE      Number of electricity scenarios
NW      Number of working states for the component
NPM     Number of preventive maintenance states for one component
NCM     Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2  (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) > 50% for t > Tmax. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases (for example, Tmax = 10 years and Ts = 6 months give NW = 20).

Figure 9.1: Example of Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4 (states W0–W4, PM1, CM1, CM2; transition probabilities Ts·λ(q) to CM1 and 1 − Ts·λ(q) to the next working state). Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of the graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, between 200 and 500, plotted over stages k−1, k, k+1 for Scenarios 1, 2 and 3).

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}
        ∅      otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                           u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0    CM1       λ(Wq)
WNW                          0    WNW       1 − λ(WNW)
WNW                          0    CM1       λ(WNW)
Wq, q ∈ {0, ..., NW}         1    PM1       1
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     1
PM(NPM−1)                    ∅    W0        1
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     1
CM(NCM−1)                    ∅    W0        1
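
As a sketch of how these transition probabilities could be generated in practice, the following code enumerates the component state space of Figure 9.1 and returns the non-zero probabilities of Table 9.1. The string representation of states, the list lam of per-stage failure probabilities λ(Wq) and the function names are assumptions made for this illustration.

    def component_states(NW, NPM, NCM):
        # State space of Figure 9.1; W0 also plays the role of PM_NPM and CM_NCM.
        return (["W%d" % q for q in range(NW + 1)]
                + ["PM%d" % q for q in range(1, NPM)]
                + ["CM%d" % q for q in range(1, NCM)])

    def component_transition(i, u, lam, NW, NPM, NCM):
        # Non-zero probabilities P(j | i, u) of Table 9.1; lam[q] is the failure
        # probability of state Wq during one stage; u = 1 means preventive replacement.
        if i.startswith("W"):
            q = int(i[1:])
            if u == 1:
                return {"PM1" if NPM > 1 else "W0": 1.0}      # replacement is started
            nxt = "W%d" % min(q + 1, NW)                      # ageing, capped at W_NW
            return {nxt: 1.0 - lam[q], ("CM1" if NCM > 1 else "W0"): lam[q]}
        kind, q = i[:2], int(i[2:])                           # PM/CM states: forced transitions
        last = (NPM if kind == "PM" else NCM) - 1
        return {("%s%d" % (kind, q + 1)) if q < last else "W0": 1.0}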

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = | 1    0    0   |      P2E = | 1/3  1/3  1/3 |      P3E = | 0.6  0.2  0.2 |
      | 0    1    0   |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
      | 0    0    1   |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
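
A sketch of how this stage-dependent electricity transition model could be represented is given below; the variable names and the schedule list are illustrative assumptions reproducing Tables 9.2 and 9.3.

    # Transition matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2).
    P1_E = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
    P2_E = [[1/3, 1/3, 1/3],
            [1/3, 1/3, 1/3],
            [1/3, 1/3, 1/3]]
    P3_E = [[0.6, 0.2, 0.2],
            [0.2, 0.6, 0.2],
            [0.2, 0.2, 0.6]]

    # Stage-dependent choice of matrix over the 12-stage horizon of Table 9.3.
    schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

    def electricity_transition(k, i2, j2):
        # P_k(j2, i2): probability of moving from scenario i2 to scenario j2 at stage k.
        return schedule[k][i2][j2]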

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                           u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0    CM1       CI + CCM
WNW                          0    WNW       G · Ts · CE(i2, k)
WNW                          0    CM1       CI + CCM
Wq                           1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     CI + CPM
PM(NPM−1)                    ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     CI + CCM
CM(NCM−1)                    ∅    W0        CI + CCM
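
The transition costs of Table 9.4 can be collected in a single function, as in the following sketch. The state representation and argument names follow the earlier sketches and are assumptions for the illustration; the sign convention (the electricity income appears as a reward) is kept as in Table 9.4, and NPM, NCM > 1 is assumed so that the maintenance states are distinct from W0.

    def transition_cost(i1, u, j1, i2, k, CE, G, Ts, C_I, C_PM, C_CM):
        # Stage cost C_k(j, u, i) of Table 9.4 for the one-component model.
        # CE(s, k): electricity price, G: average production (kW), Ts: stage length (h).
        if i1.startswith("W"):
            if u == 1:
                return C_I + C_PM            # preventive replacement is started
            if j1 == "CM1":
                return C_I + C_CM            # failure during the stage
            return G * Ts * CE(i2, k)        # normal operation: electricity reward
        if i1.startswith("PM"):
            return C_I + C_PM                # ongoing preventive maintenance
        return C_I + C_CM                    # ongoing corrective maintenance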

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c for the next stage
jNC+1                   State of the electricity for the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)  (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)  (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}
                              ∅      otherwise
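
The joint decision space is the Cartesian product of the individual decision spaces. A sketch of its enumeration is given below; the state strings and the convention of representing an empty decision set by the single dummy decision 0 are assumptions for the illustration.

    from itertools import product

    def joint_decision_space(states, NW):
        # states : list of component states (e.g. 'W3', 'PM1', 'CM2'),
        # NW     : list of NWc values, one per component.
        per_component = []
        for c, s in enumerate(states):
            if s.startswith("W") and int(s[1:]) <= NW[c]:
                per_component.append((0, 1))     # Omega_uc = {0, 1} for working components
            else:
                per_component.append((0,))       # empty decision set, kept as the dummy u = 0
        return list(product(*per_component))

For example, joint_decision_space(['W2', 'CM1', 'W0'], [4, 4, 4]) yields the four decision vectors (0, 0, 0), (0, 0, 1), (1, 0, 0) and (1, 0, 1).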

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)  (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)  (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or if a decision of preventive maintenance is made, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with Pc =  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
           1              if ic ∉ {W0, ..., W(NWc−1)} and jc = ic
           0              otherwise
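
A sketch of the joint transition probability is given below. It follows one possible reading of Case 2, based on the explanation above that components do not age while the system is not working; comp_P(ic, uc) is assumed to return the one-component probabilities (for instance the component_transition sketch of Section 9.1 with its parameters bound).

    def system_transition_prob(j, u, i, comp_P):
        # Joint probability P(j | i, u) for the multi-component model (Cases 1 and 2).
        # i, j: tuples of component states, u: tuple of decisions,
        # comp_P(ic, uc): dict {jc: probability} for one component.
        all_working = all(s.startswith("W") for s in i)
        case1 = all_working and not any(u)
        prob = 1.0
        for ic, uc, jc in zip(i, u, j):
            if case1 or uc == 1 or not ic.startswith("W"):
                prob *= comp_P(ic, uc).get(jc, 0.0)   # component evolves according to its own model
            else:
                prob *= 1.0 if jc == ic else 0.0      # system not working: the component does not age
        return prob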

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with Cc =  CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
           CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
           0     otherwise
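
A corresponding sketch of the stage cost is shown below; as before, the state strings and argument names are assumptions, C_PM and C_CM are per-component cost lists, and the Case 1 reward keeps the sign convention of Table 9.4.

    def system_transition_cost(j, u, i, i_elec, k, CE, G, Ts, C_I, C_PM, C_CM):
        # Stage cost for the multi-component model (Cases 1 and 2 of Section 9.2.4.4).
        if all(s.startswith("W") for s in i) and not any(u) and "CM1" not in j:
            return G * Ts * CE(i_elec, k)            # Case 1: electricity reward only
        total = C_I                                  # Case 2: interruption cost ...
        for c, (ic, jc) in enumerate(zip(i, j)):
            if ic.startswith("CM") or jc == "CM1":
                total += C_CM[c]                     # ... plus corrective maintenance costs
            elif ic.startswith("PM") or jc == "PM1":
                total += C_PM[c]
        return total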

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge faster; however, for high discount factors the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

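As a cross-check, the computation can be reproduced with a few lines of code. The arc costs below are read directly from the expressions above (states 0, 1, 2 at stages 1-3 correspond to nodes B-J); this is only a verification sketch.

    # Arc costs C[k][i][j], read from the value iteration computation above.
    C = {
        0: {0: {0: 2, 1: 4, 2: 3}},                                    # A -> B, C, D
        1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
        2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
        3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J -> terminal
    }
    J = {4: {0: 0.0}}                              # terminal cost phi(0) = 0
    policy = {}
    for k in (3, 2, 1, 0):                         # backward value iteration
        J[k], policy[k] = {}, {}
        for i, arcs in C[k].items():
            best = min(arcs, key=lambda j: arcs[j] + J[k + 1][j])
            policy[k][i] = best
            J[k][i] = arcs[best] + J[k + 1][best]
    print(J[0][0], policy[0][0])                   # 8.0 2: optimal cost 8, go to node D first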

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 10: Models

Chapter 1

Introduction

11 Background

The market and competition laws are introduced among power system companiesdue to the restructuration and deregulation of modern power system The gen-erating companies as well as transmission and distribution system operators aimto minimize their costs Maintenance costs can be a significant part of the totalcosts The pressure to reduce the maintenance budget leads to a need for efficientmaintenance

Maintenance cost be divided into Corrective Maintenance (CM) and PreventiveMaintenance (PM) (see Chapter 21)

CM means that an asset is maintained once an unscheduled functionnal failureoccurs CM can imply high costs for unsupplied energy interruption possible de-terioration of the system human risks or environment consequences etc

PM is employed to reduce the risk of unexpected failure Time Based Maintenance(TBM) is used for the most critical components and Condition Based Maintenance(CBM) for the components that are worth and not too expensive to monitoreThese maintenance actions have a cost for unsupplied energy inspection repairreplacement etc

An efficient maintenance should balance the corrective and preventive maintenanceto minimize the total costs of maintenance

The probability of a functionnal failure for a component is stochastic The probabil-ity depends on the state of component resulting from the history of the component(age intensity of use external stress (such as weather) maintenance actions human

1

errors and construction errors) Stochastic Dynamic Programming (SDP) modelsare optimization models that integrate explicitely stochastic behaviors This featuremakes the models interesting and was the starting idea of this work

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approachesA first distinction was made between finite horizon and infinite horizon approaches

The different techniques that can be used for solving a model based on dynamicprogramming was investigated For infinite horizon models approximate dynamicprogramming was studied These types of methods are related to the field of rein-forcement learning

Some SDP models found in the literature was reviewed Conclusions was madeabout the applicability of each approach for maintenance optimization problemsMoreover future avenue for research was identified

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 solves an overview of the maintenance field The most important methodsand some optimization models are reviewed

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapter 4-7 focus on different Dynamic Programming (DP) approaches and al-gorithms to solve them The assumption of the models and practical limitationsare discussed The basic of DP models is investigated in deterministic models inChapter 4 Chapter 5 and 6 focus on Stochastic Dynamic Programming methods

2

respectively for finite and infinite horizons Chapter 7 is an introduction to Approx-imate Dynamic Programming (ADP) also known as Reinforcement Learning (RL)which is an approach to solving Dynamic Programming infinite horizon problemsusing approximate methods

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conlusions of the work and discuss possible avenues forresearch

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and intendedto put an item into a state in which it can perform a required function [1] It istypically performed in case there is no way or it is not worth detecting or preventinga failure

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

Maintenance

Preventive Maintenance

Time-Based Maintenance (TBM) Condition Based Maintenance (CBM)

Continuous Schedulled Inspection Based

Corrective Maintenance

Figure 21 Maintenance Tree based on [1]

2 Condition Based Maintenance is preventive maintenance based on perfor-mance andor parameter monitoring and the subsequent actions [1] PMcorresponds to all the maintenance methods using diagnostic or inspectionsto decide of the maintenance actions Diagnostic methods include the use ofhuman senses (noise visual etc) measurements or tests They can be un-dertaken continuously or during schedulled or requested inspections CBM isoften used for non-age related failures

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy a component is replace at failure or at the end ofa specified interval whichever occurs first [17] This policy makes sens if preventivereplacement is less expensive than a corrective replacement and the failure rateincrease with time Barlow et al [7] describes a basic age replacement model

A model including discount have been proposed in [17] In this model the loss valueof a replaced component decreases with its age

A model with minimal repair is discussed in [6] If the component fails it can berepaired to the same condition as before the failure occured

An ageblock replacement model with failures resulting from shocks is described in[38] The shocks follows a non-homogeneous Poisson distribution (Poisson processwith a rate that is not stationnary) Two types of failures can result from the shocksminor failure removed by minor repair and major failure removed by replacement

222 Block Replacement Policies

In blocks replacement policies the components of a system are replaced at failureor at fixed times kT (k = 1 2 ) whichever occurs first Barlow et al [7] describesa basic block replacement model To avoid that a component that has just beenreplaced is replaced again a modified block replacement model is proposed in [10]A component is not replaced at a schedulled replacement time if its age is less thanT

This model has been modified in [11] to model that the operational cost of an unitis higher when it becomes older Moreover the model of [10] is extended in [5] toallow multi-component systems with any discrete lifetime distribution

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables above which it is nec-essary to perform maintenance The optimal wear-limit for preventive replacement

7

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection at each decision epoch one must decide ifmaintenance should be performed and when the next inspection should occur In[2] the inspection occur at fixed time and the decision of preventive replacementof the component depend on its condition at inspection In [9] a Semi-MarkovDecision Process (SMDP see Chapter 4) is proposed to optimize at each inspectionthe maintenance decision and the time to next inspection

An age replacement policies model that takes into account the information fromcondition based monitoring devices is proposed in [25] A proportional hazardmodel is used to model the effect of the monitored variables The assumption ofa hazard model is that the hazard function is the product of a two functions onedepending on the time and one on the parameters (monitored variables)

224 Opportunistic Maintenance Models

Opportunistics maintenance considers unexpected opportunities of performing pre-ventive maintenance With the failure of a component it is possible to perform PMon other components This could be interesting for offshore wind farms for exampleThe deplacement to the wind farm by boat or helicopter is necessary and can bevery expensive By grouping maintenance actions money could be saved

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criterias can be used to classify maintenance optimization models The num-ber of components in consideration is important eg multi-components modelsare more interesting in power system The time horizon considered in the model

8

is important Many articles consider infinite time horizon More focus should bedone on finite horizon since they are more practical Another characteristic of themodel is the time representation if discrete or continuous time is considered Onedistinction can be done between models with deterministic and stochastic lifetime ofcomponents Among stochastic approaches it can be interesting to consider whichkind of lifetime distribution can be used

The method used for solving the problem has an influence on the solution A modelthat can not be solved is of no interest For some model exact solution are possibleFor complex models it is either necessary to simplify the model or to use heuristicmethods to find approximate solutions

9

Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system include the following main parts

1 Generation That are the generation units that produce the power It canbe eg hydro-power units nuclear power plants wind farms etc The totalpower consumed is always equal to the power generated

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units

11

3 Distribution The distibution system is a voltage level below transmissionwhich is connected to customers It connects distribution system with con-sumers Distribution system are in general operated radial (One connectionpoint to the transmission system)

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through differentspecific markets in the world The rules and organization are different for eachmarket place The bids of electricity trades are declared in advance to the systemoperator This is necessary to check that the power system can withstand theoperationnal condition

The power system is controlled in real-time both automatically (automatic controland protection devices) and manually (with the help of the system operator tocoordinate the necessary action to avoid dangerous situations) Each component ofthe system influence the other If a component has a functional failure it can inducefailures of others component Cascading failures can have drastic consequences suchas black-outs

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies(See [47] for an example in hydropower) RCM is an structured approach to finda balance between corrective and preventive maintenance Research on ReliabilityCentered Asset Maintenance (RCAM) a quantitative approach to RCM is beingcarried out in the RCAM group at KTH School of electrical engineering Bertlinget al [12] defined in details the approach and its different steps An importantstep is the maintenance optimization In Hilber et al [20] a method based ona monetary importance index is proposed to define the importance of individualcomponents in a network Ongoing research focus for example on wind power (See[39] [32])

Research about power generation is typically focusing on predictive maintenanceusing condition based monitoring systems (See for example [18] or [44]) The prob-lem of maintenance for transmission and distribution systems has received more

12

attention since the deregulation of the electricity market (See for example [12][27] for distribution systems [22] [30] for transmission systems)

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

32 Costs

Possible costsincomes related to maintenance in power systems have been identified(non-inclusively) as follows

bull Manpower cost Cost for the maintenance team that performs maintenanceactions

bull Spare part cost The cost of a new component is an important part of themaintenance cost

bull Maintenance equipment cost If special equipment is needed for undertakingthe maintenance An helicopter can sometime be necessary for the mainte-nance of some parts of an off-shore wind turbine

bull Energy production The electricity produce is sold to consumers on the elec-tricity market The price of electricity can fluctuate At the same time thepower produce by a generating power unit can fluctuate depending on factorslike the weather (for renewable energy) The condition of the unit can alsoinfluence its efficiency

bull Unserved energyInterruption cost If there is an agreement to producedeliverenergy to a consumer at some specific time unserved energy must be paidThe cost depends on the contract and the cost per unit time depends on theduration of the failure

bull InspectionMonitoring cost Inspection or monitoring systems have a costthat must be considered The cost can be an initial investment (for continuousmonitoring systems) or discret costs (each time an inspection measurementor test is done on an asset)

33 Main Constraints

Possibles constraints for the maintenance of power system have been identified asfollows

13

bull Manpower The size and availability of the maintenance staff is limited

bull Maintenance Equipment The equipment needed for undertaking the mainte-nance must be available

bull Weather The weather can make certain maintenance actions postponed egin very windy conditions it is not possible to realize maintenance on offshorewind farms

bull Availability of the Spare Part If the needed spare parts are not availablemaintenance can not be done It can also happen that a spare part is availablebut far away from the location where it is needed The transportation has aprice and time

bull Maintenance Contracts Power companies can subscribe for maintenance ser-vices from the manufacturer of a system This is a typical option for windturbines [33] The time span of a contract can be a constraint for an opti-mization model

bull Availability of Condition Monitoring Information If condition monitoring sys-tems are installed on a system the information gathered by the monitoringdevices are not always available to non-manufacturer companies The avail-ability of monitoring information has an important impact is on the possibleinput for an optimization model

bull Statistical Data Available monitoring information have a value only if con-clusions about the deterioration or failure state in a system can be drawn fromthem Statistical data are necessary to create a probabilistic model

14

Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and somefeature of possible DP models Deterministic DP is used to introduce the basic ofDP formulation and the value iteration method a classical method for solving DPmodels

41 Introduction

Dynamic Programming deals with multi-stage or sequential decisions problems Ateach decision epoch the decision maker (also called agent or controller in differentcontexts) observes the state of a system (It is assumed in this thesis that the systemis perfectly observable) An action is decided based on this state This action willresult in an immediate cost (or reward) and influence the evolution of the system

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality

15

An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solution of the subproblems are themselves solution of the general problemThe principle implies that at each stage the decision are based only on the currentstate of the system The previous decisions should not have influence on the actualevolution of the system and possible actions

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimizationOne distinguishs between finite and infinite time horizons

Chapter 4 focus on finite horizon stochastic dynamic programming In the contextof maintenance the objective would be for example to minimize the maintenancecosts during the time horizon considered

Chapter 5 and 6 focus on models that assume an infinite time horizon This as-sumption implies that a system is stationary that it evolves in the same manner allthe time Moreover an infinite horizon optimization assumes implicitely that thesystem is used for a infinite time It can be an good approximation if indeed thelifetime of a system is very long

16

414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the length of the interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become computationally heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations for DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state $X_k = i$ that belongs to a state space $\Omega_{X_k}$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_{U_k}(i)$.

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \dots, N-1$.

N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
$X_k$ : State at stage k
$U_k$ : Decision action at stage k
$C_k(i, u)$ : Cost function
$C_N(i)$ : Terminal cost for state i
$f_k(i, u)$ : Dynamic function
$J^*_0(i)$ : Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be computed with the following formula:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad (4.1)$$

$J^*_k(i)$ : Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N}$$

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$

$$U^*_k(i) = \operatorname{argmin}_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$

$u$ : Decision variable
$U^*_k(i)$ : Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: the shortest path network used in the example. Node A is at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; and K at stage 4. Each arc is labelled with its transition cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

$\Omega_{X_0} = \{A\} = \{0\}$
$\Omega_{X_1} = \{B, C, D\} = \{0, 1, 2\}$
$\Omega_{X_2} = \{E, F, G\} = \{0, 1, 2\}$
$\Omega_{X_3} = \{H, I, J\} = \{0, 1, 2\}$
$\Omega_{X_4} = \{K\} = \{0\}$


Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable; it is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the decision is which way to take from the current node to reach the next stage. The following notations are used:

$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega_{U_0}(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, $\Omega_{U_1}(0) = \Omega_U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition B ⇒ E and $U_1(0) = 1$ for the transition B ⇒ F.

Another example: $\Omega_{U_1}(2) = \Omega_U(D) = \{1, 2\}$, with $U_1(2) = 1$ for the transition D ⇒ F and $U_1(2) = 2$ for the transition D ⇒ G.

A sequence $\pi = \{\mu_0, \mu_1, \dots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \dots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \dots, 3$.

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example, $\mu_1(1) = 2$, $\mu_1(2) = 2$).
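To make the backward recursion concrete, the following Python sketch implements the value iteration of Section 4.2 for a small staged shortest-path problem. The instance data (`stages`, `costs`) are assumed for illustration only; they are not the arc costs of the example above, whose figure is not reproduced here.

```python
# Minimal sketch of backward value iteration for a staged shortest-path
# problem (Section 4.2). The decision is the next state, as in the example
# where f_k(i, u) = u.

def value_iteration(stages, costs, terminal_cost):
    """stages: list of state lists per stage; costs[k][(i, u)]: cost of going
    from state i at stage k to state u at stage k+1."""
    N = len(stages) - 1
    J = {(N, i): terminal_cost.get(i, 0.0) for i in stages[N]}
    policy = {}
    for k in range(N - 1, -1, -1):                      # backward recursion
        for i in stages[k]:
            # admissible decisions = successor states with a defined arc cost
            candidates = {u: c + J[(k + 1, u)]
                          for (s, u), c in costs[k].items() if s == i}
            u_star = min(candidates, key=candidates.get)
            J[(k, i)] = candidates[u_star]
            policy[(k, i)] = u_star
    return J, policy

# Tiny illustrative instance (assumed data, not the thesis example)
stages = [["A"], ["B", "C"], ["K"]]
costs = [{("A", "B"): 2, ("A", "C"): 1},
         {("B", "K"): 4, ("C", "K"): 6}]
J, policy = value_iteration(stages, costs, terminal_cost={})
print(J[(0, "A")], policy)   # optimal cost-to-go from A and the optimal policy
```

Because the decision here is simply the next node, each backward step only compares the arc cost plus the already-computed cost-to-go of the successor nodes.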


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable $k \in \{0, \dots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on k: $X_k \in \Omega_{X_k}$.

Decision Space

At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \dots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.

A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \dots, N-1$.

N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
$X_k$ : State at stage k
$U_k$ : Decision action at stage k
$\omega_k(i, u)$ : Probabilistic function of the disturbance
$C_k(i, u, j)$ : Cost function
$C_N(i)$ : Terminal cost for state i
$f_k(i, u, \omega)$ : Dynamic function
$J^*_0(i)$ : Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \quad (5.1)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad (5.2)$$

$\Omega_{X_k}$ : State space at stage k
$\Omega_{U_k}(i)$ : Decision space at stage k for state i
$P_k(j, u, i)$ : Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N} \quad \text{(Initialisation)}$$

While $k \ge 0$ do

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$U^*_k(i) = \operatorname{argmin}_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$k \leftarrow k - 1$$

$u$ : Decision variable
$U^*_k(i)$ : Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
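A minimal Python sketch of this backward recursion is given below, assuming the problem data are available as arrays of transition probabilities and costs. The instance at the end uses random data purely for illustration; it does not correspond to any maintenance system from the thesis.

```python
import numpy as np

# Minimal sketch of the finite horizon value iteration of Section 5.3.
# Assumed (hypothetical) data: N stages, S states, A actions,
# P[k][a, i, j] transition probabilities and C[k][a, i, j] costs.

def finite_horizon_vi(P, C, terminal_cost):
    """P, C: lists of length N with arrays of shape (A, S, S);
    terminal_cost: array of shape (S,). Returns cost-to-go J and policy U."""
    N = len(P)
    J = [None] * (N + 1)
    U = [None] * N
    J[N] = terminal_cost.copy()                      # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):                   # backward recursion
        # expected cost of every (action, state): sum_j P * (C + J_{k+1})
        Q = (P[k] * (C[k] + J[k + 1][None, None, :])).sum(axis=2)
        J[k] = Q.min(axis=0)                         # J*_k(i)
        U[k] = Q.argmin(axis=0)                      # U*_k(i)
    return J, U

# Tiny illustrative instance (assumed data): 2 states, 2 actions, 3 stages
S, A, N = 2, 2, 3
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(S), size=(A, S)) for _ in range(N)]
C = [rng.uniform(0, 10, size=(A, S, S)) for _ in range(N)]
J, U = finite_horizon_vi(P, C, terminal_cost=np.zeros(S))
print(J[0], U[0])    # optimal expected cost-to-go and decision at stage 0
```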

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• $N_X$ state variables; the size of the set for each state variable is S

• $N_U$ control variables; the size of the set for each control variable is A

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
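As a rough illustration with assumed numbers: a weekly planning problem with $N = 52$ stages, $N_X = 5$ state variables of size $S = 10$ and $N_U = 5$ binary control variables already requires on the order of $52 \cdot 10^{10} \cdot 2^5 \approx 1.7 \cdot 10^{13}$ elementary operations, far beyond what can be evaluated exactly.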

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model for an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is low consumption, some generation units are stopped; this time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system depend only on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form $\pi = \{\mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space. For $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state i: $\mu(i) \in \Omega_U(i)$.

The objective is to find the optimal policy $\mu^*$. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.

$\mu$ : Decision policy
$J^*(i)$ : Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost at stage k for discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.

$\alpha$ : Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.


62 Optimality Equations

The optimality equations are formulated using the transition probabilities $P_{ij}(u)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega_X$$

$J_\mu(i)$ : Cost-to-go function of policy $\mu$ starting from state i
$J^*(i)$ : Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps:

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]$$

q : Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu^{q+1}(i) = \operatorname{argmin}_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm always improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge quickly to the optimal solution.
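The two steps can be written compactly for the discounted case. The sketch below is an illustration with assumed data (random transition and cost arrays), not an implementation from the thesis; the evaluation step solves the linear system directly and the improvement step is one value iteration sweep.

```python
import numpy as np

# Minimal sketch of policy iteration (Section 6.4) for a discounted MDP
# with assumed data: P[u, i, j] transition probabilities, C[u, i, j] costs.

def policy_iteration(P, C, alpha):
    A, S, _ = P.shape
    mu = np.zeros(S, dtype=int)                       # arbitrary initial policy
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(S), :]                 # (S, S)
        c_mu = (P_mu * C[mu, np.arange(S), :]).sum(axis=1)
        J = np.linalg.solve(np.eye(S) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = (P * (C + alpha * J[None, None, :])).sum(axis=2)   # (A, S)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                # policy stable: optimal
            return J, mu
        mu = mu_new

# Tiny illustrative instance (assumed data): 3 states, 2 actions
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(2, 3))            # shape (2, 3, 3)
C = rng.uniform(0, 5, size=(2, 3, 3))
J, mu = policy_iteration(P, C, alpha=0.9)
print(J, mu)
```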

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.

While $m \ge 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega_X$$

$$m \leftarrow m - 1$$

m : Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.

66 Average Cost-to-go Problems

The methods presented in Sections 6.2-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a state $X \in \Omega_X$, there are a unique $\lambda_\mu$ and a vector $h_\mu$ such that

$$h_\mu(X) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega_X$$

This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \operatorname{argmin}_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and $h^0(i)$ is chosen arbitrarily:

$$H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \operatorname{argmin}_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega_X$$

The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Policy evaluation
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega_X$, stop the algorithm.

Else, solve the system of equations

$$h^q(X) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega_X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \operatorname{argmin}_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega_X$$

$q = q + 1$

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case,

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$

$J^*(i)$ is the solution of the following linear programming model:

$$\text{Maximize} \quad \sum_{i \in \Omega_X} J(i)$$

$$\text{Subject to} \quad J(i) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \quad \forall i \in \Omega_X, \; \forall u \in \Omega_U(i)$$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
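As an illustration of this reformulation, the sketch below builds the LP for a small discounted MDP with assumed data and solves it with scipy's linprog. The maximization of $\sum_i J(i)$ is passed to the solver as the minimization of its negative; the data and the problem size are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Minimal sketch of the LP formulation for a discounted MDP (Section 6.7),
# with assumed data P[u, i, j], C[u, i, j] and discount factor alpha.

def solve_discounted_mdp_lp(P, C, alpha):
    A, S, _ = P.shape
    c = -np.ones(S)                                   # minimize -sum_i J(i)
    A_ub, b_ub = [], []
    for u in range(A):
        for i in range(S):
            row = np.zeros(S)
            row[i] += 1.0
            row -= alpha * P[u, i, :]                 # J(i) - alpha*sum_j P*J(j)
            A_ub.append(row)
            b_ub.append(P[u, i, :] @ C[u, i, :])      # expected immediate cost
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * S)
    return res.x                                      # optimal cost-to-go J*

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(3), size=(2, 3))            # 2 actions, 3 states
C = rng.uniform(0, 5, size=(2, 3, 3))
print(solve_discounted_mdp_lp(P, C, alpha=0.9))
```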

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite quickly if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, for each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory $(X_0, \dots, X_N)$ has been generated according to the policy $\mu$ and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory, starting from the state $X_k$, is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$ : Cost-to-go of the trajectory starting from state $X_k$

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, $J(i)$ can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$ : Cost-to-go of the trajectory starting from state i after its m-th visit

A recursive form of the method can be formulated:

$$J(i) := J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \quad \text{with } \gamma = 1/m, \text{ where } m \text{ is the number of the trajectory}$$

From a trajectory point of view:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

where $\gamma_{X_k}$ corresponds to 1/m, with m the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory, so the update can only be made when the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1})$.

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition is being generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \dots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant $\lambda < 1$ is introduced:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \dots, l$$

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

$$J(X_l) := J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]$$
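A minimal Python sketch of TD(0) policy evaluation is given below. The simulator `step(state, action)`, returning `(next_state, cost, done)`, and the fixed policy `mu` are hypothetical names assumed for the illustration, not elements of the thesis.

```python
import random

# Minimal sketch of TD(0) policy evaluation (Section 7.2.1) for a
# stochastic shortest path problem with a terminal state.

def td0_evaluation(step, mu, states, terminal, episodes=1000):
    J = {s: 0.0 for s in states}          # cost-to-go estimates
    visits = {s: 0 for s in states}       # visit counters for the step size
    for _ in range(episodes):
        s = random.choice(states)         # start each trajectory at random
        while s != terminal:
            s_next, cost, done = step(s, mu(s))
            visits[s] += 1
            gamma = 1.0 / visits[s]       # gamma_{X_k} = 1/m
            # TD(0) update: move J(s) towards the one-step estimate
            J[s] += gamma * (cost + J.get(s_next, 0.0) - J[s])
            s = s_next
            if done:
                break
    return J
```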

Q-factors
Once $J_{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q_{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]$$

Note that $C(j, u, i)$ must be known.

The improved policy is

$$\mu^{k+1}(i) = \operatorname{argmin}_{u \in \Omega_U(i)} Q_{\mu^k}(i, u)$$

It is in fact an approximate version of the policy iteration algorithm, since $J_{\mu^k}$ and $Q_{\mu^k}$ have been estimated from the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \quad (7.1)$$

The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \quad (7.2)$$

By combining the two equations, we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \quad (7.3)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

$Q(i, u)$ can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$, do

$$U_k = \operatorname{argmin}_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with γ defined as for TD

The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
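The following Python sketch combines the Q-learning update with an epsilon-greedy exploration/exploitation trade-off. The simulator `step(state, action)` and the `actions(state)` function are assumed for the illustration; as in the notation above, γ denotes the step size, not a discount factor.

```python
import random

# Minimal sketch of Q-learning with epsilon-greedy exploration (Section 7.2.2).
# `step(state, action)` is an assumed simulator returning (next_state, cost, done).

def q_learning(step, states, actions, start, episodes=1000,
               gamma=0.1, epsilon=0.1):
    Q = {(s, a): 0.0 for s in states for a in actions(s)}
    for _ in range(episodes):
        s, done = start, False
        while not done:
            if random.random() < epsilon:             # exploration phase
                a = random.choice(actions(s))
            else:                                      # exploitation (greedy)
                a = min(actions(s), key=lambda u: Q[(s, u)])
            s_next, cost, done = step(s, a)
            best_next = 0.0 if done else min(Q[(s_next, u)]
                                             for u in actions(s_next))
            # Q-learning update towards the sampled one-step target
            Q[(s, a)] = (1 - gamma) * Q[(s, a)] + gamma * (cost + best_next)
            s = s_next
    return Q
```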

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function $J_\mu(i)$. It will be replaced by a suitable approximation $\tilde{J}(i, r)$, where r is a vector that has to be optimized based on the available samples of $J_\mu$. In the tabular representation investigated previously, $J_\mu(i)$ was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J_\mu(i) - \tilde{J}(i, r)$.

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Methods: classical methods for MDPs; possible approaches are
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
  - Shortest path: linear programming allows additional constraints, but handles a more limited state space than VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization of inspection-based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Advantages/disadvantages: complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both of these models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

$N_E$ : Number of electricity scenarios
$N_W$ : Number of working states for the component
$N_{PM}$ : Number of preventive maintenance states for the component
$N_{CM}$ : Number of corrective maintenance states for the component

Costs

$C_E(s, k)$ : Electricity cost at stage k for the electricity state s
$C_I$ : Cost per stage for interruption
$C_{PM}$ : Cost per stage of preventive maintenance
$C_{CM}$ : Cost per stage of corrective maintenance
$C_N(i)$ : Terminal cost if the component is in state i

Variables

$i_1$ : Component state at the current stage
$i_2$ : Electricity state at the current stage
$j_1$ : Possible component state for the next stage
$j_2$ : Possible electricity state for the next stage

State and Control Space

$x^1_k$ : Component state at stage k
$x^2_k$ : Electricity state at stage k

Probability function

$\lambda(t)$ : Failure rate of the component at age t
$\lambda(i)$ : Failure rate of the component in state $W_i$

Sets

$\Omega_{x^1}$ : Component state space
$\Omega_{x^2}$ : Electricity state space
$\Omega_U(i)$ : Decision space for state i

States notations

W : Working state
PM : Preventive maintenance state
CM : Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length $T_s$ such that $T = N \cdot T_s$. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted $\lambda(t)$.

• If the component fails during stage k, corrective maintenance is undertaken for $N_{CM}$ stages with a cost of $C_{CM}$ per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is $N_{PM}$ stages with a cost of $C_{PM}$ per stage.

• If the system is not working, a cost for interruption $C_I$ per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• $N_E$ possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted $C_E(s, k)$, k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector $X_k$ is composed of two state variables: $x^1_k$ for the state of the component (its age) and $x^2_k$ for the electricity scenario ($N_X = 2$).

The state of the system is thus represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1}, \; x^2_k \in \Omega_{x^2} \quad (9.1)$$

$\Omega_{x^1}$ is the set of possible states for the component and $\Omega_{x^2}$ the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable $x^1_k$. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to $N_{CM}$ and $N_{PM}$.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when $\lambda(t)$ reaches a fixed limit $\lambda_{max} = \lambda(T_{max})$, preventive maintenance is always made. Another possibility is to assume that $\lambda(t)$ stays constant once the age $T_{max}$ is reached; in this case $T_{max}$ can correspond, for example, to the time such that $\lambda(t) > 50\%$ for $t > T_{max}$. This approach was implemented. The corresponding number of W states is $N_W = T_{max}/T_s$, or the closest integer, in both cases.


[Figure 9.1: Example of the Markov decision process for one component with $N_{CM} = 3$, $N_{PM} = 2$, $N_W = 4$. The states are W0-W4, PM1, CM1 and CM2; solid arcs correspond to u = 0 (transition to the next working state with probability $1 - T_s\lambda(q)$ and to CM1 with probability $T_s\lambda(q)$), dashed arcs to u = 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, $x^1_k \in \Omega_{x^1} = \{W0, \dots, W4, PM1, CM1, CM2\}$. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

$$\Omega_{x^1} = \{W0, \dots, W_{N_W}, PM1, \dots, PM_{N_{PM}-1}, CM1, \dots, CM_{N_{CM}-1}\}$$


Electricity scenario state

Electricity scenarios are associated with one state variable $x^2_k$. There are $N_E$ possible states for this variable, each state corresponding to one possible electricity scenario: $x^2_k \in \Omega_{x^2} = \{S_1, \dots, S_{N_E}\}$. The electricity price of scenario S at stage k is given by the electricity price function $C_E(S, k)$. Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, $N_E = 3$. The figure plots the electricity price (SEK/MWh) over the stages k-1, k, k+1 for Scenarios 1, 2 and 3.]


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

$U_k = 0$ : no preventive maintenance

$U_k = 1$ : preventive maintenance

The decision space depends only on the component state $i_1$:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i_1 \in \{W1, \dots, W_{N_W}\} \\ \emptyset & \text{otherwise} \end{cases}$$

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x^1_{k+1} = j_1, x^2_{k+1} = j_2 \mid u_k = u, x^1_k = i_1, x^2_k = i_2)$$
$$= P(x^1_{k+1} = j_1 \mid u_k = u, x^1_k = i_1) \cdot P(x^2_{k+1} = j_2 \mid x^2_k = i_2)$$
$$= P(j_1, u, i_1) \cdot P_k(j_2, i_2)$$

Component state transition probability

At each stage k, if the state of the component is $W_q$, the failure rate is assumed constant during the stage and equal to $\lambda(W_q) = \lambda(q \cdot T_s)$.

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if $N_{PM} = 1$ or $N_{CM} = 1$, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, $P_k(j_2, i_2)$, are not stationary.

They can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, $P_k(j_2, i_2)$ can take three different values, defined by the transition matrices $P^1_E$, $P^2_E$ and $P^3_E$; $i_2$ is represented by the rows of the matrices and $j_2$ by the columns.


Table 9.1: Transition probabilities

$i_1$ | $u$ | $j_1$ | $P(j_1, u, i_1)$
$W_q$, $q \in \{0, \dots, N_W - 1\}$ | 0 | $W_{q+1}$ | $1 - \lambda(W_q)$
$W_q$, $q \in \{0, \dots, N_W - 1\}$ | 0 | $CM_1$ | $\lambda(W_q)$
$W_{N_W}$ | 0 | $W_{N_W}$ | $1 - \lambda(W_{N_W})$
$W_{N_W}$ | 0 | $CM_1$ | $\lambda(W_{N_W})$
$W_q$, $q \in \{0, \dots, N_W\}$ | 1 | $PM_1$ | 1
$PM_q$, $q \in \{1, \dots, N_{PM} - 2\}$ | $\emptyset$ | $PM_{q+1}$ | 1
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | 1
$CM_q$, $q \in \{1, \dots, N_{CM} - 2\}$ | $\emptyset$ | $CM_{q+1}$ | 1
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | 1

Table 9.2: Example of transition matrices for the electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad
P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} \quad
P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage (k):        0    1    2    3    4    5    6    7    8    9    10   11
$P_k(j_2, i_2)$:  $P^1_E$  $P^1_E$  $P^1_E$  $P^3_E$  $P^3_E$  $P^2_E$  $P^2_E$  $P^2_E$  $P^3_E$  $P^1_E$  $P^1_E$  $P^1_E$

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation $= G \cdot T_s \cdot C_E(i_2, k)$ (depends on the electricity scenario state $i_2$ and the stage k)

• Cost for maintenance: $C_{CM}$ or $C_{PM}$

• Cost for interruption: $C_I$

Moreover, a terminal cost noted $C_N$ could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that $i_2$ is a state variable.

A possible terminal cost is defined by $C_N(i)$ for each possible terminal state i of the component.

54

Table 9.4: Transition costs

i^1                              u   j^1        C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}       0   W_{q+1}    G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}       0   CM_1       C_I + C_CM
W_{N_W}                          0   W_{N_W}    G · T_s · C_E(i^2, k)
W_{N_W}                          0   CM_1       C_I + C_CM
W_q                              1   PM_1       C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}     ∅   PM_{q+1}   C_I + C_PM
PM_{N_PM − 1}                    ∅   W_0        C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}     ∅   CM_{q+1}   C_I + C_CM
CM_{N_CM − 1}                    ∅   W_0        C_I + C_CM
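A corresponding sketch of the transition cost of Table 9.4 (Python, same illustrative state encoding as above; treating the generation reward as a negative cost in a minimization is an assumption of the sketch, since the table itself only lists the amounts):

    def stage_cost(i1, i2, u, j1, k, G, Ts, C_E, C_I, C_CM, C_PM):
        """Transition cost C_k(j, u, i) following Table 9.4.
        C_E(i2, k) is the electricity price in scenario i2 at stage k.
        The generation reward is counted as a negative cost here (sketch assumption)."""
        if j1 == "CM1" or i1.startswith("CM"):      # failure during the stage or ongoing corrective maintenance
            return C_I + C_CM
        if j1 == "PM1" or i1.startswith("PM"):      # preventive replacement decided or ongoing preventive maintenance
            return C_I + C_PM
        # Otherwise the unit is producing during the whole stage.
        return -G * Ts * C_E(i2, k)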

9.2 Multi-Component model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of Preventive Maintenance states for component c
N_CMc   Number of Corrective Maintenance states for component c

55

Costs

C_PMc    Cost per stage of Preventive Maintenance for component c
C_CMc    Cost per stage of Corrective Maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

i_c, c ∈ {1, ..., N_C}    State of component c at the current stage
i_{N_C+1}                 State of the electricity at the current stage
j_c, c ∈ {1, ..., N_C}    State of component c at the next stage
j_{N_C+1}                 State of the electricity at the next stage
u_c, c ∈ {1, ..., N_C}    Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}    State of component c at stage k
x^c                         A component state
x^{N_C+1}_k                 Electricity state at stage k
u^c_k                       Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_{x^c}          State space for component c
Ω_{x^{N_C+1}}    Electricity state space
Ω_{u^c}(i_c)     Decision space for component c in state i_c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages with a cost of C_PMc per stage.

• An interruption cost C_I is considered, whichever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · T_s kWh is produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T    (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component Space
The number of CM and PM states for component c corresponds respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W_0, ..., W_{N_Wc}, PM_1, ..., PM_{N_PMc − 1}, CM_1, ..., CM_{N_CMc − 1}}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or do nothing, depending on the state of the system.

u^c_k = 0 : no preventive maintenance on component c
u^c_k = 1 : preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., N_C}:  Ω_{u^c}(i_c) = { {0, 1}   if i_c ∈ {W_0, ..., W_{N_Wc}}
                                    { ∅        else

9.2.4.3 Transition Probability

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)                                                                    (9.4)
  = P((j_1, ..., j_{N_C}), (u_1, ..., u_{N_C}), (i_1, ..., i_{N_C})) · P_k(j_{N_C+1}, i_{N_C+1})     (9.5)

The transition probabilities of the electricity state, P_k(j_{N_C+1}, i_{N_C+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}, x^c_k ∈ {W_1, ..., W_{N_Wc}}:

P((j_1, ..., j_{N_C}), 0, (i_1, ..., i_{N_C})) = ∏_{c=1}^{N_C} P(j_c, 0, i_c)

Case 2

If one of the components is in maintenance or the decision of preventive maintenance is taken for at least one component, then

P((j_1, ..., j_{N_C}), (u_1, ..., u_{N_C}), (i_1, ..., i_{N_C})) = ∏_{c=1}^{N_C} P^c

with P^c = { P(j_c, 1, i_c)   if u_c = 1 or i_c ∉ {W_1, ..., W_{N_Wc}}
           { 1                if i_c ∉ {W_0, ..., W_{N_Wc − 1}} and i_c = j_c
           { 0                else
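A sketch of how the two cases could be combined in code (Python; component_transitions refers to the one-component sketch given earlier, and the treatment of non-ageing components is one possible reading of the case distinction above, stated here as an assumption):

    def system_transition_prob(i, u, j, P_comp, working):
        """P((j_1..j_NC), (u_1..u_NC), (i_1..i_NC)) for the component part of the state.
        P_comp[c] is the one-component transition dict of component c (see earlier sketch);
        working(s) tells whether state s is a working state W_q.
        The treatment of non-ageing components is one possible reading of Case 2."""
        system_up = all(working(ic) for ic in i) and all(uc == 0 for uc in u)
        prob = 1.0
        for ic, uc, jc, Pc in zip(i, u, j, P_comp):
            if system_up:                      # Case 1: every component ages independently
                prob *= dict(Pc[(ic, 0)]).get(jc, 0.0)
            elif uc == 1 or not working(ic):   # Case 2: maintenance starts or goes on for component c
                key = (ic, uc) if (ic, uc) in Pc else (ic, None)
                prob *= dict(Pc[key]).get(jc, 0.0)
            else:                              # Case 2: working component does not age while the system is down
                prob *= 1.0 if jc == ic else 0.0
        return prob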

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}, x^c_k ∈ {W_1, ..., W_{N_Wc}}:

C((j_1, ..., j_{N_C}), 0, (i_1, ..., i_{N_C})) = G · T_s · C_E(i_{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j_1, ..., j_{N_C}), (u_1, ..., u_{N_C}), (i_1, ..., i_{N_C})) = C_I + ∑_{c=1}^{N_C} C^c

with C^c = { C_CMc   if i_c ∈ {CM_1, ..., CM_{N_CMc − 1}} or j_c = CM_1
           { C_PMc   if i_c ∈ {PM_1, ..., PM_{N_PMc − 1}} or j_c = PM_1
           { 0       else
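A corresponding sketch of the multi-component cost function (Python, same state encoding and sign convention as in the earlier sketches; an illustration only, not the thesis' implementation):

    def system_stage_cost(i, u, j, k, i_elec, G, Ts, C_E, C_I, C_CM, C_PM, working):
        """Cost C((j_1..j_NC), (u_1..u_NC), (i_1..i_NC)) following Cases 1 and 2.
        C_CM[c] and C_PM[c] are the per-stage maintenance costs of component c."""
        system_up = all(working(ic) for ic in i) and all(uc == 0 for uc in u) \
                    and all(not jc.startswith("CM") for jc in j)
        if system_up:                                   # Case 1: production reward (negative cost)
            return -G * Ts * C_E(i_elec, k)
        total = C_I                                     # Case 2: interruption plus maintenance costs
        for c, (ic, jc) in enumerate(zip(i, j)):
            if ic.startswith("CM") or jc == "CM1":
                total += C_CM[c]
            elif ic.startswith("PM") or jc == "PM1":
                total += C_PM[c]
        return total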

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

61

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4
J*_4(0) = C_N(0) = 0

Stage 3
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
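The calculation above can be reproduced with a short script. The following Python sketch (written for this appendix as an illustration, not part of the original solution) uses the arc costs that appear in the computations above:

    # Arc costs C[k][(i, u)] taken from the calculations above; u is also the next-stage state.
    C = [
        {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                               # stage 0: A -> B, C, D
        {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},   # stage 1: B, C, D -> E, F, G
        {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},   # stage 2: E, F, G -> H, I, J
        {(0, 0): 4, (1, 0): 2, (2, 0): 7},                                               # stage 3: H, I, J -> K
    ]

    J = [{} for _ in range(5)]
    U = [{} for _ in range(4)]
    J[4] = {0: 0}                                    # terminal cost at stage 4 (node K)
    for k in range(3, -1, -1):                       # backward value iteration
        for i in {i for (i, _) in C[k]}:
            candidates = {u: c + J[k + 1][u] for (s, u), c in C[k].items() if s == i}
            U[k][i] = min(candidates, key=candidates.get)
            J[k][i] = candidates[U[k][i]]

    print(J[0][0])                                   # optimal cost-to-go from A: 8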

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

Page 11: Models

errors and construction errors). Stochastic Dynamic Programming (SDP) models are optimization models that explicitly integrate stochastic behaviors. This feature makes the models interesting and was the starting idea of this work.

12 Objective

The main objective of this work is to investigate the use of stochastic dynamicprogramming models for maintenance optimization and identify possible future ap-plications in power systems

13 Approach

The first task was to understand the different dynamic programming approaches. A first distinction was made between finite horizon and infinite horizon approaches.

The different techniques that can be used for solving a model based on dynamic programming were investigated. For infinite horizon models, approximate dynamic programming was studied. These types of methods are related to the field of reinforcement learning.

Some SDP models found in the literature were reviewed. Conclusions were made about the applicability of each approach to maintenance optimization problems. Moreover, future avenues for research were identified.

A finite horizon replacement model was developed to illustrate the possible use ofSDP for power system maintenance

14 Outline

Chapter 2 gives an overview of the maintenance field. The most important methods and some optimization models are reviewed.

Chapter 3 discusses shortly power systems Some costs and constraints for opti-mization models are proposed

Chapters 4-7 focus on different Dynamic Programming (DP) approaches and algorithms to solve them. The assumptions of the models and practical limitations are discussed. The basics of DP models are investigated for deterministic models in Chapter 4. Chapters 5 and 6 focus on Stochastic Dynamic Programming methods,

2

respectively for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dy-namic programming Conclusions are made about possible use of the differentapproaches in maintenance optimization

Chapter 9 is an example of how finite horizon dynamic programming can be usedfor maintenance optimization

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.

3

Chapter 2

Maintenance

The context of maintenance optimization is shortly described in this chapter Differ-ent types of maintenance are defined in Section 21 Some maintenance optimizationmodels are reviewed in Section 22

21 Types of Maintenance

Maintenance is a combination of all technical administrative and managerial actionsduring the life cycle of an item intended to retain it or restore it to a state in whichit can perform the required functions [1] Figure 21 shows a general picture of thedifferent types of maintenance

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way to detect or prevent a failure, or when it is not worth doing so.

Preventive maintenance aims at undertaking maintenance actions on a componentbefore it fails to eg avoid high cost of replacement power delivery unsuppliedand possible damages of the surrounding of the component One can distinguishbetween two kind of preventive maintenance

1 Time Based Maintenance (TBM) is preventive maintenance carried out inaccordance with established intervals of time or number of units of use butwithout previous condition investigation [1] TBM is used for failures that areage-related and for which the probability of failure on time can be established

5

[Figure: maintenance tree — Maintenance is divided into Preventive Maintenance (Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter performed continuously, scheduled, or inspection based) and Corrective Maintenance]

Figure 2.1: Maintenance tree, based on [1]

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive CorrectiveMaintenance Preventive Maintenance approaches can be used to avoid CM Ifpreventive maintenance is done too frequently it can however also result in a veryhigh cost

The aim of the maintenance optimization could be to balance corrective and pre-ventive maintenance to minimize for example the total cost of maintenance

Numerous maintenance optimization models have been proposed in the litteratureand interesting reviews have been published Wang [43] gives an interesting pictureof maintenance policy optimization and its influence factors Cho et al [15]Dekker et al [16] and Nicolai et al [31] focus mainly on multi-componentproblems

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

222 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to take into account that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance andprevent incipient failure In wind turbines condition monitoring is being intro-duced for components like the gear box blades etc [32] One problem prior to theoptimization is to identify relevant variables and identify their relation with failuresmodes and probabilities CBM optimization models focus on different questionsrelated to inspectedmonitored components

One question is the optimal limits for the monitored variables above which it is nec-essary to perform maintenance The optimal wear-limit for preventive replacement

7

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).
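As an illustration (the specific Cox-type form below is an assumption of this remark, not a detail taken from [25]), such a hazard function can be written

h(t, z) = h_0(t) · exp(β^T z),

where h_0(t) is the baseline hazard depending only on time, z is the vector of monitored variables and β the corresponding coefficients.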

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the trip to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al [19] focus on group preventive replacement policy of m identicalcomponents that are in the same condition Both discrete and continuous time areconsidered and a dynamic programming equation is derived The model is extendedin [26] for m non-identical components

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possi-ble choice between different spare part Eg cannibalization models allows the re-useof some components or subcomponents of a system

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is important. Many articles consider an infinite time horizon. More focus should be put on finite horizon models since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.

9

Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system include the following main parts

1 Generation That are the generation units that produce the power It canbe eg hydro-power units nuclear power plants wind farms etc The totalpower consumed is always equal to the power generated

2 Transmission The transmission system is composed of high voltage and highpower lines This part of the system is in general meshed The transmissionsystem connects distribution systems with generation units

11

3. Distribution: The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4 Consumption The consumer can be divided into different categories Con-sumer can be industry commercial house office agriculture etc The costs forinterruption are in general different for the different categories of consumerThese costs will also depend on the time of outage

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as blackouts.

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation is typically focusing on predictive maintenanceusing condition based monitoring systems (See for example [18] or [44]) The prob-lem of maintenance for transmission and distribution systems has received more

12

attention since the deregulation of the electricity market (See for example [12][27] for distribution systems [22] [30] for transmission systems)

The emergence of new condition based monitoring systems is changing the approachto maintenance in power system There is a need for new models and methods tooptimize the use of condition based monitoring systems

32 Costs

Possible costsincomes related to maintenance in power systems have been identified(non-inclusively) as follows

• Manpower cost: Cost for the maintenance team that performs maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: If special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy / interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possibles constraints for the maintenance of power system have been identified asfollows

13

• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can make certain maintenance actions postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of the spare parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: Available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.

14

Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and somefeature of possible DP models Deterministic DP is used to introduce the basic ofDP formulation and the value iteration method a classical method for solving DPmodels

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality

15

An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.

16

414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that the decision can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].

17

42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state X_k = i that belongs to a state space Ω_{X_k}. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω_{U_k}(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

J*_0(X_0) = min_{U_k} [ ∑_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k),  k = 0, ..., N − 1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision action at stage k
C_k(i, u)    Cost function
C_N(i)       Terminal cost for state i
f_k(i, u)    Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i

18

422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }    (4.1)

J*_k(i)   Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i)  ∀i ∈ Ω_{X_N}

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }  ∀i ∈ Ω_{X_k}

U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }  ∀i ∈ Ω_{X_k}

u         Decision variable
U*_k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.

19

423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network over stages 0-4 — node A (stage 0); nodes B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4) — with a cost on each arc, e.g. A-B costs 2, B-F costs 6, F-J costs 2 and J-K costs 7; the full set of arc costs is the one used in Appendix A]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω_{X_0} = {A} = {0}
Ω_{X_1} = {B, C, D} = {0, 1, 2}
Ω_{X_2} = {E, F, G} = {0, 1, 2}
Ω_{X_3} = {H, I, J} = {0, 1, 2}
Ω_{X_4} = {K} = {0}

20

Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

For k = 1, 2, 3:
Ω_{U_k}(i) = { {0, 1}      for i = 0
             { {0, 1, 2}   for i = 1
             { {1, 2}      for i = 2

For k = 0:
Ω_{U_0}(0) = {0, 1, 2}

For example, Ω_{U_1}(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_{U_1}(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ_0, µ_1, ..., µ_N}, where µ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*_0, µ*_1, ..., µ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notations used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{U_k ∈ Ω_{U_k}(X_k)} [ ∑_{k=0}^{N−1} C_k(X_k, U_k) + C_N(X_N) ]

subject to X_{k+1} = f_k(X_k, U_k),  k = 0, 1, ..., N − 1

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until

21

the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ_0, µ_1, µ_2, µ_3, µ_4} with µ_k(i) = u*_k(i) (for example, µ_1(1) = 2, µ_1(2) = 2).

22

Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.

Decision Space

At each decision epoch the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω),  k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(x_{k+1} = j, u_k = u, x_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).

A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω_{U_k}(X_k)} E[ C_N(X_N) + ∑_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)),  k = 0, 1, ..., N − 1

24

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
X_k          State at stage k
U_k          Decision action at stage k
ω_k(i, u)    Probabilistic function of the disturbance
C_k(i, u, j) Cost function
C_N(i)       Terminal cost for state i
f_k(i, u, ω) Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    (5.2)

Ω_{X_k}        State space at stage k
Ω_{U_k}(i)     Decision space at stage k for state i
P_k(j, u, i)   Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)  ∀i ∈ Ω_{X_N}   (initialisation)

While k ≥ 0 do
    J*_k(i) = min_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]  ∀i ∈ Ω_{X_k}
    U*_k(i) = argmin_{u ∈ Ω_{U_k}(i)} ∑_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]  ∀i ∈ Ω_{X_k}
    k ← k − 1

u         Decision variable
U*_k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
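The backward recursion can be written compactly in code. The following generic Python sketch is an illustration (the data structures — state lists, an action-set function, and the functions P and C — are assumptions of the sketch, not an implementation from the thesis):

    def value_iteration(N, states, actions, P, C, C_terminal):
        """Finite horizon stochastic value iteration.
        states[k]      : list of states at stage k (k = 0..N)
        actions(k, i)  : admissible actions at stage k in state i
        P(k, j, u, i)  : transition probability P_k(j, u, i)
        C(k, j, u, i)  : transition cost C_k(j, u, i)
        C_terminal(i)  : terminal cost C_N(i)
        Returns the cost-to-go tables J[k][i] and the optimal decisions U[k][i]."""
        J = [dict() for _ in range(N + 1)]
        U = [dict() for _ in range(N)]
        J[N] = {i: C_terminal(i) for i in states[N]}
        for k in range(N - 1, -1, -1):                    # backward recursion over the stages
            for i in states[k]:
                best_u, best_value = None, float("inf")
                for u in actions(k, i):
                    value = sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                                for j in states[k + 1])
                    if value < best_value:
                        best_u, best_value = u, value
                J[k][i], U[k][i] = best_value, best_u
        return J, U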

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
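As a small illustration (the numbers below are chosen here only to give an order of magnitude, they are not from the thesis): with N = 12 stages, N_X = 2 state variables of size S = 10 and N_U = 1 decision variable of size A = 2, the bound gives 12 · 10^4 · 2 ≈ 2.4 · 10^5 elementary operations, which is easy to handle; with N_X = 6 state variables and N_U = 6 decision variables of the same sizes it becomes 12 · 10^12 · 2^6 ≈ 7.7 · 10^14 operations, which is already intractable in practice.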

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, it can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.

26

552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. It will reduce the uncertainties but in return increase the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load Indeed the produc-tion must always be in balance with the generation Also if there is no consumptionsome generation units are stopped This time can be used for the maintenance ofthe power plant

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on only a few previous states) to overcome this assumption: variables are added to the DP model to keep the previous states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm of Chapter 5 would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for each i ∈ Ω_X, µ(i) is an admissible control for the state i, µ(i) ∈ Ω_U(i).

The objective is to find the optimal policy µ*, the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (a cost-free termination state) that is unavoidable. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_µ E{ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k) }

Subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N − 1

µ: Decision policy
J*(i): Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, the discount factor (0 < α < 1). The cost incurred at stage k has the form α^k · C_ij(u).

Since C_ij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).

J*(X_0) = min_µ E{ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, µ(X_k), X_k) }

Subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N − 1

α: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_µ E{ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(X_{k+1}, µ(X_k), X_k) }

Subject to X_{k+1} = f(X_k, µ(X_k), ω(X_k, µ(X_k))), k = 0, 1, ..., N − 1

6.2 Optimality Equations

The optimality equations are formulated using the transition probability function P(j, u, i).

The stationary policy µ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)], ∀i ∈ Ω_X

Jµ(i): Cost-to-go function of policy µ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can indeed be shown that the algorithm converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite and a stopping criterion must be defined to terminate the algorithm.

An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the policy based on this expected cost-to-go function. This two-step procedure is repeated iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ^0. It can then be described by the following steps.

Step 1: Policy Evaluation

If µ^{q+1} = µ^q, stop the algorithm. Else, calculate J_{µ^q}(i), the solution of the following linear system:

J_{µ^q}(i) = Σ_{j ∈ Ω_X} P(j, µ^q(i), i) · [C(j, µ^q(i), i) + J_{µ^q}(j)], ∀i ∈ Ω_X

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µ^q.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

µ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{µ^q}(j)], ∀i ∈ Ω_X

Go back to the policy evaluation step.

The process stops when µ^{q+1} = µ^q.

At each iteration the algorithm improves the policy. If the initial policy µ^0 is already good, then the algorithm converges quickly to the optimal solution.
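A minimal sketch of the two steps for a stationary, discounted MDP is given below (the shortest path variant is obtained by dropping the discount factor). The arrays P[u][i][j] and C[u][i][j] are hypothetical inputs of the same form as in the value iteration sketch of Chapter 5.

import numpy as np

def policy_iteration(P, C, alpha=0.95, max_iter=1000):
    """Policy iteration for a stationary, discounted MDP (sketch)."""
    P, C = np.asarray(P), np.asarray(C)
    n_controls, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                 # initial policy mu^0
    for _ in range(max_iter):
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_states)]              # row i is P(i, mu(i), .)
        c_mu = np.sum(P_mu * C[mu, np.arange(n_states)], axis=1)
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):                 # policy solves its own improvement
            break
        mu = mu_new
    return J, mu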

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{µ^k}(i) that must be chosen higher than the true value J_{µ^k}(i).

While m ≥ 0 do
  J^m_{µ^k}(i) = Σ_{j ∈ Ω_X} P(j, µ^k(i), i) · [C(j, µ^k(i), i) + J^{m+1}_{µ^k}(j)], ∀i ∈ Ω_X
  m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{µ^k} is approximated by J^0_{µ^k}.
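The approximate evaluation step can be sketched as follows; it replaces the linear solve in the policy iteration sketch above, and the array names are the same hypothetical ones.

import numpy as np

def approximate_policy_evaluation(P, C, mu, J_init, alpha=0.95, M=10):
    """Estimate J_mu with M value iteration sweeps instead of a linear solve (sketch)."""
    P, C = np.asarray(P), np.asarray(C)
    n_states = len(mu)
    P_mu = P[mu, np.arange(n_states)]                  # P(i, mu(i), .)
    c_mu = np.sum(P_mu * C[mu, np.arange(n_states)], axis=1)
    J = np.asarray(J_init, dtype=float)                # should start above the true J_mu
    for _ in range(M):                                 # m = M-1, ..., 0
        J = c_mu + alpha * P_mu @ J
    return J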

6.6 Average Cost-to-go Problems

The methods presented in the preceding sections (6.3-6.5) can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and a state X ∈ Ω_X, there is a unique scalar λ_µ and vector h_µ such that

h_µ(X) = 0

λ_µ + h_µ(i) = Σ_{j ∈ Ω_X} P(j, µ(i), i) · [C(j, µ(i), i) + h_µ(j)], ∀i ∈ Ω_X

This λ_µ is the average cost-to-go of the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

µ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.

H_k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ Ω_X

µ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ Ω_X

The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i), ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h^q(X) = 0
λ^q + h^q(i) = Σ_{j ∈ Ω_X} P(j, µ^q(i), i) · [C(j, µ^q(i), i) + h^q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

µ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)], ∀i ∈ Ω_X

q ← q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case, where

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X,

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)

Subject to J(i) − α · Σ_{j ∈ Ω_X} P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · C(j, u, i), ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
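A minimal sketch of this LP for a discounted cost-minimization MDP, using scipy.optimize.linprog, is shown below. The arrays P and C are the same hypothetical inputs as in the earlier sketches; since linprog minimizes, the objective "maximize the sum of J(i)" is entered with a negated cost vector.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha=0.95):
    """Solve the discounted optimality equation by linear programming (sketch).

    maximize  sum_i J(i)
    s.t.      J(i) - alpha * sum_j P(i,u,j) J(j) <= sum_j P(i,u,j) C(i,u,j)  for all i, u
    """
    P, C = np.asarray(P), np.asarray(C)
    n_controls, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_controls):
        for i in range(n_states):
            row = -alpha * P[u, i]                     # -alpha * P(i,u,.)
            row[i] += 1.0                              # coefficient of J(i)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i], C[u, i]))      # expected one-step cost
    res = linprog(c=-np.ones(n_states),                # maximize sum J(i)
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                        # approximation of J*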

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) stationary policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy µ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions belong to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of the inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists; the methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X0, ..., XN) has been generated according to the policy µ and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of a trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of times state i has been visited.

From a trajectory point of view:

J(Xk) = J(Xk) + γ_{Xk} · [V(Xk) − J(Xk)]

γ_{Xk} corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory, so the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states already visited during the trajectory are updated. Assume that the l-th transition has just been generated. Then J(Xk) is updated for all the states visited previously during the trajectory:

J(Xk) = J(Xk) + γ_{Xk} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(Xk) = J(Xk) + γ_{Xk} · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(Xl) = J(Xl) + γ_{Xl} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]
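A minimal sketch of TD(0) policy evaluation from simulated trajectories is given below. The environment step function and the policy are hypothetical stand-ins used only for illustration; they are not part of any model in this thesis.

import numpy as np

def td0_policy_evaluation(simulate_step, policy, n_states, terminal,
                          n_trajectories=1000, start_state=0):
    """TD(0) estimate of the cost-to-go of a policy from sampled transitions (sketch).

    simulate_step(x, u) -> (next_state, cost) : hypothetical environment model
    policy(x) -> u                            : the policy mu to evaluate
    terminal                                  : cost-free termination state
    """
    J = np.zeros(n_states)
    visits = np.zeros(n_states)                        # step size gamma = 1 / (number of visits)
    for _ in range(n_trajectories):
        x = start_state
        while x != terminal:
            u = policy(x)
            x_next, cost = simulate_step(x, u)         # one sampled transition
            visits[x] += 1
            gamma = 1.0 / visits[x]
            J[x] += gamma * (cost + J[x_next] - J[x])  # TD(0) update
            x = x_next
    return J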

Q-factors
Once J_{µ^k}(i) has been estimated using the TD algorithm, a policy improvement step can be made by evaluating the Q-factors, defined by

Q_{µ^k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{µ^k}(j)]

Note that the transition probabilities and costs must be known for this step.

The improved policy is

µ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{µ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{µ^k} and Q_{µ^k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck) do:

Uk = argmin_{u ∈ Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) ← (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u ∈ Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is followed (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
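A minimal sketch of Q-learning with an ε-greedy exploration rule follows. The environment step function, the sizes of the state and control spaces and the value of ε are hypothetical choices used only for illustration.

import numpy as np

def q_learning(simulate_step, n_states, n_controls, terminal,
               n_episodes=5000, epsilon=0.1, start_state=0):
    """Q-learning with epsilon-greedy exploration, cost minimization (sketch).

    simulate_step(x, u) -> (next_state, cost) : hypothetical environment model
    """
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_controls))
    visits = np.zeros((n_states, n_controls))          # step size gamma = 1 / (number of visits)
    for _ in range(n_episodes):
        x = start_state
        while x != terminal:
            if rng.random() < epsilon:                 # exploration phase
                u = int(rng.integers(n_controls))
            else:                                      # exploitation of the greedy policy
                u = int(np.argmin(Q[x]))
            x_next, cost = simulate_step(x, u)
            visits[x, u] += 1
            gamma = 1.0 / visits[x, u]
            target = cost + np.min(Q[x_next])          # one-step lookahead, as in (7.3)
            Q[x, u] = (1 - gamma) * Q[x, u] + gamma * target
            x = x_next
    return Q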

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system, through simulation, with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the tabular representation investigated previously, Jµ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modelled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article gives reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling.
  Method: Value Iteration.
  Advantages/disadvantages: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model. Possible approaches: average cost-to-go, discounted, shortest path.
  Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted).
  Methods (classical MDP methods): Value Iteration (VI), which can converge fast for a high discount factor; Policy Iteration (PI), faster in general; Linear Programming, which allows additional constraints but handles a more limited state space than VI and PI.

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces compared to classical MDP methods.
  Possible application in maintenance optimization: same as MDP, for larger systems.
  Methods: TD-learning, Q-learning.
  Advantages/disadvantages: can work without an explicit model.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; more complex.
  Possible application in maintenance optimization: optimization of inspection-based maintenance.
  Methods: same as MDP (average cost-to-go approach).


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to keep operating the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was included in the model: the electricity price is a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for the component
NCM: Number of corrective maintenance states for the component

Costs

CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1k: Component state at stage k
x2k: Electricity state at stage k

Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k), x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) > 50%. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.


[Figure 9.1 is a state-transition diagram; only its caption is kept here. It shows the working states W0-W4, the preventive maintenance state PM1 and the corrective maintenance states CM1 and CM2, with failure transitions of probability Ts·λ(q) and ageing transitions of probability 1 − Ts·λ(q).]

Figure 9.1: Example of the Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0. Dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 is a plot; only its caption and axis information are kept here. Horizontal axis: stage (k−1, k, k+1); vertical axis: electricity price in SEK/MWh (about 200 to 500); three curves labelled Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance

Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | Uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | Uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                            u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0    CM1      λ(Wq)
WNW                           0    WNW      1 − λ(WNW)
WNW                           0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}          1    PM1      1
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    1
PMNPM−1                       ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    1
CMNCM−1                       ∅    W0       1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = | 1   0   0 |      P2E = | 1/3  1/3  1/3 |      P3E = | 0.6  0.2  0.2 |
      | 0   1   0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
      | 0   0   1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
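To make the transition structure concrete, the following is a small sketch of how the matrices of Tables 9.1-9.3 could be laid out in code. The numerical values are those of the example above; the helper function and its arguments are hypothetical, and the failure probability over a stage is taken as Ts · λ(q), as in Figure 9.1.

import numpy as np

# Electricity scenario transition matrices of Table 9.2 (rows: i2, columns: j2).
P1_E = np.eye(3)
P2_E = np.full((3, 3), 1.0 / 3.0)
P3_E = 0.2 + 0.4 * np.eye(3)                  # 0.6 on the diagonal, 0.2 elsewhere

# Stage-dependent electricity transitions of Table 9.3 (12-stage horizon).
P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def component_transition_matrix(lam, Ts, n_W, n_PM, n_CM, u):
    """Stationary component transition matrix of Table 9.1 (sketch).

    lam(q): failure rate of working state Wq.
    States are ordered W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1).
    """
    n = (n_W + 1) + (n_PM - 1) + (n_CM - 1)
    P = np.zeros((n, n))
    PM1 = (n_W + 1) if n_PM > 1 else 0            # PM1 corresponds to W0 if NPM = 1
    CM1 = (n_W + n_PM) if n_CM > 1 else 0         # CM1 corresponds to W0 if NCM = 1
    for q in range(n_W + 1):                      # working states Wq
        if u == 1:
            P[q, PM1] = 1.0                       # preventive replacement starts
        else:
            p_fail = Ts * lam(q)
            P[q, CM1] = p_fail                    # failure starts corrective maintenance
            P[q, min(q + 1, n_W)] += 1.0 - p_fail # age one stage (or stay in WNW)
    for s in range(n_PM - 1):                     # PM chain: PM1 -> PM2 -> ... -> W0
        P[PM1 + s, PM1 + s + 1 if s < n_PM - 2 else 0] = 1.0
    for s in range(n_CM - 1):                     # CM chain: CM1 -> CM2 -> ... -> W0
        P[CM1 + s, CM1 + s + 1 if s < n_CM - 2 else 0] = 1.0
    return P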

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                            u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}        0    CM1      CI + CCM
WNW                           0    WNW      G · Ts · CE(i2, k)
WNW                           0    CM1      CI + CCM
Wq                            1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    CI + CPM
PMNPM−1                       ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    CI + CCM
CMNCM−1                       ∅    W0       CI + CCM
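The cost structure of Table 9.4 can be sketched in the same style. The helper below builds the transition cost matrices for one stage, reusing the state ordering of the transition matrix sketch above; the production reward is entered with a negative sign so that a cost-minimizing backward recursion (as in the value iteration sketch of Section 5.3) favours production. Both the helper and this sign convention are assumptions made for illustration.

import numpy as np

def stage_cost_matrices(n_W, n_PM, n_CM, reward, C_I, C_PM, C_CM):
    """Transition costs of Table 9.4 for one stage and decisions u = 0 and u = 1 (sketch).

    reward = G * Ts * C_E(i2, k): value of the energy produced during the stage,
    entered as a negative cost. State ordering: W0..WNW, PM1.., CM1..
    """
    n = (n_W + 1) + (n_PM - 1) + (n_CM - 1)
    PM1 = (n_W + 1) if n_PM > 1 else 0
    CM1 = (n_W + n_PM) if n_CM > 1 else 0
    C0, C1 = np.zeros((n, n)), np.zeros((n, n))
    C0[: n_W + 1, :] = -reward                    # u = 0: the unit keeps producing ...
    C0[: n_W + 1, CM1] = C_I + C_CM               # ... unless a failure starts corrective maintenance
    C1[: n_W + 1, PM1] = C_I + C_PM               # u = 1: preventive replacement starts
    for s in range(n_PM - 1):                     # ongoing preventive maintenance
        C0[PM1 + s, :] = C_I + C_PM
    for s in range(n_CM - 1):                     # ongoing corrective maintenance
        C0[CM1 + s, :] = C_I + C_CM
    return C0, C1

Combined with component_transition_matrix and the electricity scenario matrices of the previous sketch, these cost matrices can be fed, stage by stage, into a backward recursion of the type shown in Section 5.3.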

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c

Costs

CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c for the next stage
jNC+1: Electricity state for the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1,k: Electricity state at stage k
uck: Maintenance decision for component c at stage k

Probability functions

λc(i): Failure probability function for component c

Sets

Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c

uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ otherwise

9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i) = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.4)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with P^c =
  P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
  1             if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
  0             otherwise
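A sketch of the joint component transition probability could look as follows, reusing the per-component matrices built with the earlier helper. The reading of Case 2 adopted here (a working component that is not maintained keeps its state while the system is down) is an assumption made explicit in the code.

import numpy as np

def joint_transition_prob(i_states, j_states, u, P_comp, n_W):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the multi-component model (sketch).

    i_states, j_states : tuples of component state indices
    u                  : tuple of decisions (0 or 1) per component
    P_comp[c][v]       : transition matrix of component c for decision v
    n_W[c]             : index of the last working state WNWc of component c
    """
    working = [i <= n_W[c] for c, i in enumerate(i_states)]
    system_up = all(working) and not any(u)            # Case 1: all components work, no maintenance
    prob = 1.0
    for c, (i, j) in enumerate(zip(i_states, j_states)):
        if system_up:
            prob *= P_comp[c][0][i, j]                 # components age independently
        elif u[c] == 1 or not working[c]:
            prob *= P_comp[c][1 if u[c] == 1 else 0][i, j]   # maintenance chain proceeds
        else:
            prob *= 1.0 if j == i else 0.0             # idle working component keeps its state
    return prob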

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with C^c =
  CCMc  if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
  0     otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair: the time to repair is in reality not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of the finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.


[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Page 12: Models

respectively, for finite and infinite horizons. Chapter 7 is an introduction to Approximate Dynamic Programming (ADP), also known as Reinforcement Learning (RL), which is an approach to solving Dynamic Programming infinite horizon problems using approximate methods.

Chapter 8 gives a review of some maintenance optimization models based on dynamic programming. Conclusions are made about the possible use of the different approaches in maintenance optimization.

Chapter 9 is an example of how finite horizon dynamic programming can be used for maintenance optimization.

Chapter 10 summarizes the conclusions of the work and discusses possible avenues for research.


Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 2.1. Some maintenance optimization models are reviewed in Section 2.2.

2.1 Types of Maintenance

Maintenance is a combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function [1]. Figure 2.1 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worth it, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, e.g. to avoid the high cost of replacement, unsupplied power delivery, and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.


[Figure 2.1: Maintenance tree, based on [1]. Maintenance is divided into Preventive Maintenance and Corrective Maintenance; Preventive Maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.]

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM covers all the maintenance methods that use diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].


2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if a preventive replacement is less expensive than a corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gear box, the blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, one must decide at each decision epoch if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision to preventively replace the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on time and one on the parameters (monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities to perform preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: travelling to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money can be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take short-term information into account. The model can be used for many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g., multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that can not be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, transmission and distribution systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation. These are the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.


3. Distribution. The distribution system is at a voltage level below transmission and is connected to the customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs of interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). The components of the system influence each other. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs and incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost. Cost for the maintenance team that performs the maintenance actions.

• Spare part cost. The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost, if special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production. The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy / interruption cost. If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost. Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower. The size and availability of the maintenance staff is limited.

• Maintenance equipment. The equipment needed for undertaking the maintenance must be available.

• Weather. The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts. If the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts. Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information. If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible input for an optimization model.

• Statistical data. Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:

An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that the decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities are briefly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state $X_k = i$ that belongs to a state space $\Omega^X_k$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega^U_k(i)$.

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J^*_0(X_0) = \min_{U_k} \left\{ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right\}$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$.

N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
Ck(i, u) Cost function
CN(i) Terminal cost for state i
fk(i, u) Dynamic function
J*0(i) Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \qquad (4.1)$$

$J^*_k(i)$: optimal cost-to-go from stage k to N, starting from state i.

The value iteration algorithm is a direct consequence of the optimality equation:

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N$$

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \quad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right\} \quad \forall i \in \Omega^X_k$$

u Decision variable
U*k(i) Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
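A minimal sketch of this backward recursion in Python follows; the function and argument names, and the model callables f, C and C_N, are hypothetical illustrations and not part of the thesis.

def value_iteration(N, states, decisions, f, C, C_N):
    """Backward value iteration for a deterministic finite horizon DP.

    states[k]    : iterable of admissible states at stage k
    decisions[k] : dict mapping each state i to its admissible decisions at stage k
    f(k, i, u)   : next state when decision u is taken in state i at stage k
    C(k, i, u)   : transition cost
    C_N(i)       : terminal cost
    Returns the cost-to-go J[k][i] and the optimal decisions U[k][i].
    """
    J = {N: {i: C_N(i) for i in states[N]}}
    U = {}
    for k in range(N - 1, -1, -1):          # go backwards, from stage N-1 down to 0
        J[k], U[k] = {}, {}
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in decisions[k][i]:
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U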


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path example network. Node A at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; and K at stage 4. Each arc is labelled with its cost; the costs are the ones used in the calculations of Appendix A.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

$\Omega^X_0 = \{A\} = \{0\}$
$\Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}$
$\Omega^X_2 = \{E, F, G\} = \{0, 1, 2\}$
$\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}$
$\Omega^X_4 = \{K\} = \{0\}$


Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notation is used:

$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$

For example, $\Omega^U_1(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ or $U_1(0) = 1$ for the transition $B \Rightarrow F$.

Another example: $\Omega^U_1(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ or $u_1(2) = 2$ for the transition $D \Rightarrow G$.

A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: $f_k(i, u) = u$.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left\{ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \right\}$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, N-1$.

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial stage is reached. The optimal decision sequence is then obtained forwards by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the path $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).
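The calculation can be checked numerically. The sketch below is a hypothetical Python illustration, with the arc costs taken from the calculations in Appendix A; it performs the same backward recursion.

# Arc costs C[k][(i, u)] from Appendix A; with the notation of the example,
# the decision u is also the index of the resulting state at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},              # A->B, A->C, A->D
    1: {(0, 0): 4, (0, 1): 6,                          # B->E, B->F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,               # C->E, C->F, C->G
        (2, 1): 5, (2, 2): 2},                         # D->F, D->G
    2: {(0, 0): 2, (0, 1): 5,                          # E->H, E->I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,               # F->H, F->I, F->J
        (2, 1): 1, (2, 2): 2},                         # G->I, G->J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},              # H->K, I->K, J->K
}

J = {4: {0: 0}}                      # terminal cost of node K
policy = {}
for k in range(3, -1, -1):           # backward recursion, stage 3 down to 0
    J[k], policy[k] = {}, {}
    for (i, u), cost in C[k].items():
        candidate = cost + J[k + 1][u]
        if i not in J[k] or candidate < J[k][i]:
            J[k][i], policy[k][i] = candidate, u
print(J[0][0])                       # 8, the cost of the optimal path A -> D -> G -> I -> K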


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, that is, the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on k: $X_k \in \Omega^X_k$.

Decision Space

At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega^U_k(i)$.

Dynamics of the System and Transition Probabilities

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \qquad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control at stage k are i and u. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
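As a small illustration with invented numbers (not taken from the thesis), a component with the three condition states good, worn and failed, under a fixed policy "replace only at failure", induces a Markov chain described by a single transition matrix:

import numpy as np

# Hypothetical one-stage transition probabilities P[i, j] = P(X_{k+1} = j | X_k = i)
# for states 0 = good, 1 = worn, 2 = failed, under the fixed policy "replace on failure".
P = np.array([
    [0.90, 0.08, 0.02],   # good: mostly stays good, may deteriorate or fail
    [0.00, 0.85, 0.15],   # worn: deteriorates further or fails
    [1.00, 0.00, 0.00],   # failed: replacement brings the component back to good
])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution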

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.

A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left\{ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right\}$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$.


N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
ωk(i, u) Probabilistic function of the disturbance
Ck(i, u, j) Cost function
CN(i) Terminal cost for state i
fk(i, u, ω) Dynamic function
J*0(i) Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left\{ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right\} \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad (5.2)$$

ΩXk State space at stage k
ΩUk(i) Decision space at stage k for state i
Pk(j, u, i) Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

Initialisation:
$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega^X_N$$

While $k \ge 0$ do
$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega^X_k$$

$k \leftarrow k - 1$

u Decision variable
U*k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
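A compact sketch of this backward recursion in Python, assuming (as a simplification not made in the text) a stationary model in which every action is admissible in every state, stored as hypothetical arrays P[u, i, j] and C[u, i, j]:

import numpy as np

def stochastic_value_iteration(N, P, C, C_terminal):
    """Finite horizon stochastic value iteration.

    P[u, i, j]  : probability of moving from state i to state j under decision u
    C[u, i, j]  : cost of that transition
    C_terminal  : terminal cost per state, shape (n_states,)
    Returns the cost-to-go J[k, i] and the optimal decisions U[k, i].
    """
    n_actions, n_states, _ = P.shape
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = C_terminal
    for k in range(N - 1, -1, -1):
        # expected immediate cost plus expected cost-to-go, per (action, state) pair
        Q = np.einsum('uij,uij->ui', P, C) + P @ J[k + 1]
        J[k] = Q.min(axis=0)
        U[k] = Q.argmin(axis=0)
    return J, U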

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
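For illustration (with arbitrarily chosen numbers, not taken from the thesis): a model with N = 52 weekly stages, $N_X = 5$ state variables taking S = 10 values each and $N_U = 5$ decision variables taking A = 3 values each would already require on the order of $52 \cdot 10^{10} \cdot 3^5 \approx 1.3 \cdot 10^{14}$ elementary operations, which is far beyond what can be computed exactly.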

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable can not reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space. For $i \in \Omega^X$, $\mu(i)$ is an admissible control for the state i: $\mu(i) \in \Omega^U(i)$.

The objective is to find the optimal $\mu^*$. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (cost-free) state that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.

$$J^*(X_0) = \min_{\mu} E\left\{ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right\}$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.

µ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, the discount factor (0 < α < 1). The cost incurred at stage k has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum converges (it is bounded by a decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left\{ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right\}$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left\{ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right\}$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$.


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega^X$$

Jµ(i) Cost-to-go function of policy µ starting from state i
J*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1-α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
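Before turning to policy iteration, a sketch of value iteration for the discounted case is given below, with the same hypothetical array layout P[u, i, j] and C[u, i, j] as above and a tolerance-based stopping rule (a practical choice, not part of the theory):

import numpy as np

def discounted_value_iteration(P, C, alpha=0.9, tol=1e-8, max_iter=100_000):
    """Value iteration for a discounted infinite horizon MDP (cost minimization)."""
    n_actions, n_states, _ = P.shape
    expected_cost = np.einsum('uij,uij->ui', P, C)   # expected one-stage cost per (u, i)
    J = np.zeros(n_states)
    for _ in range(max_iter):
        Q = expected_cost + alpha * (P @ J)          # shape (n_actions, n_states)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:          # stop when the update is small
            J = J_new
            break
        J = J_new
    return J, Q.argmin(axis=0)                       # cost-to-go and a greedy policy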

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps:

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Otherwise, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$

q Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right] \quad \forall i \in \Omega^X$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, the algorithm will converge quickly to the optimal solution.
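A sketch of the two steps for a discounted problem, with the evaluation step solved as the linear system $(I - \alpha P_\mu) J_\mu = c_\mu$ (hypothetical Python, same array layout as above):

import numpy as np

def policy_iteration(P, C, alpha=0.9):
    """Policy iteration for a discounted MDP given P[u, i, j] and C[u, i, j]."""
    n_actions, n_states, _ = P.shape
    expected_cost = np.einsum('uij,uij->ui', P, C)        # (u, i)
    mu = np.zeros(n_states, dtype=int)                    # initial policy: action 0 everywhere
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_states), :]              # row i is P[mu[i], i, :]
        c_mu = expected_cost[mu, np.arange(n_states)]
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = expected_cost + alpha * (P @ J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                    # stop when the policy no longer changes
            return J, mu
        mu = mu_new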

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each iteration, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.

While $m \ge 0$ do
$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega^X$$

$m \leftarrow m - 1$

m Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.
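A sketch of this approximate evaluation step, written here for the discounted case rather than the undiscounted formulation above (hypothetical Python, same array layout as before); the initial value function should overestimate the true cost-to-go of the policy:

import numpy as np

def evaluate_policy_approximately(P, C, mu, alpha, M, J_init):
    """M-step approximate evaluation of a fixed policy mu (modified policy iteration)."""
    n_states = P.shape[1]
    P_mu = P[mu, np.arange(n_states), :]                       # transition matrix under mu
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_states), :])  # expected cost under mu
    J = J_init.copy()                                          # should overestimate J_mu
    for _ in range(M):
        J = c_mu + alpha * (P_mu @ J)                          # one backup of the policy mu
    return J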

6.6 Average Cost-to-go Problems

The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega^X$, there is a unique $\lambda^\mu$ and vector $h^\mu$ such that

$$h^\mu(\bar{X}) = 0$$

$$\lambda^\mu + h^\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h^\mu(j) \right] \quad \forall i \in \Omega^X$$

This $\lambda^\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega^X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega^X$$

The sequence $h^k$ converges if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: $\bar{X}$ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ for all $i \in \Omega^X$, stop the algorithm. Otherwise, solve the system of equations

$$h^q(\bar{X}) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega^X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega^X$$

$q \leftarrow q + 1$

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP the optimal cost-to-go satisfies

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega^X$$

and $J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega^X} J(i)$

Subject to $J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i) \quad \forall i, u$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
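Nevertheless, a sketch of this linear program for the discounted case using scipy.optimize.linprog is given below; since the optimal cost-to-go is the largest function satisfying $J(i) \le \sum_j P(j, u, i)[C(j, u, i) + \alpha J(j)]$ for every state-action pair, the objective maximizes $\sum_i J(i)$ (same hypothetical array layout as above).

import numpy as np
from scipy.optimize import linprog

def lp_solve_discounted_mdp(P, C, alpha=0.9):
    """Solve a discounted cost MDP by linear programming."""
    n_actions, n_states, _ = P.shape
    expected_cost = np.einsum('uij,uij->ui', P, C)
    # One inequality per (state, action): J(i) - alpha * sum_j P(j|i,u) J(j) <= expected cost
    A_ub = np.zeros((n_actions * n_states, n_states))
    b_ub = np.zeros(n_actions * n_states)
    row = 0
    for u in range(n_actions):
        for i in range(n_states):
            A_ub[row] = -alpha * P[u, i]
            A_ub[row, i] += 1.0
            b_ub[row] = expected_cost[u, i]
            row += 1
    # maximize sum_i J(i)  <=>  minimize -sum_i J(i); J is unbounded in sign
    res = linprog(c=-np.ones(n_states), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_states)
    return res.x      # the optimal cost-to-go J*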

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m, and is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods with function approximation are extensions of the methods presented in Section 72. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k); X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that for each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy µ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, µ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i at its mth visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.

39

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates J(X_k) of the states of the trajectory are updated. Assume that the lth transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm only updates the state visited at the current transition:

J(X_l) = J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
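
A minimal sketch of the TD(0) update above is given below (not from the thesis). Trajectories are assumed to be lists of (X_k, X_{k+1}, C_k) transitions generated under the policy being evaluated, and the step size γ = 1/m, where m is the number of updates of a state, follows the rule described above.

from collections import defaultdict

def td0_evaluation(trajectories):
    """TD(0) evaluation of a fixed policy from sampled trajectories.

    Each trajectory is a list of (x, x_next, cost) transitions (undiscounted
    stochastic shortest path setting; the terminal cost-to-go is taken as 0).
    """
    J = defaultdict(float)       # estimated cost-to-go J(i)
    visits = defaultdict(int)    # number of updates per state -> step size 1/m
    for trajectory in trajectories:
        for x, x_next, cost in trajectory:
            visits[x] += 1
            gamma = 1.0 / visits[x]                  # decreasing step size
            td_error = cost + J[x_next] - J[x]       # temporal difference
            J[x] += gamma * td_error
    return dict(J)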

Q-factors: Once J^{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^{µ_k}(i, u) = \sum_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J^{µ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

µ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q^{µ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{µ_k} and Q^{µ_k} have been estimated from the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = \sum_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (71)

40

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (72)

By combining the two equations we obtain

Q*(i, u) = \sum_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (73)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_k, X_{k+1}, U_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: Convergence of the algorithms to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
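
The sketch below (not from the thesis) combines the Q-learning update above with a simple ε-greedy exploration/exploitation rule. The simulator interface env (with reset(), step() and actions()) is a hypothetical assumption introduced only for illustration.

import random
from collections import defaultdict

def q_learning(env, n_episodes, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch).

    env is a hypothetical simulator: reset() -> initial state,
    step(state, u) -> (next_state, cost, done), actions(state) -> admissible decisions.
    Step sizes decrease as 1/(number of updates of the pair (x, u)).
    """
    Q = defaultdict(float)                      # Q[(state, u)]
    n_updates = defaultdict(int)
    for _ in range(n_episodes):
        x, done = env.reset(), False
        while not done:
            actions = env.actions(x)
            if random.random() < epsilon:       # exploration phase
                u = random.choice(actions)
            else:                               # exploitation of the greedy policy
                u = min(actions, key=lambda a: Q[(x, a)])
            x_next, cost, done = env.step(x, u)
            best_next = 0.0 if done else min(Q[(x_next, a)] for a in env.actions(x_next))
            n_updates[(x, u)] += 1
            gamma = 1.0 / n_updates[(x, u)]
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
            x = x_next
    return Q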

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.
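
As a minimal sketch of the second point (not from the thesis), the empirical model below accumulates observed samples (X_k, X_{k+1}, U_k, C_k) into transition counts and average costs; the resulting estimates of P(j, u, i) and C(j, u, i) could then be solved off-line with the classical methods of Chapter 6.

from collections import defaultdict

class EmpiricalModel:
    """Empirical transition and cost estimates built from observed samples."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))   # (i, u) -> {j: count}
        self.cost_sum = defaultdict(float)                    # (i, u, j) -> total cost

    def add_sample(self, i, j, u, cost):
        """Record one observed transition (X_k = i, X_k+1 = j, U_k = u, C_k = cost)."""
        self.counts[(i, u)][j] += 1
        self.cost_sum[(i, u, j)] += cost

    def estimates(self, i, u):
        """Return {j: (estimated P(j, u, i), average cost C(j, u, i))}."""
        total = sum(self.counts[(i, u)].values())
        return {j: (n / total, self.cost_sum[(i, u, j)] / n)
                for j, n in self.counts[(i, u)].items()}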

41

74 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^µ(i). It is replaced by a suitable approximation J(i, r), where r is a parameter vector that has to be optimized based on the available samples of J^µ. In the tabular representation investigated previously, J^µ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximation, J^µ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
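
As a minimal sketch (not from the thesis) of one possible approximation structure, the function below fits a linear architecture J(i, r) = φ(i)ᵀr to sampled costs-to-go by least squares. The feature map phi is an assumption introduced for illustration; it has to be chosen for the problem at hand.

import numpy as np

def fit_linear_cost_to_go(states, sampled_costs, phi):
    """Least-squares fit of J(i, r) = phi(i)^T r to sampled costs-to-go V(i_m).

    states        : list of visited states i_m
    sampled_costs : observed costs-to-go V(i_m) for those states
    phi           : feature map, phi(i) -> 1-D numpy array (problem-specific assumption)
    """
    Phi = np.array([phi(i) for i in states])        # training inputs
    V = np.array(sampled_costs, dtype=float)        # training targets
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)     # minimize ||Phi r - V||^2
    return r

# Usage: J_hat = lambda i: phi(i) @ r  -- only the vector r needs to be stored.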

42

Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a three-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn times. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if it is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 81 Summary of models and methods

(Columns: Characteristics | Possible application in maintenance optimization | Method | Advantages/Disadvantages)

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Application: short-term maintenance scheduling/optimization
- Method: Value Iteration
- Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Methods: classical methods; possible approaches for MDP:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
  - Shortest path: Linear Programming; possible additional constraints; state space more limited than with VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces compared with classical MDP methods
- Application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/Disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Application: optimization of inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)
- Advantages/Disadvantages: complex

46

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model, and the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component

Costs

CE(s, k): Electricity cost at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

49

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario, so NX = 2.

The state of the system is thus represented by a vector as in (91):

X_k = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (91)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always performed. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) > 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


Figure 91 Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4 (states W0–W4, PM1, CM1 and CM2; from a working state Wq the component moves to the next working state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q)). Solid lines: u = 0. Dashed lines: u = 1.

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}


Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 92 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 92 Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, between 200 and 500, for scenarios 1–3 over stages k−1, k, k+1)


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW},   Ω_U(i) = ∅ otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 91 Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
W_NW                        0    W_NW     1 − λ(W_NW)
W_NW                        0    CM1      λ(W_NW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM_{NPM−1}                  ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM_{NCM−1}                  ∅    W0       1
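
As a minimal sketch (not part of the thesis), the function below builds the component transition matrices of Table 91. The function stage_failure_prob(q), returning the probability λ(Wq) that a component of age q fails during one stage, is an assumption introduced for illustration.

import numpy as np

def component_transition_matrices(stage_failure_prob, NW, NPM, NCM):
    """Stationary component transition matrices of Table 91 (illustrative sketch).

    States are ordered W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}.
    P0 is used when u = 0 (and for forced maintenance transitions), P1 when
    preventive replacement (u = 1) is decided.
    """
    states = ([f"W{q}" for q in range(NW + 1)]
              + [f"PM{p}" for p in range(1, NPM)]
              + [f"CM{c}" for c in range(1, NCM)])
    idx = {s: n for n, s in enumerate(states)}
    P0, P1 = np.zeros((len(states),) * 2), np.zeros((len(states),) * 2)
    pm1 = "PM1" if NPM > 1 else "W0"          # PM1 coincides with W0 if NPM = 1
    cm1 = "CM1" if NCM > 1 else "W0"          # CM1 coincides with W0 if NCM = 1

    for q in range(NW + 1):
        lam = stage_failure_prob(q)
        ageing = f"W{q + 1}" if q < NW else f"W{NW}"   # age, or stay in W_NW
        P0[idx[f"W{q}"], idx[ageing]] = 1 - lam
        P0[idx[f"W{q}"], idx[cm1]] += lam              # failure -> corrective maintenance
        P1[idx[f"W{q}"], idx[pm1]] = 1                 # preventive replacement decided

    for p in range(1, NPM):                            # ongoing preventive maintenance
        nxt = f"PM{p + 1}" if p < NPM - 1 else "W0"
        P0[idx[f"PM{p}"], idx[nxt]] = P1[idx[f"PM{p}"], idx[nxt]] = 1
    for c in range(1, NCM):                            # ongoing corrective maintenance
        nxt = f"CM{c + 1}" if c < NCM - 1 else "W0"
        P0[idx[f"CM{c}"], idx[nxt]] = P1[idx[f"CM{c}"], idx[nxt]] = 1
    return states, P0, P1

For example, the MDP of Figure 91 (NW = 4, NPM = 2, NCM = 3) could be built with component_transition_matrices(lambda q: some_stage_failure_prob(q), 4, 2, 3), where the failure probabilities are hypothetical inputs.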

Table 92 Example of transition matrix for electricity scenarios

P1_E =
[ 1    0    0
  0    1    0
  0    0    1 ]

P2_E =
[ 1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3 ]

P3_E =
[ 0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6 ]

Table 93 Example of transition probabilities on a 12 stages horizon

Stage (k)      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
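
The following short sketch (not part of the thesis) simply restates Tables 92 and 93 in code form, so that the stage-dependent electricity scenario transition probability P_k(j2, i2) can be looked up directly.

import numpy as np

P1_E = np.eye(3)                                  # scenarios stay unchanged
P2_E = np.full((3, 3), 1 / 3)                     # complete mixing between scenarios
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])                # scenarios mostly persist

# Stage-dependent choice of matrix over the 12-stage horizon of Table 93
P_E_by_stage = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
                P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability that the price scenario moves from i2 to j2 at stage k."""
    return P_E_by_stage[k][i2, j2]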

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 94 Transition costs

i1                          u    j1       C_k(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
W_NW                        0    W_NW     G · Ts · CE(i2, k)
W_NW                        0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM_{NPM−1}                  ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM_{NCM−1}                  ∅    W0       CI + CCM
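
As a minimal sketch (not part of the thesis), the backward induction below solves the one-component model over the finite horizon, combining the component matrices, the stage-dependent electricity matrices and the costs of Table 94. The function names stage_cost, terminal_cost and decisions are assumptions: they have to be supplied by the user according to Tables 91 and 94.

import numpy as np

def backward_induction(N, P0, P1, PE, stage_cost, terminal_cost, decisions):
    """Finite-horizon value iteration for the one-component model (illustrative sketch).

    P0, P1        : component transition matrices for u = 0 and u = 1
    PE[k]         : electricity scenario transition matrix at stage k
    stage_cost(k, i1, i2, u, j1) : transition cost, as in Table 94
    terminal_cost(i1)            : terminal cost C_N for component state i1
    decisions(i1) : admissible decisions in component state i1, e.g. (0, 1) for
                    working states and (None,) for maintenance states, where the
                    forced maintenance transition is stored in P0
    """
    n1, n2 = P0.shape[0], PE[0].shape[0]
    J = np.array([[float(terminal_cost(i1))] * n2 for i1 in range(n1)])
    policy = []
    for k in reversed(range(N)):
        Jk = np.zeros((n1, n2))
        mu = np.empty((n1, n2), dtype=object)
        for i1 in range(n1):
            for i2 in range(n2):
                best_cost, best_u = None, None
                for u in decisions(i1):
                    Pc = P1 if u == 1 else P0
                    exp_cost = sum(Pc[i1, j1] * stage_cost(k, i1, i2, u, j1)
                                   for j1 in np.nonzero(Pc[i1])[0])
                    exp_next = Pc[i1] @ (J @ PE[k][i2])   # expected J_{k+1}(j1, j2)
                    if best_cost is None or exp_cost + exp_next < best_cost:
                        best_cost, best_u = exp_cost + exp_next, u
                Jk[i1, i2], mu[i1, i2] = best_cost, best_u
        J = Jk
        policy.insert(0, mu)
    return J, policy    # J[i1, i2] is the optimal cost-to-go at stage 0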

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c

Costs

CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}: State of component c at the current stage
i_{NC+1}: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c at the next stage
j_{NC+1}: Electricity state at the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., NC}: State of component c at stage k
x^c: A component state
x^{NC+1}_k: Electricity state at stage k
u^c_k: Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_{xc}: State space for component c
Ω_{x^{NC+1}}: Electricity state space
Ω_{uc}(ic): Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

56

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92):

X_k = (x^1_k, ..., x^{NC}_k, x^{NC+1}_k)^T   (92)

x^c_k, c ∈ {1, ..., NC}, represents the state of component c, and x^{NC+1}_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_{xc}:

x^c_k ∈ Ω_{xc} = {W0, ..., W_{NWc}, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity Space
Same as in Section 91.

9242 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or nothing, depending on the state of the system.

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{NC}_k)^T   (93)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ω_{uc}(ic) = {0, 1} if ic ∈ {W0, ..., W_{NWc}},   Ω_{uc}(ic) = ∅ otherwise

9243 Transition Probability

The state variables x^c are independent of the electricity state x^{NC+1}. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)   (94)
= P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) · P_k(j^{NC+1}, i^{NC+1})   (95)

The transition probabilities of the electricity state, P_k(j^{NC+1}, i^{NC+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 91.

Component states transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: x^c_k ∈ {W1, ..., W_{NWc}} and u = 0, then

P((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = \prod_{c=1}^{NC} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = \prod_{c=1}^{NC} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W1, ..., W_{NWc}}
P^c = 1                if i^c ∈ {W0, ..., W_{NWc}}, u^c = 0 and j^c = i^c
P^c = 0                otherwise
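
The sketch below (not part of the thesis) gives one reading of the two cases above, under the assumption that a working component on which nothing is done simply keeps its state while the system is down. The containers P_w, P_m and working_states are assumptions introduced for illustration.

from math import prod

def joint_component_transition(i, j, u, P_w, P_m, working_states):
    """Joint component-state transition probability (illustrative sketch).

    i, j : tuples of per-component states; u : tuple of decisions (0/1)
    P_w[c][ic][jc] : transition probability of component c while the system works
    P_m[c][ic][jc] : transition probability used when maintenance is performed on c
    working_states[c] : set of working states of component c
    """
    if all(ic in working_states[c] for c, ic in enumerate(i)) and not any(u):
        # Case 1: the system is working, all components age independently
        return prod(P_w[c][ic][j[c]] for c, ic in enumerate(i))
    # Case 2: the system is down during the stage
    p = 1.0
    for c, ic in enumerate(i):
        if u[c] == 1 or ic not in working_states[c]:
            p *= P_m[c][ic][j[c]]              # decided or ongoing maintenance transition
        else:
            p *= 1.0 if j[c] == ic else 0.0    # non-ageing working component keeps its state
    return p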

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: x^c_k ∈ {W1, ..., W_{NWc}}, then

C((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = G · Ts · CE(i^{NC+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = CI + \sum_{c=1}^{NC} C^c

with

C^c = CCMc   if i^c ∈ {CM1, ..., CM_{NCMc−1}} or j^c = CM1
C^c = CPMc   if i^c ∈ {PM1, ..., PM_{NPMc−1}} or j^c = PM1
C^c = 0      otherwise
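
The short sketch below (not part of the thesis) evaluates the stage cost for the two cases above. It assumes string state labels such as "W0", "PM1" and "CM1" as in the earlier sketches, and writes the production reward as a negative cost — a sign convention that is an assumption, since the text does not make it explicit.

def stage_cost_multi(i, j, u, elec_price, G, Ts, C_I, C_PM, C_CM, pm_states, cm_states):
    """Stage cost for the multi-component model (illustrative sketch).

    i, j : component state tuples; u : decision tuple; elec_price = CE(i_{NC+1}, k)
    pm_states[c] / cm_states[c] : sets of PM / CM states of component c
    """
    no_maintenance = not any(u) and all(
        ic not in pm_states[c] and ic not in cm_states[c] for c, ic in enumerate(i))
    failure = any(jc in cm_states[c] for c, jc in enumerate(j))
    if no_maintenance and not failure:          # Case 1: production reward
        return -G * Ts * elec_price             # reward written as a negative cost (assumption)
    cost = C_I                                  # Case 2: interruption cost
    for c, (ic, jc) in enumerate(zip(i, j)):
        if ic in cm_states[c] or jc == "CM1":
            cost += C_CM[c]
        elif ic in pm_states[c] or jc == "PM1":
            cost += C_PM[c]
    return cost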

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

59

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest. However, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

61

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 2
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effect analysis of condition monitoring systems Master's thesis Royal Institute of Technology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic Dynamic Programming John Wiley & Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft Master's thesis Royal Institute of Technology (KTH) May 2005

68

Page 13: Models

Chapter 2

Maintenance

The context of maintenance optimization is briefly described in this chapter. Different types of maintenance are defined in Section 21 and some maintenance optimization models are reviewed in Section 22.

21 Types of Maintenance

Maintenance is the combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required functions [1]. Figure 21 shows a general picture of the different types of maintenance.

Corrective Maintenance (CM) is carried out after fault recognition and is intended to put an item into a state in which it can perform a required function [1]. It is typically performed when there is no way, or it is not worthwhile, to detect or prevent a failure.

Preventive maintenance aims at undertaking maintenance actions on a component before it fails, in order to, e.g., avoid the high costs of replacement, unsupplied power delivery and possible damage to the surroundings of the component. One can distinguish between two kinds of preventive maintenance:

1. Time Based Maintenance (TBM) is preventive maintenance carried out in accordance with established intervals of time or number of units of use, but without previous condition investigation [1]. TBM is used for failures that are age-related and for which the probability of failure over time can be established.

Figure 21 Maintenance tree, based on [1]: maintenance is divided into preventive maintenance and corrective maintenance; preventive maintenance is divided into Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM), the latter being continuous, scheduled or inspection based.

2. Condition Based Maintenance (CBM) is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age-related failures.

22 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some referencesare given This short review is based on Chapter 8 of [4]

6

221 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

222 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: transport to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take intoaccount the short term information The model can be used for many maintenanceoptimization models

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting for power systems. The time horizon considered in the model is also important; many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is its time representation, i.e. whether discrete or continuous time is considered. A distinction can also be made between models with deterministic and stochastic lifetimes of components. Among the stochastic approaches it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution; a model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems Some costs andconstraints for a maintenance model are proposed

31 Power System Presentation

Power systems are very complex They are composed of thousands of componentslinked through a complex mesh of lines and cables that have limited capacities Withthe deregulation of power systems the generation distribution and transmissionsystems are separated Even considered independently each part of the powersystem is complex with many components and subcomponents

311 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power, e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.

3. Distribution: the distribution system is at a voltage level below transmission and connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories, such as industry, commercial, household, office and agriculture. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

312 Maintenance in Power System

The objective is to find the right way to do maintenance Corrective Maintenanceand Preventive Maintenance should be balanced for each component of a systemand the optimal PM approaches should be determined

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: the cost for the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation then has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage, or sequential, decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solutions of the subproblems are themselves part of the solution of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the future evolution of the system and the possible actions.

Basically, in maintenance problems this means that maintenance actions have an effect only on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events it will evolve according to a probability distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k the system is in a state $X_k = i$ that belongs to a state space $\Omega^X_k$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega^U_k(i)$.

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated to the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \ldots, N-1$

N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
$X_k$ : State at stage k
$U_k$ : Decision action at stage k
$C_k(i, u)$ : Cost function
$C_N(i)$ : Terminal cost for state i
$f_k(i, u)$ : Dynamic function
$J^*_0(i)$ : Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad (4.1)$$

$J^*_k(i)$ : Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

$$J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N$$

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad \forall i \in \Omega^X_k$$

u : Decision variable
$U^*_k(i)$ : Optimal decision action at stage k for state i

The algorithm goes backwards starting from the last stage It stops when k=0
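To make the backward recursion concrete, the following is a minimal Python sketch of the value iteration algorithm for a deterministic finite horizon problem. The data layout (lists of states per stage, and the callables controls, f, C and C_N) is an assumption chosen for illustration; it is not taken from the thesis.

```python
def value_iteration(states, controls, f, C, C_N, N):
    """Backward value iteration for a deterministic finite horizon DP.

    states[k]     : list of admissible states at stage k (k = 0..N)
    controls(k,i) : iterable of admissible controls at stage k in state i
    f(k,i,u)      : next state, C(k,i,u): stage cost, C_N(i): terminal cost
    """
    J = {(N, i): C_N(i) for i in states[N]}        # initialisation: J*_N = C_N
    policy = {}
    for k in range(N - 1, -1, -1):                  # backward recursion
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls(k, i):
                cost = C(k, i, u) + J[(k + 1, f(k, i, u))]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[(k, i)] = best_cost                   # optimal cost-to-go J*_k(i)
            policy[(k, i)] = best_u                 # optimal decision U*_k(i)
    return J, policy
```

The optimal decision sequence can then be read out forward by following policy[(k, i)] from the initial state, as done in the shortest path example below.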


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network with nodes A (stage 0); B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4). Each arc is labelled with its cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths; for example the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

$\Omega^X_0 = \{A\} = \{0\}$, $\Omega^X_1 = \{B, C, D\} = \{0, 1, 2\}$, $\Omega^X_2 = \{E, F, G\} = \{0, 1, 2\}$,
$\Omega^X_3 = \{H, I, J\} = \{0, 1, 2\}$, $\Omega^X_4 = \{K\} = \{0\}$


Each node of the problem is defined by a state $X_k$; for example $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example the choice is which way to take from the current node to reach the next stage. The following notations are used:

$$\Omega^U_k(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \qquad \text{for } k = 1, 2, 3$$

$$\Omega^U_0(0) = \{0, 1, 2\} \qquad \text{for } k = 0$$

For example $\Omega^U_1(0) = \Omega^U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition B ⇒ E and $U_1(0) = 1$ for the transition B ⇒ F.

Another example: $\Omega^U_1(2) = \Omega^U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition D ⇒ F and $u_1(2) = 2$ for the transition D ⇒ G.

A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega^U_k(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \ldots, N-1$

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using, for the sequence of states that will be visited, the optimal solution determined by the DP algorithm.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on k: $X_k \in \Omega^X_k$.

Decision Space

At each decision epoch the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega^U_k(i)$.

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).

A terminal cost $C_N(i)$ can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega^U_k(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \ldots, N-1$

N : Number of stages
k : Stage
i : State at the current stage
j : State at the next stage
$X_k$ : State at stage k
$U_k$ : Decision action at stage k
$\omega_k(i, u)$ : Probabilistic function of the disturbance
$C_k(i, u, j)$ : Cost function
$C_N(i)$ : Terminal cost for state i
$f_k(i, u, \omega)$ : Dynamic function
$J^*_0(i)$ : Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad (5.2)$$

$\Omega^X_k$ : State space at stage k
$\Omega^U_k(i)$ : Decision space at stage k for state i
$P_k(j, u, i)$ : Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion it determines at each stage the optimal decision for each state of the system.

$$J^*_N(i) = C_N(i) \qquad \forall i \in \Omega^X_N \quad \text{(initialisation)}$$

While $k \ge 0$ do

$$J^*_k(i) = \min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k$$

$$U^*_k(i) = \arg\min_{u \in \Omega^U_k(i)} \sum_{j \in \Omega^X_{k+1}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad \forall i \in \Omega^X_k$$

$$k \leftarrow k - 1$$

u : Decision variable
$U^*_k(i)$ : Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
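The stochastic recursion can be sketched in Python in the same spirit as the deterministic version. The data structures used below (P as lists of (next state, probability) pairs and C as a dictionary of transition costs) are assumptions chosen for illustration, not a specification from the thesis.

```python
def stochastic_value_iteration(states, controls, P, C, C_N, N):
    """Value iteration for a stochastic finite horizon DP (equation 5.2).

    P[(k, i, u)]    : list of (j, probability) pairs for the transition i -> j
    C[(k, i, u, j)] : transition cost, C_N[i] : terminal cost
    """
    J = {(N, i): C_N[i] for i in states[N]}
    policy = {}
    for k in range(N - 1, -1, -1):
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls[(k, i)]:
                # expected cost of choosing u in state i at stage k
                expected = sum(p * (C[(k, i, u, j)] + J[(k + 1, j)])
                               for j, p in P[(k, i, u)])
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J[(k, i)] = best_cost
            policy[(k, i)] = best_u
    return J, policy
```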

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• $N_X$ state variables, the size of the set for each state variable being S,

• $N_U$ control variables, the size of the set for each control variable being A.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
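As a hypothetical illustration (the figures are not taken from the thesis), a weekly model over one year with N = 52 stages, $N_X = 3$ state variables of S = 10 values each and $N_U = 1$ control variable with A = 5 actions already requires on the order of $52 \cdot 10^{6} \cdot 5 \approx 2.6 \cdot 10^{8}$ elementary operations, and adding a fourth state variable of the same size multiplies this bound by $10^2$.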

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures: minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that have been visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form $\pi = \{\mu, \mu, \ldots\}$, where μ is a function mapping the state space to the control space: for $i \in \Omega^X$, $\mu(i)$ is an admissible control for state i, $\mu(i) \in \Omega^U(i)$.

The objective is to find the optimal policy $\mu^*$; it should minimize the cost-to-go function.

To be able to compare different policies it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$

μ : Decision policy
$J^*(i)$ : Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1): the cost incurred at stage k has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum converges (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$

α : Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \ldots, N-1$


62 Optimality Equations

The optimality equations are formulated using the transition probability function, here denoted $P_{ij}(u)$.

The stationary policy $\mu^*$, solution of an IHSDP shortest path problem, satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \qquad \forall i \in \Omega^X$$

$J_\mu(i)$ : Cost-to-go function of policy μ starting from state i
$J^*(i)$ : Optimal cost-to-go function for state i

For a discounted IHSDP problem the optimality equation is

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega^X$$

The optimality equation for average cost-to-go IHSDP problems is discussed inSection 67

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted the method can be fast; the time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.

For non-discounted models the theoretical number of iterations needed is infinite, and a stopping criterion must be determined.
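As an illustration, the following is a minimal sketch of value iteration for the discounted case with a simple stopping threshold. The data layout (P as lists of (next state, probability) pairs, Cost as nested dictionaries) and the threshold eps are assumptions made for this example only.

```python
def discounted_value_iteration(states, actions, P, Cost, alpha, eps=1e-6):
    """Value iteration for a discounted infinite horizon MDP (costs minimised).

    P[i][u]       : list of (j, probability) pairs
    Cost[i][u][j] : transition cost, alpha: discount factor (0 < alpha < 1)
    """
    J = {i: 0.0 for i in states}
    while True:
        J_new = {}
        for i in states:
            J_new[i] = min(sum(p * (Cost[i][u][j] + alpha * J[j])
                               for j, p in P[i][u])
                           for u in actions[i])
        # stop when the value function has (almost) converged
        if max(abs(J_new[i] - J[i]) for i in states) < eps:
            return J_new
        J = J_new
```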

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. Then it can be described by the following steps.

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu^q}(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J_{\mu^q}(j) \right]$$

q : Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge fast to the optimal solution.
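The two steps can be sketched as follows for the discounted variant, solving the policy evaluation as a linear system with numpy. The data layout (P, Cost, alpha) follows the same assumptions as the value iteration sketch above and is not prescribed by the thesis.

```python
import numpy as np

def policy_iteration(states, actions, P, Cost, alpha, mu0):
    """Policy iteration for a discounted MDP; mu0 is an initial policy dict."""
    n = len(states)
    idx = {i: k for k, i in enumerate(states)}
    mu = dict(mu0)
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = expected cost
        A, b = np.eye(n), np.zeros(n)
        for i in states:
            for j, p in P[i][mu[i]]:
                A[idx[i], idx[j]] -= alpha * p
                b[idx[i]] += p * Cost[i][mu[i]][j]
        J = np.linalg.solve(A, b)
        # Step 2: policy improvement (greedy with respect to J)
        mu_new = {i: min(actions[i],
                         key=lambda u: sum(p * (Cost[i][u][j] + alpha * J[idx[j]])
                                           for j, p in P[i][u]))
                  for i in states}
        if mu_new == mu:          # the policy is a solution of its own improvement
            return mu, J
        mu = mu_new
```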

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to run, in each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.


While $m \ge 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega^X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \qquad \forall i \in \Omega^X$$

$$m \leftarrow m - 1$$

m : Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0 and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and an arbitrary reference state $\bar{X} \in \Omega^X$, there is a unique scalar $\lambda_\mu$ and vector $h_\mu$ such that

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega^X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \qquad \forall i \in \Omega^X$$

This $\lambda_\mu$ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and an optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega^X$$

$$\mu^*(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega^X$$

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. $\bar{X}$ is an arbitrary reference state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega^U(\bar{X})} \sum_{j \in \Omega^X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \qquad \forall i \in \Omega^X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \qquad \forall i \in \Omega^X$$

The sequence $h^k$ converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: the reference state $\bar{X}$ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ for all $i \in \Omega^X$, stop the algorithm.

Else, solve the system of equations

$$h^q(\bar{X}) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega^X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \qquad \forall i \in \Omega^X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \qquad \forall i \in \Omega^X$$

$$q \leftarrow q + 1$$

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case the optimality equation is

$$J^*(i) = \min_{u \in \Omega^U(i)} \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega^X$$

and $J^*(i)$ is the solution of the following linear programming model:

$$\text{Maximize } \sum_{i \in \Omega^X} J(i)$$

$$\text{subject to } J(i) - \alpha \sum_{j \in \Omega^X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega^X} P(j, u, i) \cdot C(j, u, i) \qquad \forall i, u$$

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]
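As a sketch of how the LP formulation above could be set up in practice, the following uses scipy.optimize.linprog; the data layout (P, Cost) and the whole function are illustrative assumptions, not a prescription from the thesis.

```python
import numpy as np
from scipy.optimize import linprog

def lp_discounted_mdp(states, actions, P, Cost, alpha):
    """Solve a discounted cost-minimisation MDP by linear programming.

    Maximises sum_i J(i) subject to
    J(i) <= sum_j P(j,u,i) * [C(j,u,i) + alpha * J(j)] for every (i, u).
    """
    n = len(states)
    idx = {i: k for k, i in enumerate(states)}
    A_ub, b_ub = [], []
    for i in states:
        for u in actions[i]:
            row = np.zeros(n)
            row[idx[i]] += 1.0
            rhs = 0.0
            for j, p in P[i][u]:
                row[idx[j]] -= alpha * p      # J(i) - alpha * sum_j p * J(j)
                rhs += p * Cost[i][u][j]      # <= expected one-step cost
            A_ub.append(row)
            b_ub.append(rhs)
    # maximise sum_i J(i)  <=>  minimise -sum_i J(i)
    res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n, method="highs")
    return {i: res.x[idx[i]] for i in states}
```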

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; it is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Processes - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. This chapter presents methods that overcome this problem by approximation; they make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6: the system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists; the methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space and are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation: assume a trajectory $(X_0, \ldots, X_N)$ has been generated according to the policy μ and that the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$ : Cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V_m(i)$$

$V_m(i)$ : Cost-to-go observed from state i after its mth visit

A recursive form of the method can be formulated:

$$J(i) \leftarrow J(i) + \gamma \cdot \left[ V_m(i) - J(i) \right], \quad \gamma = \frac{1}{m},$$

where m is the number of times state i has been visited so far.

From a trajectory point of view:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

$\gamma_{X_k}$ corresponds to 1/m, where m is the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be computed from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1})$.

At each transition of the trajectory, the cost-to-go estimates of the states already visited during the trajectory are updated. Assume that the lth transition has just been generated; then J(X_k) is updated for all states visited previously during the trajectory:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

$$J(X_k) \leftarrow J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm only updates the current state:

$$J(X_l) \leftarrow J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]$$
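A minimal sketch of TD(0) policy evaluation in Python is given below. The data generator (an iterable of trajectories, each a list of (state, next state, cost) transitions produced by a fixed policy) and the step size 1/m are assumptions chosen for illustration.

```python
from collections import defaultdict

def td0_policy_evaluation(trajectories):
    """TD(0): after each observed transition, update only the visited state."""
    J = defaultdict(float)       # cost-to-go estimates, initialised to 0
    visits = defaultdict(int)    # number of updates per state, for gamma = 1/m
    for trajectory in trajectories:
        for x, x_next, cost in trajectory:
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # temporal-difference update towards the one-step bootstrap target
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J
```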

Q-factors
Once $J_{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

$$Q_{\mu^k}(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu^k}(j) \right]$$

Note that C(j, u, i) must be known. The improved policy is

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega^U(i)} Q_{\mu^k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_{\mu^k}$ and $Q_{\mu^k}$ have been estimated from the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (7.1)$$


The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega^U(i)} Q^*(i, u) \qquad (7.2)$$

By combining the two equations we obtain

$$Q^*(i, u) = \sum_{j \in \Omega^X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega^U(j)} Q^*(j, v) \right] \qquad (7.3)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$ do

$$U_k = \arg\min_{u \in \Omega^U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) \leftarrow (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega^U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with γ defined as for TD.
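The update above can be sketched in Python as follows, here combined with an epsilon-greedy action choice for the exploration/exploitation trade-off discussed below. The environment interface env.reset()/env.step(u), the actions(x) callable and the parameters are assumptions for illustration only.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes, gamma_step=0.1, epsilon=0.1):
    """Tabular Q-learning for a cost-minimisation problem."""
    Q = defaultdict(float)                          # Q[(state, action)]
    for _ in range(episodes):
        x, done = env.reset(), False
        while not done:
            if random.random() < epsilon:           # explore: random action
                u = random.choice(actions(x))
            else:                                   # exploit: greedy (min cost)
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost, done = env.step(u)
            target = cost if done else cost + min(Q[(x_next, v)]
                                                  for v in actions(x_next))
            Q[(x, u)] = (1 - gamma_step) * Q[(x, u)] + gamma_step * target
            x = x_next
    return Q
```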

The exploration/exploitation trade-off: convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function $J_\mu(i)$. It is replaced by a suitable approximation $\tilde{J}(i, r)$, where r is a vector that has to be optimized based on the available samples of $J_\mu$. In the tabular representation investigated previously, $J_\mu(i)$ was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J_\mu(i) - \tilde{J}(i, r)$.
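As one possible approximation structure, the following sketch uses a linear architecture $\tilde{J}(i, r) = r \cdot \phi(i)$ trained with a semi-gradient TD(0) update. The feature map phi and the step size are assumptions chosen for illustration; they are not prescribed by the thesis.

```python
import numpy as np

def semi_gradient_td0(samples, phi, n_features, alpha_step=0.01):
    """Linear cost-to-go approximation J(i, r) = r . phi(i), fitted by TD(0).

    samples : iterable of (x, x_next, cost) transitions under a fixed policy
    phi(x)  : feature vector of length n_features describing state x
    """
    r = np.zeros(n_features)
    for x, x_next, cost in samples:
        td_error = cost + r @ phi(x_next) - r @ phi(x)
        r += alpha_step * td_error * phi(x)   # move r along the gradient of J(x, r)
    return r   # the approximate cost-to-go is then r @ phi(i)
```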

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time needed to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants, the main advantage put forward being the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
- Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Method: classical methods for MDP: value iteration (VI), policy iteration (PI), linear programming (LP)
- Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP allows possible additional constraints but its state space is more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; more complex
- Possible application in maintenance optimization: optimization of inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)

Approximate Dynamic Programming
- Characteristics: can handle larger state spaces than classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Method: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices. If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro-power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space


x1k  Component state at stage k
x2k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


[Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. States: W0–W4, PM1, CM1, CM2. Solid lines: u = 0 (transition Wq → Wq+1 with probability 1 − Ts·λ(q), Wq → CM1 with probability Ts·λ(q)); dashed lines: u = 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
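As an illustration only (not from the thesis), a minimal sketch of how this component state space could be enumerated in code for given NW, NPM and NCM; the function and state names are hypothetical:

```python
# Sketch: enumerate Omega_x1 = {W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1)}.
def component_state_space(n_w, n_pm, n_cm):
    """Return the list of component states as strings."""
    states = ["W%d" % q for q in range(n_w + 1)]      # W0 ... WNW
    states += ["PM%d" % q for q in range(1, n_pm)]    # PM1 ... PM(NPM-1)
    states += ["CM%d" % q for q in range(1, n_cm)]    # CM1 ... CM(NCM-1)
    return states

# Example of Figure 9.1: NW = 4, NPM = 2, NCM = 3
print(component_state_space(4, 2, 3))
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```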


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden and it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. The electricity price (SEK/MWh) is plotted against the stage for Scenario 1, Scenario 2 and Scenario 3.]


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
        ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PMNPM−1                     ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CMNCM−1                     ∅   W0      1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]

P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):  P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
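A minimal sketch (not from the thesis) of how the non-stationary electricity transitions of Tables 9.2 and 9.3 could be stored, with one matrix per stage; the names are illustrative:

```python
# Sketch: non-stationary electricity-scenario transitions, one matrix per stage.
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3]] * 3
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]

# Stage-to-matrix assignment of Table 9.3 (12-stage horizon)
P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def p_elec(k, i2, j2):
    """P(x2_{k+1} = j2 | x2_k = i2) at stage k."""
    return P_E[k][i2][j2]
```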

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
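To make the one-component model concrete, the following is a minimal sketch (not the implementation used in the thesis) of backward value iteration over the state (x1k, x2k), following Tables 9.1 and 9.4. All numerical values (failure rates, costs, electricity prices, transition matrices, horizon) are invented for illustration, the terminal cost is taken as zero, and the generation reward is handled as a negative cost:

```python
# Sketch: value iteration for the one-component model of Section 9.1.
# Component states are indices 0..NW -> W0..WNW, then PM1.., then CM1..
import math

N = 12; Ts = 1.0; NW, NPM, NCM = 4, 2, 3            # as in Figure 9.1
G = 1.0; C_I, C_PM, C_CM = 5.0, 2.0, 4.0            # illustrative costs
lam = [0.01 * (q + 1) for q in range(NW + 1)]       # lambda(Wq), increasing with age
C_E = lambda s, k: [6.0, 4.0, 2.0][s]               # illustrative price per scenario
P_E = [[[1/3] * 3 for _ in range(3)] for _ in range(N)]   # uniform scenario transitions

W = list(range(NW + 1))                             # W0..WNW
PM = [NW + q for q in range(1, NPM)]                # PM1..PM(NPM-1)
CM = [NW + NPM - 1 + q for q in range(1, NCM)]      # CM1..CM(NCM-1)
states = W + PM + CM

def decisions(i1):
    return [0, 1] if i1 in W else [None]            # PM only allowed in working states

def comp_transitions(i1, u):
    """List of (j1, probability, cost) per Tables 9.1 and 9.4 (without the price term)."""
    if u == 1:                                      # preventive replacement starts
        return [(PM[0] if NPM > 1 else W[0], 1.0, C_I + C_PM)]
    if i1 in W:                                     # age one step or fail
        q = i1
        cm1 = CM[0] if NCM > 1 else W[0]
        return [(W[min(q + 1, NW)], 1.0 - Ts * lam[q], 0.0),
                (cm1, Ts * lam[q], C_I + C_CM)]
    if i1 in PM:                                    # maintenance progresses deterministically
        idx = PM.index(i1)
        return [(PM[idx + 1] if idx + 1 < len(PM) else W[0], 1.0, C_I + C_PM)]
    idx = CM.index(i1)
    return [(CM[idx + 1] if idx + 1 < len(CM) else W[0], 1.0, C_I + C_CM)]

J = {(i1, i2): 0.0 for i1 in states for i2 in range(3)}   # terminal cost = 0
for k in reversed(range(N)):
    Jk = {}
    for i1 in states:
        for i2 in range(3):
            best = math.inf
            for u in decisions(i1):
                total = 0.0
                for (j1, p1, c) in comp_transitions(i1, u):
                    # generation reward only when the unit keeps working (u = 0)
                    reward = -G * Ts * C_E(i2, k) if (u == 0 and i1 in W and j1 in W) else 0.0
                    for j2 in range(3):
                        total += p1 * P_E[k][i2][j2] * (c + reward + J[(j1, j2)])
                best = min(best, total)
            Jk[(i1, i2)] = best
    J = Jk

print(J[(0, 0)])   # optimal expected cost starting from a new component in scenario 0
```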

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c


Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1,k                Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c and xNC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as for the one-component model (Section 9.1.4.1).

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or nothing, depending on the state of the system.

57

uck = 0: no preventive maintenance on component c

uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                             ∅       otherwise
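As an illustration only, a minimal sketch of how the admissible decision vectors could be enumerated as the Cartesian product of the per-component decision spaces; the state labels and the use of None for "no decision available" are assumptions of this sketch, not part of the thesis:

```python
# Sketch: admissible decision vectors Uk = (u1,...,uNC) for a given system state.
from itertools import product

def decision_space_c(i_c, working_states):
    """Omega_uc(i_c): {0,1} if component c is in a working state, else no choice."""
    return (0, 1) if i_c in working_states else (None,)

def decision_vectors(component_states, working_states):
    per_component = [decision_space_c(ic, working_states) for ic in component_states]
    return list(product(*per_component))

# Example: 3 components in states 'W2', 'CM1', 'W0'; working states W0..W4
working = {"W%d" % q for q in range(5)}
print(decision_vectors(["W2", "CM1", "W0"], working))
# [(0, None, 0), (0, None, 1), (1, None, 0), (1, None, 1)]
```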

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)                                              (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided on some component:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
           1             if ic ∉ {W0, ..., WNWc−1} and ic = jc
           0             else

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
          CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
          0     else
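The following is a minimal sketch of one possible reading of the two cases above (not the thesis implementation): the joint component transition probability and the stage cost for a decision vector, with working components that are neither maintained nor failed kept in place when the system is down. The state labels, common cost values and the helper p_comp (the one-component probability of Table 9.1) are assumptions of this sketch:

```python
# Sketch: joint component transition probability and stage cost (one reading
# of Sections 9.2.4.3-9.2.4.4). States are strings 'Wq', 'PMq', 'CMq'.
def working(s):
    return s.startswith("W")

def system_up(i, u):
    """True if every component is in a W state and no preventive maintenance is decided."""
    return all(working(ic) for ic in i) and not any(u)

def joint_transition_prob(j, u, i, p_comp):
    up = system_up(i, u)
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if up or uc == 1 or not working(ic):
            prob *= p_comp(jc, uc, ic)          # component ages, fails or is maintained
        else:
            prob *= 1.0 if jc == ic else 0.0    # idle working component does not age
    return prob

def stage_cost(j, u, i, k, i_elec, price, G=1.0, Ts=1.0, C_I=5.0, C_PM=2.0, C_CM=4.0):
    if system_up(i, u) and all(working(jc) for jc in j):
        return -G * Ts * price(i_elec, k)       # production reward written as negative cost
    cost = C_I
    for jc, uc, ic in zip(j, u, i):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM
    return cost
```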

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is not deterministic: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically the fastest to converge; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = 2
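For completeness, a minimal sketch (not part of the thesis) of the same backward recursion written in Python; the arc costs are taken from the computations above, states are the index numbering of the example (f(i, u) = u), and the terminal cost is zero:

```python
# Sketch: the value iteration of Appendix A, with C[k][(i, u)] = arc cost of
# choosing successor u from state i at stage k.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                               # stage 0: A
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,          # stage 1: B, C, D
     (2, 1): 5, (2, 2): 2},
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,          # stage 2: E, F, G
     (2, 1): 1, (2, 2): 2},
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                               # stage 3: H, I, J
]

N = len(C)
J = {0: 0.0}                       # terminal stage: only state K, cost 0
policy = []
for k in reversed(range(N)):
    Jk, muk = {}, {}
    for (i, u) in C[k]:
        cand = C[k][(i, u)] + J[u]
        if i not in Jk or cand < Jk[i]:
            Jk[i], muk[i] = cand, u
    J, policy = Jk, [muk] + policy

print(J[0])                        # optimal cost-to-go from A: 8
i = 0
for k in range(N):                 # recover the optimal path A -> D -> G -> I -> K
    print("stage", k, "state", i, "-> decision", policy[k][i])
    i = policy[k][i]
```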


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc., 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Contents
• Introduction: Background; Objective; Approach; Outline
• Maintenance: Types of Maintenance; Maintenance Optimization Models
• Introduction to the Power System: Power System Presentation; Costs; Main Constraints
• Introduction to Dynamic Programming: Introduction; Deterministic Dynamic Programming
• Finite Horizon Models: Problem Formulation; Optimality Equation; Value Iteration Method; The Curse of Dimensionality; Ideas for a Maintenance Optimization Model
• Infinite Horizon Models - Markov Decision Processes: Problem Formulation; Optimality Equations; Value Iteration; The Policy Iteration Algorithm; Modified Policy Iteration; Average Cost-to-go Problems; Linear Programming; Efficiency of the Algorithms; Semi-Markov Decision Process
• Approximate Methods for Markov Decision Process - Reinforcement Learning: Introduction; Direct Learning; Indirect Learning; Supervised Learning
• Review of Models for Maintenance Optimization: Finite Horizon Dynamic Programming; Infinite Horizon Stochastic Models; Reinforcement Learning; Conclusions
• A Proposed Finite Horizon Replacement Model: One-Component Model; Multi-Component Model; Possible Extensions
• Conclusions and Future Work
• Solution of the Shortest Path Example
• Reference List

[Figure 2.1: Maintenance tree based on [1]. Maintenance is divided into Preventive Maintenance, comprising Time-Based Maintenance (TBM) and Condition Based Maintenance (CBM) (continuous, scheduled or inspection based), and Corrective Maintenance.]

2. Condition Based Maintenance is preventive maintenance based on performance and/or parameter monitoring and the subsequent actions [1]. CBM corresponds to all the maintenance methods using diagnostics or inspections to decide on the maintenance actions. Diagnostic methods include the use of human senses (noise, visual, etc.), measurements or tests. They can be undertaken continuously or during scheduled or requested inspections. CBM is often used for non-age related failures.

2.2 Maintenance Optimization Models

Unexpected failures of a component in a system can lead to expensive Corrective Maintenance. Preventive Maintenance approaches can be used to avoid CM. If preventive maintenance is done too frequently, however, it can also result in a very high cost.

The aim of maintenance optimization could be to balance corrective and preventive maintenance to minimize, for example, the total cost of maintenance.

Numerous maintenance optimization models have been proposed in the literature and interesting reviews have been published. Wang [43] gives an interesting picture of maintenance policy optimization and its influence factors. Cho et al. [15], Dekker et al. [16] and Nicolai et al. [31] focus mainly on multi-component problems.

In this section the most common classes of models are described and some references are given. This short review is based on Chapter 8 of [4].


2.2.1 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson distribution (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures removed by minor repair and major failures removed by replacement.

2.2.2 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to reflect that the operational cost of a unit is higher when it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

2.2.3 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify relevant variables and their relation with failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables, above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

2.2.4 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of a component it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: the trip to the wind farm by boat or helicopter is necessary and can be very expensive. By grouping maintenance actions, money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy of m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account the short-term information. The model can be used for many maintenance optimization models.

2.2.5 Other Types of Models and Criteria of Classification

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is important: many articles consider an infinite time horizon; more focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation: whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: the generation units that produce the power. They can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: the transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3. Distribution: the distribution system is a voltage level below transmission which is connected to customers. It connects the distribution system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: the consumers can be divided into different categories, such as industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

3.1.2 Maintenance in Power System

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20] a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: cost for the maintenance team that performs maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: if special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of the spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a cost and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


"An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision." [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if indeed the lifetime of a system is very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem: the curse of dimensionality (discussed in Section 4.2).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 6 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, dynamic and cost functions, and objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost is associated to the terminal state (state at stage N), CN(XN).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min_{Uk} { Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) }

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N−1

N         Number of stages
k         Stage
i         State at the current stage
j         State at the next stage
Xk        State at stage k
Uk        Decision action at stage k
Ck(i, u)  Cost function
CN(i)     Terminal cost for state i
fk(i, u)  Dynamic function
J*0(i)    Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }    (4.1)

J*k(i)  Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i)                                                 ∀i ∈ ΩXN
J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }       ∀i ∈ ΩXk
U*k(i) = argmin_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }    ∀i ∈ ΩXk

u       Decision variable
U*k(i)  Optimal decision action at stage k for state i


The algorithm goes backwards, starting from the last stage. It stops when k = 0.
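As an illustration only (not part of the thesis), a minimal generic sketch of this backward recursion in Python; states(k), actions(k, i), f(k, i, u), cost(k, i, u) and terminal(i) are problem-specific callables assumed to be given:

```python
# Sketch: generic backward value iteration for a deterministic finite horizon DP.
def value_iteration(N, states, actions, f, cost, terminal):
    J = {i: terminal(i) for i in states(N)}          # J*_N(i) = C_N(i)
    policy = []
    for k in reversed(range(N)):                     # k = N-1, ..., 0
        Jk, mu_k = {}, {}
        for i in states(k):
            best_u, best_val = None, float("inf")
            for u in actions(k, i):
                val = cost(k, i, u) + J[f(k, i, u)]  # Bellman recursion (4.1)
                if val < best_val:
                    best_u, best_val = u, val
            Jk[i], mu_k[i] = best_val, best_u
        J, policy = Jk, [mu_k] + policy
    return J, policy                                 # J*_0 and an optimal policy
```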


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Diagram: a shortest path network with node A at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; and the terminal node K at stage 4. Each arc is labelled with its cost; the arc costs are the ones used in the solution in Appendix A.]
The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated to each arc. One first way to solve the problem would be to calculate the cost of all the possible paths. For example the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example the choice is which way to take from the current node to the next stage. The following notations are used:

ΩUk(i) = {0, 1}    for i = 0
         {0, 1, 2} for i = 1
         {1, 2}    for i = 2
for k = 1, 2, 3

ΩU0(0) = {0, 1, 2} for k = 0

For example ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E or U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F or u1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*0, µ*1, ..., µ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

Jlowast0 (0) = minUkisinΩU

k(Xk)

4sumk=0Ck(Xk Uk) + CN (XN )

Subject to Xk+1 = fk(Xk Uk) k = 0 1 N minus 1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solutions determined by the DP algorithm for the sequence of states that are visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
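As a complement to the example, the following sketch implements the backward value iteration of Section 4.2.2 for a generic deterministic problem. The small instance at the bottom uses made-up arc costs (the full cost table of the figure is not reproduced here), so it only illustrates the mechanics, not the numerical solution of the thesis example.

# Minimal sketch of backward value iteration for a deterministic
# finite horizon problem (Section 4.2.2). The arc costs in the usage
# example below are hypothetical; only the algorithm follows the text.

def backward_value_iteration(states, costs, terminal_cost):
    """states[k]    : list of states at stage k, k = 0..N
       costs[k]     : dict {(i, j): cost} for arcs from stage k to stage k+1
       terminal_cost: dict {i: C_N(i)} for states at stage N
       Returns J[k][i] (optimal cost-to-go) and U[k][i] (optimal successor)."""
    N = len(states) - 1
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    J[N] = {i: terminal_cost.get(i, 0.0) for i in states[N]}
    for k in range(N - 1, -1, -1):            # backward recursion, stops at k = 0
        for i in states[k]:
            # admissible decisions = successors j with a defined arc (i, j)
            options = {j: c + J[k + 1][j]
                       for (s, j), c in costs[k].items() if s == i}
            best = min(options, key=options.get)
            J[k][i], U[k][i] = options[best], best
    return J, U

# Hypothetical 3-stage instance (placeholder costs, not the figure of the thesis)
states = [["A"], ["B", "C"], ["K"]]
costs = [{("A", "B"): 2, ("A", "C"): 1},
         {("B", "K"): 3, ("C", "K"): 5}]
J, U = backward_value_iteration(states, costs, {"K": 0.0})
print(J[0]["A"], U[0]["A"])   # optimal cost-to-go from A and the first decision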


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k ∈ Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N−1

N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
X_k           State at stage k
U_k           Decision action at stage k
ω_k(i, u)     Probabilistic function of the disturbance
C_k(j, u, i)  Cost function
C_N(i)        Terminal cost for state i
f_k(i, u, ω)  Dynamic function
J*_0(i)       Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u ∈ Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for a cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]    (5.2)

Ω_{X_k}       State space at stage k
Ω^U_k(i)      Decision space at stage k for state i
P_k(j, u, i)  Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system:

J*_N(i) = C_N(i), ∀i ∈ Ω_{X_N}    (Initialisation)

While k ≥ 0 do

J*_k(i) = min_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_{X_k}

U*_k(i) = argmin_{u ∈ Ω^U_k(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)], ∀i ∈ Ω_{X_k}

k ← k − 1

u        Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
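A compact sketch of the recursion above for tabular data. The containers P[k][i][u] (a dictionary of transition probabilities to next states j) and C[k][i][u][j] (the corresponding transition costs) are illustrative names and data layouts, not notation from the thesis.

# Sketch of the finite horizon stochastic value iteration of Section 5.3.
# P[k][i][u] = {j: P_k(j, u, i)},  C[k][i][u] = {j: C_k(j, u, i)}  (tabular data).

def stochastic_value_iteration(N, states, P, C, terminal_cost):
    J = {i: terminal_cost[i] for i in states[N]}      # J*_N(i) = C_N(i)
    policy = [dict() for _ in range(N)]
    for k in range(N - 1, -1, -1):                    # backwards, stops at k = 0
        J_new = {}
        for i in states[k]:
            # expected cost for each admissible decision u
            q = {u: sum(p * (C[k][i][u][j] + J[j])
                        for j, p in P[k][i][u].items())
                 for u in P[k][i]}
            u_best = min(q, key=q.get)
            J_new[i], policy[k][i] = q[u_best], u_best
        J = J_new
    return J, policy

# Tiny illustrative instance (hypothetical numbers): 2 stages, states {0, 1}, decisions {0, 1}
states = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
P = {k: {i: {u: ({0: 0.7, 1: 0.3} if u == 0 else {0: 0.4, 1: 0.6})
             for u in (0, 1)} for i in (0, 1)} for k in (0, 1)}
C = {k: {i: {u: {0: 1.0, 1: 4.0} for u in (0, 1)} for i in (0, 1)} for k in (0, 1)}
J0, mu = stochastic_value_iteration(2, states, P, C, {0: 0.0, 1: 2.0})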

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.

5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the actual state of the system (and possibly on the time, if the system dynamic is not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.

Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP, for the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are incurred.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

μ      Decision policy
J*(i)  Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost function for discounted IHSDP has the form α^k · C_ij(u) at stage k.

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

α  Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ...

6.2 Optimality Equations

The optimality equations are formulated using the transition probability function, written P_ij(u) = P(j, u, i) in this section.

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + J*(j)], ∀i ∈ Ω_X

J_μ(i)  Cost-to-go function of policy μ starting from state i
J*(i)   Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [C_ij(u) + α · J*(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that the algorithm converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to end the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. It can then be described by the following steps:

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, J_{μ^q}(i), the solution of the following linear system, is calculated:

J_{μ^q}(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + J_{μ^q}(j)]

q  Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ^q}(j)]

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.

At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, then the algorithm will converge quickly to the optimal solution.
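The two steps can be written compactly for a discounted, stationary MDP in tabular form, with the evaluation step solved as a linear system (here with numpy, and with a discount factor α so that the evaluation system has a unique solution). The arrays P[u][i][j] and C[u][i][j] are assumed inputs; this is a sketch, not the thesis implementation.

import numpy as np

def policy_iteration(P, C, alpha):
    """P[u][i][j]: transition probabilities, C[u][i][j]: transition costs,
       alpha: discount factor (0 < alpha < 1). Returns (policy, cost-to-go)."""
    n_u, n_s, _ = P.shape
    mu = np.zeros(n_s, dtype=int)                 # initial policy mu^0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_s), :]           # row i is P(. | i, mu(i))
        c_mu = np.sum(P_mu * C[mu, np.arange(n_s), :], axis=1)
        J = np.linalg.solve(np.eye(n_s) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C) + alpha * np.einsum('uij,j->ui', P, J)
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):            # policy is its own improvement
            return mu, J
        mu = mu_new

# Hypothetical 2-state, 2-action example
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
C = np.array([[[1.0, 5.0], [3.0, 2.0]],
              [[2.0, 2.0], [1.0, 4.0]]])
mu_opt, J_opt = policy_iteration(P, C, alpha=0.9)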

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).

While m ≥ 0 do

J^m_{μ_k}(i) = Σ_{j ∈ Ω_X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)], ∀i ∈ Ω_X

m ← m − 1

m  Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω_X, there is a unique λ_μ and vector h_μ such that:

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ Ω_X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily:

H^k = min_{u ∈ Ω_U(X)} Σ_{j ∈ Ω_X} P(j, u, X) · [C(j, u, X) + h^k(j)]

h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)] − H^k, ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^k(j)], ∀i ∈ Ω_X

The sequence h^k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
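A tabular sketch of the relative value iteration updates above. The reference state X is taken as index 0, and the iteration is stopped with a simple tolerance on the change of h, which is an assumption (as noted above, the exact method needs an infinite number of iterations).

import numpy as np

def relative_value_iteration(P, C, tol=1e-8, max_iter=10_000):
    """P[u][i][j], C[u][i][j] as tabular arrays; returns (average cost, h, policy)."""
    n_u, n_s, _ = P.shape
    h = np.zeros(n_s)                       # h^0 chosen arbitrarily
    x_bar = 0                               # arbitrary reference state X
    for _ in range(max_iter):
        Q = np.einsum('uij,uij->ui', P, C) + np.einsum('uij,j->ui', P, h)
        H = Q[:, x_bar].min()               # offset computed at the reference state
        h_new = Q.min(axis=0) - H           # relative value update
        if np.max(np.abs(h_new - h)) < tol:
            return H, h_new, Q.argmin(axis=0)
        h = h_new
    return H, h, Q.argmin(axis=0)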

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i), ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations:

h^q(X) = 0

λ^q + h^q(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)], ∀i ∈ Ω_X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP:

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i ∈ Ω_X} J(i)

Subject to J(i) − Σ_{j ∈ Ω_X} α · P(j, u, i) · J(j) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · C(j, u, i), ∀u, ∀i

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
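For a small discounted MDP, the LP above can be handed to a generic solver. The sketch below uses scipy.optimize.linprog: since linprog minimizes, maximizing Σ_i J(i) is written as minimizing −Σ_i J(i), with one inequality row per state-action pair. Array names and shapes are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """P[u][i][j], C[u][i][j]: tabular MDP data. Returns the optimal cost-to-go J*."""
    n_u, n_s, _ = P.shape
    # One constraint per (i, u):
    #   J(i) - alpha * sum_j P(j, u, i) J(j) <= sum_j P(j, u, i) C(j, u, i)
    A_ub, b_ub = [], []
    for i in range(n_s):
        for u in range(n_u):
            row = -alpha * P[u, i, :]
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(float(np.dot(P[u, i, :], C[u, i, :])))
    # Maximize sum_i J(i)  <=>  minimize  -sum_i J(i)
    res = linprog(c=-np.ones(n_s), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_s, method="highs")
    return res.x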

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and that use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and the costs C(j, u, i).

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is:

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k)  Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)  Cost-to-go of the trajectory starting from state i after the mth visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.

With the preceding algorithm it is necessary that V(X_k) is calculated from the whole trajectory, so the update can only be made when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
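A minimal sketch of TD(0) policy evaluation from simulated trajectories. The function simulate_step is a hypothetical environment routine that returns the next state and transition cost under the fixed policy μ, and the step size follows the 1/m visit-count rule mentioned above.

# Sketch of TD(0) policy evaluation (Section 7.2.1). `simulate_step` is a
# hypothetical function: given a state, it returns (next_state, cost) under
# the fixed policy mu; terminal states are assumed absorbing and cost-free.

from collections import defaultdict

def td0_evaluate(simulate_step, start_states, terminal_states, n_trajectories):
    J = defaultdict(float)              # estimated cost-to-go, initially 0
    visits = defaultdict(int)           # visit counts for the 1/m step size
    for m in range(n_trajectories):
        x = start_states[m % len(start_states)]
        while x not in terminal_states:
            x_next, cost = simulate_step(x)
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0): J(x) <- J(x) + gamma * [C(x, x_next) + J(x_next) - J(x)]
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return dict(J)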

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by:

Q_{μ_k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known.

The improved policy is:

μ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_μ and Q_{μ_k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain:

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do:

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The trade-off exploration/exploitation: Convergence of the algorithms to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
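One simple way to implement this trade-off is an ε-greedy rule on top of the Q-learning update of Section 7.2.2, as sketched below. The functions simulate(x, u) and actions(x) are hypothetical problem-specific helpers, and the constant step size is an assumption.

import random
from collections import defaultdict

def q_learning(simulate, actions, start_state, terminal_states,
               n_episodes, gamma=0.1, epsilon=0.1):
    """simulate(x, u) -> (x_next, cost) is a hypothetical sample generator;
       actions(x) lists the admissible decisions. Costs are minimized, so the
       greedy choice is an argmin over the Q-factors."""
    Q = defaultdict(float)
    for _ in range(n_episodes):
        x = start_state
        while x not in terminal_states:
            # epsilon-greedy trade-off: explore with probability epsilon
            if random.random() < epsilon:
                u = random.choice(actions(x))
            else:
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost = simulate(x, u)
            best_next = 0.0 if x_next in terminal_states else \
                        min(Q[(x_next, a)] for a in actions(x_next))
            # Q-learning update: Q <- (1 - gamma) Q + gamma [cost + min_a Q(x', a)]
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
            x = x_next
    return Q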

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the previous section for each sample of experience.

- Building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation previously investigated, J_μ(i) was stored for all the values of i. With an approximation structure only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done only on one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modelled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time lag (the time between an action and its effect) is revealed. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: value iteration
  Advantages / Disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted and shortest path formulations
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted)
  Methods: classical MDP methods - Value Iteration (VI), Policy Iteration (PI), Linear Programming
  Advantages / Disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP allows possible additional constraints, but its state space is more limited than for VI and PI

Approximate Dynamic Programming
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for systems larger than classical MDP methods can handle
  Methods: TD-learning, Q-learning
  Advantages / Disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Advantages / Disadvantages: more complex

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component

Costs

C_E(s, k)  Electricity cost at stage k for the electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables

i^1  Component state at the current stage
i^2  Electricity state at the current stage
j^1  Possible component state for the next stage
j^2  Possible electricity state for the next stage

State and Control Space

x^1_k  Component state at stage k
x^2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_{x^1}  Component state space
Ω_{x^2}  Electricity state space
Ω_U(i)   Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = ( x^1_k , x^2_k ),  x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}    (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when the age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50% for t > T_max. The latter approach was implemented. In both cases the corresponding number of W states is N_W = T_max/Ts, or the closest integer.
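As an illustration of this discretization, the sketch below turns an assumed hazard function into the per-stage failure probabilities λ(W_q) used by the model. The Weibull form and the parameter values are examples only (not from the thesis), and the values are capped at 1 so they remain valid probabilities.

# Discretizing an assumed Weibull hazard into per-stage failure probabilities.
# hazard(t) = (beta/eta) * (t/eta)**(beta - 1);  p_q ~ Ts * hazard(q*Ts), capped at 1.

def stage_failure_probabilities(beta, eta, t_s, t_max):
    """Returns one failure probability per working state W0..W_NW, with N_W = Tmax/Ts."""
    n_w = int(round(t_max / t_s))
    def hazard(t):
        return (beta / eta) * (t / eta) ** (beta - 1) if t > 0 else 0.0
    return [min(1.0, t_s * hazard(q * t_s)) for q in range(n_w + 1)]

# Example with assumed parameters: beta = 2.5, eta = 10 years, stages of 0.5 year, Tmax = 5 years
p_fail = stage_failure_probabilities(beta=2.5, eta=10.0, t_s=0.5, t_max=5.0)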

[Figure 9.1: Example of the Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. The states are CM2, CM1, W0, ..., W4 and PM1. Under u = 0 (solid lines) the component moves from W_q to W_{q+1} with probability (1 − Ts·λ(q)) and to CM1 with probability Ts·λ(q); under u = 1 (dashed lines) it moves to PM1 with probability 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_{x^1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S_1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, N_E = 3. Electricity prices in SEK/MWh for Scenarios 1, 2 and 3 around stages k−1, k and k+1.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide to do preventive maintenance or not, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1}  if i^1 ∈ {W1, ..., W_{N_W}}
         ∅       else

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E or P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.

Table 9.1: Transition probabilities

i^1                             u    j^1        P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W − 1}      0    W_{q+1}    1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}      0    CM_1       λ(W_q)
W_{N_W}                         0    W_{N_W}    1 − λ(W_{N_W})
W_{N_W}                         0    CM_1       λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}          1    PM_1       1
PM_q, q ∈ {1, ..., N_PM − 2}    ∅    PM_{q+1}   1
PM_{N_PM−1}                     ∅    W_0        1
CM_q, q ∈ {1, ..., N_CM − 2}    ∅    CM_{q+1}   1
CM_{N_CM−1}                     ∅    W_0        1

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E = [ 1    0    0
          0    1    0
          0    0    1 ]

P^2_E = [ 1/3  1/3  1/3
          1/3  1/3  1/3
          1/3  1/3  1/3 ]

P^3_E = [ 0.6  0.2  0.2
          0.2  0.6  0.2
          0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)       0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2)   P^1_E  P^1_E  P^1_E  P^3_E  P^3_E  P^2_E  P^2_E  P^2_E  P^3_E  P^1_E  P^1_E  P^1_E
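To show how the data of Table 9.1 can be assembled in practice, the sketch below builds the two decision-dependent transition matrices for one component, with states ordered W0, ..., W_NW, PM1, ..., CM1, .... The per-stage failure probabilities p_fail (one value per working state, e.g. obtained from a discretized hazard as sketched earlier) are an input supplied by the caller, so the numerical values are assumptions.

import numpy as np

def component_transition_matrices(p_fail, n_pm, n_cm):
    """p_fail[q]: failure probability in working state Wq, q = 0..NW (assumed given).
       Returns the state list and P0, P1: transition matrices for u = 0 and u = 1
       (rows = current state, columns = next state), following Table 9.1."""
    n_w = len(p_fail) - 1
    states = ([f"W{q}" for q in range(n_w + 1)] +
              [f"PM{q}" for q in range(1, n_pm)] +
              [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: n for n, s in enumerate(states)}
    cm1 = idx["CM1"] if n_cm > 1 else idx["W0"]   # CM1 collapses to W0 if N_CM = 1
    pm1 = idx["PM1"] if n_pm > 1 else idx["W0"]   # PM1 collapses to W0 if N_PM = 1
    P0 = np.zeros((len(states), len(states)))
    P1 = np.zeros_like(P0)
    for q in range(n_w + 1):                      # working states
        nxt = idx[f"W{min(q + 1, n_w)}"]          # ageing (stays in W_NW at the end)
        P0[idx[f"W{q}"], nxt] = 1 - p_fail[q]
        P0[idx[f"W{q}"], cm1] = p_fail[q]
        P1[idx[f"W{q}"], pm1] = 1.0               # preventive replacement decided
    for q in range(1, n_pm):                      # PM states: maintenance continues
        nxt = idx[f"PM{q + 1}"] if q + 1 < n_pm else idx["W0"]
        P0[idx[f"PM{q}"], nxt] = P1[idx[f"PM{q}"], nxt] = 1.0
    for q in range(1, n_cm):                      # CM states: repair continues
        nxt = idx[f"CM{q + 1}"] if q + 1 < n_cm else idx["W0"]
        P0[idx[f"CM{q}"], nxt] = P1[idx[f"CM{q}"], nxt] = 1.0
    return states, P0, P1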

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i^1                             u    j^1        C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}      0    W_{q+1}    G · Ts · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}      0    CM_1       C_I + C_CM
W_{N_W}                         0    W_{N_W}    G · Ts · C_E(i^2, k)
W_{N_W}                         0    CM_1       C_I + C_CM
W_q                             1    PM_1       C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}    ∅    PM_{q+1}   C_I + C_PM
PM_{N_PM−1}                     ∅    W_0        C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}    ∅    CM_{q+1}   C_I + C_CM
CM_{N_CM−1}                     ∅    W_0        C_I + C_CM

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C       Number of components
N_{W_c}   Number of working states for component c
N_{PM_c}  Number of preventive maintenance states for component c
N_{CM_c}  Number of corrective maintenance states for component c

Costs

C_{PM_c}  Cost per stage of preventive maintenance for component c
C_{CM_c}  Cost per stage of corrective maintenance for component c
C^c_N(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}  State of component c at the current stage
i^{N_C+1}               State of the electricity at the current stage
j^c, c ∈ {1, ..., N_C}  State of component c at the next stage
j^{N_C+1}               State of the electricity at the next stage
u^c, c ∈ {1, ..., N_C}  Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}  State of component c at stage k
x^c                       A component state
x^{N_C+1}_k               Electricity state at stage k
u^c_k                     Maintenance decision for component c at stage k

Probability functions

λ_c(i)  Failure probability function for component c

Sets

Ω_{x^c}        State space for component c
Ω_{x^{N_C+1}}  Electricity state space
Ω_{u^c}(i^c)   Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_{CM_c} stages with a cost of C_{CM_c} per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_{PM_c} stages with a cost of C_{PM_c} per stage.

• An interruption cost C_I is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C^c_N can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = ( x^1_k , ..., x^{N_C}_k , x^{N_C+1}_k )    (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to N_{CM_c} and N_{PM_c}. The number of W states for each component c, N_{W_c}, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W0, ..., W_{N_{W_c}}, PM1, ..., PM_{N_{PM_c}−1}, CM1, ..., CM_{N_{CM_c}−1}}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = ( u^1_k , u^2_k , ..., u^{N_C}_k )    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., N_C}:  Ω_{u^c}(i^c) = {0, 1}  if i^c ∈ {W0, ..., W_{N_{W_c}}}
                                    ∅       else

9.2.4.3 Transition Probability

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently:

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component states transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}, i^c ∈ {W1, ..., W_{N_{W_c}}}:

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided on at least one component, then:

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P^c

with P^c =  P(j^c, u^c, i^c)  if u^c = 1 or i^c ∉ {W1, ..., W_{N_{W_c}}}
            1                 if u^c = 0, i^c ∈ {W1, ..., W_{N_{W_c}}} and j^c = i^c
            0                 else

(a component that is working and not maintained does not age while the system is stopped, so it remains in its current state)

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0     otherwise
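To make the two cases concrete, the sketch below (not part of the model description above; the component data, state encoding and matrix values are hypothetical) computes the joint transition probability of a small two-component system from given per-component transition matrices, freezing non-maintained working components as described in Case 2.

```python
import numpy as np

# Hypothetical encoding for each component: 0 = W0, 1 = W1, 2 = CM1 (failed / under repair).
WORKING = {0, 1}
# P_age[c][i, j]: ageing transition of component c over one stage while the system operates.
# P_maint[c][i, j]: transition when component c is maintained (u_c = 1) or already failed.
P_age = [np.array([[0.90, 0.05, 0.05],
                   [0.00, 0.80, 0.20],
                   [0.00, 0.00, 1.00]]) for _ in range(2)]
P_maint = [np.array([[1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0]]) for _ in range(2)]  # one-stage repair back to W0

def joint_transition_probability(i, u, j):
    """P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) following Case 1 and Case 2."""
    if all(ic in WORKING for ic in i) and not any(u):
        # Case 1: all components working and no maintenance -> independent ageing.
        return float(np.prod([P_age[c][ic, jc] for c, (ic, jc) in enumerate(zip(i, j))]))
    # Case 2: the system is stopped; maintained or failed components follow the
    # maintenance dynamics, the other components are frozen in their current state.
    prob = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if uc == 1 or ic not in WORKING:
            prob *= P_maint[c][ic, jc]
        else:
            prob *= 1.0 if ic == jc else 0.0
    return prob

# Example: both components in W1, preventive replacement decided on component 0.
print(joint_transition_probability((1, 1), (1, 0), (0, 1)))  # -> 1.0
```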

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: It would be interesting to limit the number of maintenance actions that can be performed at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states: It could be interesting to add other forecast state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model and requires the system to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = CN(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 15: Models

221 Age Replacement Policies

Under an age replacement policy, a component is replaced at failure or at the end of a specified interval, whichever occurs first [17]. This policy makes sense if preventive replacement is less expensive than corrective replacement and the failure rate increases with time. Barlow et al. [7] describe a basic age replacement model.

A model including discounting has been proposed in [17]. In this model, the loss value of a replaced component decreases with its age.

A model with minimal repair is discussed in [6]. If the component fails, it can be repaired to the same condition as before the failure occurred.

An age/block replacement model with failures resulting from shocks is described in [38]. The shocks follow a non-homogeneous Poisson process (a Poisson process with a rate that is not stationary). Two types of failures can result from the shocks: minor failures, removed by minor repair, and major failures, removed by replacement.

222 Block Replacement Policies

In block replacement policies, the components of a system are replaced at failure or at fixed times kT (k = 1, 2, ...), whichever occurs first. Barlow et al. [7] describe a basic block replacement model. To avoid that a component that has just been replaced is replaced again, a modified block replacement model is proposed in [10]: a component is not replaced at a scheduled replacement time if its age is less than T.

This model has been modified in [11] to account for the operational cost of a unit increasing as it becomes older. Moreover, the model of [10] is extended in [5] to allow multi-component systems with any discrete lifetime distribution.

223 Condition Based Maintenance

CBM is being introduced in many systems to avoid unnecessary maintenance and prevent incipient failures. In wind turbines, condition monitoring is being introduced for components like the gearbox, blades, etc. [32]. One problem prior to the optimization is to identify the relevant variables and their relation to failure modes and probabilities. CBM optimization models focus on different questions related to inspected/monitored components.

One question is the optimal limits for the monitored variables above which it is necessary to perform maintenance. The optimal wear-limit for preventive replacement of a component is derived in [34]. The model is extended in [35] to include different monitoring variables.

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2], the inspections occur at fixed times and the decision of preventive replacement of the component depends on its condition at inspection. In [9], a Semi-Markov Decision Process (SMDP, see Chapter 6) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a proportional hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).
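As an illustration of this assumption (the exact form used in [25] is not reproduced here), a common proportional hazards choice is a baseline hazard λ0(t) scaled by an exponential function of the monitored variables z1, ..., zn:

```latex
\lambda(t \mid z) = \lambda_0(t)\,\exp\!\left(\beta_1 z_1 + \dots + \beta_n z_n\right)
```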

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities for performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example: travel to the wind farm by boat or helicopter is necessary and can be very expensive, so by grouping maintenance actions money could be saved.

Haurie et al. [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take short term information into account. The approach can be combined with many maintenance optimization models.

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts or a possible choice between different spare parts. E.g., cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important, e.g., multi-component models are more interesting for power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of a model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

31 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

311 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: The generation units produce the power. These can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: The transmission system is composed of high voltage, high power lines. This part of the system is in general meshed. The transmission system connects the distribution systems with the generation units.

3. Distribution: The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

312 Maintenance in Power System

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/Interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/Monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical Data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not influence the current evolution of the system and the possible actions.

Basically, in maintenance problems it means that maintenance actions only have an effect on the state of the system directly after their completion. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the chosen action. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

Subject to: Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
Xk  State at stage k
Uk  Decision action at stage k
Ck(i, u)  Cost function
CN(i)  Terminal cost for state i
fk(i, u)  Dynamic function
J*0(i)  Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }    (41)

J*k(i)  Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i)  ∀i ∈ ΩXN

J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }  ∀i ∈ ΩXk

U*k(i) = argmin_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }  ∀i ∈ ΩXk

u  Decision variable
U*k(i)  Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
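A minimal Python sketch of this backward recursion is given below. The state spaces, decision spaces, dynamic function f and cost functions are assumed to be supplied by the caller; all names are illustrative and not part of the thesis notation.

```python
def value_iteration(N, states, controls, f, C, C_N):
    """Backward value iteration for a deterministic finite horizon problem.

    states[k]      : iterable of admissible states at stage k
    controls[k](i) : iterable of admissible decisions for state i at stage k
    f(k, i, u)     : next state when decision u is taken in state i at stage k
    C(k, i, u)     : stage cost; C_N(i) : terminal cost
    Returns the optimal costs-to-go J[k][i] and optimal decisions U[k][i].
    """
    J = {N: {i: C_N(i) for i in states[N]}}
    U = {}
    for k in range(N - 1, -1, -1):              # backwards from stage N-1 down to 0
        J[k], U[k] = {}, {}
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls[k](i):
                cost = C(k, i, u) + J[k + 1][f(k, i, u)]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U
```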


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

B   E   H
A   C   F   I   K
D   G   J
Stage 0   Stage 1   Stage 2   Stage 3   Stage 4

[Figure: shortest path network with arc costs: A→B = 2, A→C = 4, A→D = 3; B→E = 4, B→F = 6; C→E = 2, C→F = 1, C→G = 3; D→F = 5, D→G = 2; E→H = 2, E→I = 5; F→H = 7, F→I = 3, F→J = 2; G→I = 1, G→J = 2; H→K = 4, I→K = 2, J→K = 7.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}

Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

ΩUk(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2   (for k = 1, 2, 3)

ΩU0(0) = {0, 1, 2}   (for k = 0)

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined to be equal to the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*0(0) = min_{Uk ∈ ΩUk(Xk)} [ Σ_{k=0}^{3} Ck(Xk, Uk) + CN(XN) ]

Subject to: Xk+1 = fk(Xk, Uk), k = 0, 1, ..., N − 1

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*k(i) (for example, μ1(1) = 2, μ1(2) = 2).
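For completeness, the example can be reproduced with a few lines of Python (a sketch, not part of the original thesis); the arc costs are those of the figure above and the recursion follows the value iteration algorithm of Section 422.

```python
# Arc costs C[k][(i, u)] of the shortest path example, where u is the next node index.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                       # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,  # B, C -> E, F, G
        (2, 1): 5, (2, 2): 2},                                  # D -> F, G
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,  # E, F -> H, I, J
        (2, 1): 1, (2, 2): 2},                                  # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                       # H, I, J -> K
}
J = {4: {0: 0}}                        # terminal cost at node K
policy = {}
for k in range(3, -1, -1):             # backward recursion
    J[k], policy[k] = {}, {}
    for (i, u), cost in C[k].items():
        total = cost + J[k + 1][u]
        if total < J[k].get(i, float("inf")):
            J[k][i], policy[k][i] = total, u
print(J[0][0])                         # optimal cost-to-go from A: 8
```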


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable describing the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time, and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case, one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk ∈ ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to: Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1


N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
Xk  State at stage k
Uk  Decision action at stage k
ωk(i, u)  Probabilistic function of the disturbance
Ck(i, u, j)  Cost function
CN(i)  Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*0(i)  Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*k(i) = min_{u ∈ ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]    (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]    (52)

ΩXk  State space at stage k
ΩUk(i)  Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation 52. The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i)  ∀i ∈ ΩXN   (Initialisation)

While k ≥ 0 do:

J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

U*k(i) = argmin_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

k ← k − 1

u  Decision variable
U*k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
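The following Python sketch implements this backward recursion for the simplified case where the same state space is used at every stage; the functions controls, P and C and the terminal cost vector are assumed to be provided by the user (hypothetical names, not from the thesis).

```python
import numpy as np

def stochastic_value_iteration(N, n_states, controls, P, C, C_N):
    """Finite horizon stochastic value iteration (backward recursion).

    controls(k, i) : iterable of admissible decisions for state i at stage k
    P(k, i, u)     : vector of transition probabilities Pk(., u, i) over next states
    C(k, i, u)     : vector of transition costs Ck(., u, i) over next states
    C_N            : vector of terminal costs
    """
    J = np.array(C_N, dtype=float)
    policy = []
    for k in range(N - 1, -1, -1):
        J_new = np.empty(n_states)
        mu_k = {}
        for i in range(n_states):
            actions = list(controls(k, i))
            costs = [np.dot(P(k, i, u), C(k, i, u) + J) for u in actions]
            best = int(np.argmin(costs))
            J_new[i], mu_k[i] = costs[best], actions[best]
        J, policy = J_new, [mu_k] + policy
    return J, policy   # J[i] = optimal expected cost-to-go from state i at stage 0
```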

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few previous states) to overcome this assumption: variables are added in the DP model to keep the previously visited states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, ..., μ}, where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (cost-free termination) state that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

μ  Decision policy
J*(i)  Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, the discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

α  Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

Jμ*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(u) · [Cij(u) + Jμ*(j)]  ∀i ∈ ΩX

Jμ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

Jμ*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(u) · [Cij(u) + α · Jμ*(j)]  ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that the algorithm converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be defined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
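A sketch of value iteration for the discounted case is shown below (illustrative code, not from the thesis); it stops when the change between successive iterates guarantees that the sup-norm error is below a tolerance.

```python
import numpy as np

def discounted_value_iteration(P, C, alpha, tol=1e-8):
    """Value iteration for a discounted cost MDP.

    P[u][i][j] : transition probability from state i to j under action u
    C[u][i][j] : transition cost
    alpha      : discount factor, 0 < alpha < 1
    Returns the optimal cost-to-go J and a greedy policy mu.
    """
    P, C = np.asarray(P, dtype=float), np.asarray(C, dtype=float)
    n_states = P.shape[1]
    J = np.zeros(n_states)
    while True:
        # Q[u, i] = sum_j P(j|i,u) * (C(i,u,j) + alpha * J(j))
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol * (1 - alpha) / alpha:
            return J_new, Q.argmin(axis=0)   # sup-norm error below tol
        J = J_new
```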

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]

q  Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]  ∀i ∈ ΩX

Go back to the policy evaluation step.

The process stops when μq+1 = μq.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, the algorithm will converge quickly to the optimal solution.
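The two steps translate almost directly into code. The sketch below (illustrative, written for the discounted formulation) solves the policy evaluation linear system exactly with numpy and then improves the policy.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted cost MDP (P[u][i][j], C[u][i][j], 0 < alpha < 1)."""
    P, C = np.asarray(P, dtype=float), np.asarray(C, dtype=float)
    n_states = P.shape[1]
    idx = np.arange(n_states)
    mu = np.zeros(n_states, dtype=int)            # initial policy: action 0 everywhere
    while True:
        # Step 1, policy evaluation: solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, idx, :]
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, idx, :])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2, policy improvement (ties should be broken towards the current policy)
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):
            return mu, J
        mu = mu_new
```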

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M in order to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).


While m ≥ 0 do:

J^m_μk(i) = Σ_{j ∈ ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]  ∀i ∈ ΩX

m ← m − 1

m  Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
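For the discounted formulation, the approximate evaluation step can be sketched as below (hypothetical code); it would replace the exact linear solve in the policy iteration sketch above, using M fixed-policy backups started from an initial guess chosen, as stated above, not lower than the true cost-to-go.

```python
import numpy as np

def approximate_policy_evaluation(P, C, alpha, mu, J_init, M):
    """Estimate the cost-to-go of policy mu with M sweeps of fixed-policy value iteration."""
    P, C = np.asarray(P, dtype=float), np.asarray(C, dtype=float)
    idx = np.arange(len(mu))
    P_mu = P[mu, idx, :]                               # transition matrix under policy mu
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, idx, :])  # expected one-step cost under mu
    J = np.array(J_init, dtype=float)
    for _ in range(M):                                 # M backups instead of an exact solve
        J = c_mu + alpha * P_mu @ J
    return J
```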

66 Average Cost-to-go Problems

The methods presented in Sections 63-65 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a reference state X̄ ∈ ΩX, there is a unique λμ and a vector hμ such that

hμ(X̄) = 0

λμ + hμ(i) = Σ_{j ∈ ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]  ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

μ*(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u ∈ ΩU(X̄)} Σ_{j ∈ ΩX} P(j, u, X̄) · [C(j, u, X̄) + hk(j)]

hk+1(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk  ∀i ∈ ΩX

μk+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]  ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
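A compact sketch of relative value iteration (illustrative code, assuming a unichain model and stationary P and C arrays) is given below; the returned H approximates the optimal average cost per stage.

```python
import numpy as np

def relative_value_iteration(P, C, n_iter=1000, ref_state=0):
    """Relative value iteration for an average cost per stage MDP (unichain assumed)."""
    P, C = np.asarray(P, dtype=float), np.asarray(C, dtype=float)
    n_states = P.shape[1]
    h = np.zeros(n_states)
    for _ in range(n_iter):
        # Q[u, i] = sum_j P(j|i,u) * (C(i,u,j) + h(j))
        Q = np.einsum('uij,uij->ui', P, C + h[None, None, :])
        h = Q.min(axis=0) - Q[:, ref_state].min()   # relative values h_{k+1}
    Q = np.einsum('uij,uij->ui', P, C + h[None, None, :])
    H = Q[:, ref_state].min()                       # approximate optimal average cost
    return H, h, Q.argmin(axis=0)
```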

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: the reference state X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the following system of equations:

hq(X̄) = 0

λq + hq(i) = Σ_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)]  ∀i ∈ ΩX

Step 2: Policy improvement

μq+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hq(j)]  ∀i ∈ ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP the optimal cost-to-go satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)],   ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)],   ∀u ∈ Ω_U(i), ∀i ∈ Ω_X

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
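A minimal sketch of this LP formulation, assuming SciPy's linprog and the same P[u, i, j] / C[u, i, j] layout as before, could look as follows; it maximizes the sum of the cost-to-go values, since J* is the largest vector satisfying the constraints.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted MDP by linear programming.

    P[u, i, j] : transition probability from i to j under action u
    C[u, i, j] : transition cost from i to j under action u
    alpha      : discount factor (0 < alpha < 1)
    Assumes, for simplicity, that every action is admissible in every state.
    """
    n_actions, n_states, _ = P.shape
    c_bar = np.einsum('uij,uij->ui', P, C)        # expected one-stage cost
    # One constraint per (state, action): J(i) - alpha * sum_j P(j,u,i) J(j) <= c_bar(u,i)
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = -alpha * P[u, i, :].copy()
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(c_bar[u, i])
    # J* is the largest vector satisfying the constraints, so maximize sum_i J(i)
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    J = res.x
    Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
    policy = Q.argmin(axis=0)
    return J, policy
```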

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. However, linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions refer to optimal control theory).

SMDPs are more complicated than MDPs and are not treated in this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, Bayesian statistics, etc.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

where V(X_k) is the cost-to-go of the trajectory starting from state X_k.

If a certain number of trajectories have been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

where V(i_m) is the cost-to-go of the trajectory starting from state i after its m-th visit.

A recursive form of the method can be formulated:

J(i) ← J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory (visit).

From a trajectory point of view:

J(X_k) ← J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated; then J(X_k) is updated for all the states visited previously during the trajectory:

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) ← J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) ← J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]

that is, only the state from which the current transition starts is updated.
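A minimal sketch of TD(0) policy evaluation for a stochastic shortest path problem is given below; the sampling interface simulate_transition(x, u), which returns the observed next state and cost, is an assumption of this sketch, not a prescription of the thesis.

```python
import random

def td0_policy_evaluation(simulate_transition, policy, states, n_trajectories,
                          terminal_state):
    """TD(0) evaluation of a policy for a stochastic shortest path problem.

    simulate_transition(x, u) is assumed to return (next_state, cost); it
    replaces an explicit model of P(j, u, i) and C(j, u, i).
    """
    J = {x: 0.0 for x in states}          # arbitrary initialisation
    visits = {x: 0 for x in states}       # to compute the step size gamma_x = 1/m
    for _ in range(n_trajectories):
        x = random.choice([s for s in states if s != terminal_state])
        while x != terminal_state:
            u = policy[x]
            x_next, cost = simulate_transition(x, u)
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # temporal difference update of the state the transition starts from
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return J
```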

Q-factors: Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) ← (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: The convergence of the algorithms to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
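The Q-learning update and an ε-greedy exploration/exploitation trade-off can be sketched as follows; the sampling interface and the parameter names are assumptions of this illustration, not part of the referenced methods.

```python
import random

def q_learning(simulate_transition, states, actions, terminal_state,
               n_episodes, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy exploration scheme.

    simulate_transition(x, u) is assumed to return (next_state, cost).
    actions[x] is the set of admissible controls in state x.
    """
    Q = {(x, u): 0.0 for x in states for u in actions[x]}
    n_updates = {(x, u): 0 for x in states for u in actions[x]}
    for _ in range(n_episodes):
        x = random.choice([s for s in states if s != terminal_state])
        while x != terminal_state:
            if random.random() < epsilon:                    # exploration
                u = random.choice(list(actions[x]))
            else:                                            # exploitation (greedy)
                u = min(actions[x], key=lambda a: Q[(x, a)])
            x_next, cost = simulate_transition(x, u)
            n_updates[(x, u)] += 1
            gamma = 1.0 / n_updates[(x, u)]                  # step size as in TD
            target = cost
            if x_next != terminal_state:
                target += min(Q[(x_next, a)] for a in actions[x_next])
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
            x = x_next
    policy = {x: min(actions[x], key=lambda a: Q[(x, a)])
              for x in states if x != terminal_state}
    return Q, policy
```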

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience, or

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation previously investigated, J_μ(i) was stored for all values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximation, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
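As an illustration of the simplest case, a linear approximation J̃(i, r) = r · φ(i) can be fitted by least squares to sampled costs-to-go; the feature map φ and the data layout are assumptions of this sketch.

```python
import numpy as np

def fit_cost_to_go(feature_fn, samples):
    """Least-squares fit of an approximation J~(i, r) = r . phi(i).

    feature_fn(i) maps a state to a feature vector phi(i) (chosen from insight
    about the problem); samples is a list of (state, observed cost-to-go V).
    """
    Phi = np.array([feature_fn(x) for x, _ in samples])      # design matrix
    V = np.array([v for _, v in samples])                     # sampled returns
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)               # minimise |Phi r - V|^2
    return r

def approx_cost_to_go(feature_fn, r, state):
    """Evaluate the fitted approximation at any state, also unvisited ones."""
    return float(np.dot(feature_fn(state), r))
```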


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches/methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

82 Infinite Horizon Stochastic Models

821 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance previously calculated.

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar and Asgarpoor [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

84 Conclusions

An important assumption of all the models is the memoryless property (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 8.1: Summary of models and methods (characteristics, possible application in maintenance optimization, methods, advantages/disadvantages)

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application: short-term maintenance optimization/scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model); possible approaches with the classical MDP methods:
  Average cost-to-go
    Possible application: continuous-time condition monitoring maintenance optimization
    Method: Value Iteration (VI); can converge fast for a high discount factor
  Discounted
    Possible application: short-term maintenance optimization
    Method: Policy Iteration (PI); faster in general
  Shortest path
    Method: Linear Programming; possible additional constraints; state space more limited than with VI & PI

Approximate Dynamic Programming
  Characteristics: can handle large state spaces for MDP
  Possible application: same as MDP, but for larger systems than the classical MDP methods
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries, a large part of the electricity is based on hydro-power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro-storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for the component
NCM   Number of corrective maintenance states for the component

Costs

CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i^1   Component state at the current stage
i^2   Electricity state at the current stage
j^1   Possible component state for the next stage
j^2   Possible electricity state for the next stage

State and Control Space

x^1_k   Component state at stage k
x^2_k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ω_{x^1}   Component state space
Ω_{x^2}   Electricity state space
Ω_U(i)    Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k)^T,   x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}   (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always performed. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can for example correspond to the time when λ(t) exceeds 50%. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure 9.1: Example of the Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. States: W0-W4, PM1, CM1, CM2. Solid lines (u = 0): W_q → W_{q+1} with probability 1 − Ts·λ(q) and W_q → CM1 with probability Ts·λ(q); maintenance states progress with probability 1. Dashed lines: u = 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_{x^1} = {W0, ..., W_{NW}, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S1, ..., S_{NE}}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity price scenarios, NE = 3. Axes: stage (k−1, k, k+1) versus electricity price (SEK/MWh, 200-500); curves: Scenario 1, Scenario 2, Scenario 3.]

9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_{NW}}, ∅ otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E or P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.

Table 9.1: Transition probabilities

i^1                           u   j^1         P(j^1, u, i^1)
-------------------------------------------------------------
Wq, q ∈ {0, ..., NW−1}        0   W_{q+1}     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0   CM1         λ(Wq)
W_{NW}                        0   W_{NW}      1 − λ(W_{NW})
W_{NW}                        0   CM1         λ(W_{NW})
Wq, q ∈ {0, ..., NW}          1   PM1         1
PMq, q ∈ {1, ..., NPM−2}      ∅   PM_{q+1}    1
PM_{NPM−1}                    ∅   W0          1
CMq, q ∈ {1, ..., NCM−2}      ∅   CM_{q+1}    1
CM_{NCM−1}                    ∅   W0          1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):       0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2):   P^1_E  P^1_E  P^1_E  P^3_E  P^3_E  P^2_E  P^2_E  P^2_E  P^3_E  P^1_E  P^1_E  P^1_E
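A sketch of how the stationary component transition matrices of Table 9.1 and the electricity matrices of Tables 9.2-9.3 could be built numerically is given below. The state ordering, the function names and the use of Ts·λ(q·Ts) as the per-stage failure probability (as in Figure 9.1) are assumptions of this illustration.

```python
import numpy as np

def component_transition_matrices(failure_rate, Ts, NW, NPM, NCM):
    """Stationary component transition matrices corresponding to Table 9.1.

    failure_rate(t) is the known failure rate lambda(t); the per-stage failure
    probability in state Wq is taken as Ts * lambda(q * Ts) as in Figure 9.1.
    States are ordered W0..WNW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}; PM_NPM and
    CM_NCM are merged with W0.  Returns P0 (u = 0 / no decision) and P1 (u = 1).
    """
    n = (NW + 1) + (NPM - 1) + (NCM - 1)
    W = lambda q: q                          # index of state Wq
    PM = lambda q: NW + q                    # index of state PMq, q = 1..NPM-1
    CM = lambda q: NW + NPM - 1 + q          # index of state CMq, q = 1..NCM-1
    first_PM = PM(1) if NPM > 1 else W(0)    # if NPM = 1, PM1 corresponds to W0
    first_CM = CM(1) if NCM > 1 else W(0)    # if NCM = 1, CM1 corresponds to W0
    P0, P1 = np.zeros((n, n)), np.zeros((n, n))
    for q in range(NW + 1):
        p = Ts * failure_rate(q * Ts)            # per-stage failure probability in Wq
        P0[W(q), W(min(q + 1, NW))] = 1.0 - p    # ageing (WNW ages no further)
        P0[W(q), first_CM] += p                  # failure -> corrective maintenance
        P1[W(q), first_PM] = 1.0                 # preventive replacement decided
    for q in range(1, NPM - 1):                  # preventive maintenance progresses
        P0[PM(q), PM(q + 1)] = 1.0
    if NPM > 1:
        P0[PM(NPM - 1), W(0)] = 1.0              # end of PM: component as new
    for q in range(1, NCM - 1):                  # corrective maintenance progresses
        P0[CM(q), CM(q + 1)] = 1.0
    if NCM > 1:
        P0[CM(NCM - 1), W(0)] = 1.0              # end of CM: component as new
    P1[NW + 1:, :] = P0[NW + 1:, :]              # no decision admissible in maintenance states
    return P0, P1

# Electricity scenario matrices of Table 9.2 and their schedule of Table 9.3
P1E = np.eye(3)
P2E = np.full((3, 3), 1.0 / 3.0)
P3E = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])
PE_schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]
```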

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i^1                           u   j^1         C_k(j, u, i)
-------------------------------------------------------------
Wq, q ∈ {0, ..., NW−1}        0   W_{q+1}     G · Ts · CE(i^2, k)
Wq, q ∈ {0, ..., NW−1}        0   CM1         CI + CCM
W_{NW}                        0   W_{NW}      G · Ts · CE(i^2, k)
W_{NW}                        0   CM1         CI + CCM
Wq                            1   PM1         CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅   PM_{q+1}    CI + CPM
PM_{NPM−1}                    ∅   W0          CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅   CM_{q+1}    CI + CCM
CM_{NCM−1}                    ∅   W0          CI + CCM
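The one-component model can be solved by backward induction, i.e. the value iteration algorithm for finite horizon problems. A minimal generic sketch is given below; the dictionary-based interface (trans_prob, stage_cost, terminal_cost) is an assumption of this illustration and would be filled in from Tables 9.1-9.4.

```python
def solve_finite_horizon(states, decisions, trans_prob, stage_cost,
                         terminal_cost, N):
    """Backward induction for the finite horizon replacement model.

    decisions(i)            admissible decisions in state i (Omega_U(i))
    trans_prob(k, i, u)     dict {j: P(j, u, i)} at stage k
    stage_cost(k, i, u, j)  transition cost C_k(j, u, i)
    terminal_cost(i)        terminal cost C_N(i)
    Returns the cost-to-go at stage 0 and the optimal decision per stage and state.
    """
    J = {i: terminal_cost(i) for i in states}          # stage N
    policy = [dict() for _ in range(N)]
    for k in reversed(range(N)):                       # stages N-1, ..., 0
        J_new = {}
        for i in states:
            best_u, best_val = None, None
            for u in decisions(i) or [None]:           # None = forced transition (maintenance)
                val = sum(p * (stage_cost(k, i, u, j) + J[j])
                          for j, p in trans_prob(k, i, u).items())
                if best_val is None or val < best_val:
                    best_u, best_val = u, val
            policy[k][i] = best_u
            J_new[i] = best_val
        J = J_new
    return J, policy
```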

92 Multi-Component model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high, or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., NC}   State of component c at the current stage
i^{NC+1}                Electricity state at the current stage
j^c, c ∈ {1, ..., NC}   State of component c at the next stage
j^{NC+1}                Electricity state at the next stage
u^c, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., NC}   State of component c at stage k
x^c                       A component state
x^{NC+1}_k                Electricity state at stage k
u^c_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ω_{x^c}        State space for component c
Ω_{x^{NC+1}}   Electricity state space
Ω_{u^c}(i^c)   Decision space for component c in state i^c

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{NC}_k, x^{NC+1}_k)^T   (9.2)

x^c_k, c ∈ {1, ..., NC}, represents the state of component c, and x^{NC+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W0, ..., W_{NWc}, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as for the one-component model in Section 9.1.

9242 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{NC}_k)^T   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{NWc}}, ∅ otherwise

9243 Transition Probability

The state variables x^c are independent of the electricity state x^{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) · P_k(j^{NC+1}, i^{NC+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{NC+1}, i^{NC+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: i^c ∈ {W1, ..., W_{NWc}} and u^c = 0:

P((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = ∏_{c=1}^{NC} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component:

P((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = ∏_{c=1}^{NC} P^c

with
P^c = P(j^c, u^c, i^c) if u^c = 1 or i^c ∉ {W1, ..., W_{NWc}}
P^c = 1 if u^c = 0, i^c ∈ {W1, ..., W_{NWc}} and j^c = i^c
P^c = 0 otherwise
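The case distinction above can be sketched as a small helper that composes the one-component transition probabilities; the data layout (tuples of states and decisions, per-component probability functions) is an assumption of this illustration.

```python
def system_transition_prob(P_single, working_sets, i, u, j):
    """Transition probability of the multi-component state (Section 9.2.4.3).

    P_single[c](j_c, u_c, i_c) : one-component transition probability of component c
    working_sets[c]            : set of working states {W1..WNWc} of component c
    i, u, j                    : tuples (i^1..i^NC), (u^1..u^NC), (j^1..j^NC)
    """
    NC = len(i)
    system_up = all(i[c] in working_sets[c] for c in range(NC)) and not any(u)
    prob = 1.0
    for c in range(NC):
        if system_up:
            # Case 1: every component works and no maintenance is decided
            prob *= P_single[c](j[c], 0, i[c])
        elif u[c] == 1 or i[c] not in working_sets[c]:
            # Case 2a: component c is maintained, or its maintenance progresses
            prob *= P_single[c](j[c], u[c], i[c])
        else:
            # Case 2b: a working component does not age while the system is down
            prob *= 1.0 if j[c] == i[c] else 0.0
    return prob
```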

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: i^c ∈ {W1, ..., W_{NWc}} and u^c = 0:

C((j^1, ..., j^{NC}), 0, (i^1, ..., i^{NC})) = G · Ts · CE(i^{NC+1}, k)

Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{NC}), (u^1, ..., u^{NC}), (i^1, ..., i^{NC})) = CI + Σ_{c=1}^{NC} C^c

with
C^c = CCMc if i^c ∈ {CM1, ..., CM_{NCMc−1}} or j^c = CM1
C^c = CPMc if i^c ∈ {PM1, ..., PM_{NPMc−1}} or j^c = PM1
C^c = 0 otherwise

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas and issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is not deterministic: a stochastic repair time can be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to other fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may e.g. be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 16: Models

of a component is derived in [34] The model is extended in [35] to include differentmonitoring variables

For components subject to inspection, at each decision epoch one must decide if maintenance should be performed and when the next inspection should occur. In [2] the inspections occur at fixed times, and the decision of preventive replacement of the component depends on its condition at inspection. In [9] a Semi-Markov Decision Process (SMDP, see Chapter 4) is proposed to optimize, at each inspection, the maintenance decision and the time to the next inspection.

An age replacement policy model that takes into account the information from condition-based monitoring devices is proposed in [25]. A proportional hazards model is used to model the effect of the monitored variables. The assumption of a hazards model is that the hazard function is the product of two functions, one depending on the time and one on the parameters (monitored variables).

224 Opportunistic Maintenance Models

Opportunistic maintenance considers unexpected opportunities of performing preventive maintenance. With the failure of a component, it is possible to perform PM on other components. This could be interesting for offshore wind farms, for example. The transport to the wind farm by boat or helicopter is necessary and can be very expensive; by grouping maintenance actions, money could be saved.

Haurie and L'Ecuyer [19] focus on a group preventive replacement policy for m identical components that are in the same condition. Both discrete and continuous time are considered and a dynamic programming equation is derived. The model is extended in [26] to m non-identical components.

A rolling horizon dynamic programming algorithm is proposed in [45] to take into account the short-term information. The approach can be used with many maintenance optimization models.

225 Other Types of Models and Criteria of Classifications

Other models integrate the possibility of a limited number of spare parts, or a possible choice between different spare parts. E.g. cannibalization models allow the re-use of some components or subcomponents of a system.

Other criteria can be used to classify maintenance optimization models. The number of components in consideration is important; e.g. multi-component models are more interesting in power systems. The time horizon considered in the model is also important: many articles consider an infinite time horizon, but more focus should be put on finite horizons since they are more practical. Another characteristic of the model is the time representation, i.e. whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches, it can be interesting to consider which kinds of lifetime distributions can be used.

The method used for solving the problem has an influence on the solution. A model that can not be solved is of no interest. For some models, exact solutions are possible. For complex models, it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.

Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: The generation units that produce the power. These can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.


3. Distribution: The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the duration of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: If special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to perform maintenance on offshore wind farms.

• Availability of spare parts: If the needed spare parts are not available, maintenance can not be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if indeed the lifetime of a system is very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for $N$ stages.

State and Decision Spaces
At each stage $k$, the system is in a state $X_k = i$ that belongs to a state space $\Omega_{X_k}$. Depending on the state of the system, the decision maker decides on an action $u = U_k \in \Omega_{U_k}(i)$.

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be $X_{k+1} = f_k(i, u)$. Moreover, the action has a cost that the decision maker has to pay, $C_k(i, u)$. A possible terminal cost $C_N(X_N)$ is associated with the terminal state (the state at stage $N$).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

$$J^*_0(X_0) = \min_{U_k} \left[ \sum_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, \dots, N-1$.

$N$: Number of stages
$k$: Stage
$i$: State at the current stage
$j$: State at the next stage
$X_k$: State at stage $k$
$U_k$: Decision action at stage $k$
$C_k(i, u)$: Cost function
$C_N(i)$: Terminal cost for state $i$
$f_k(i, u)$: Dynamic function
$J^*_0(i)$: Optimal cost-to-go starting from state $i$


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage $k$ can be derived with the following formula:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \qquad (4.1)$$

$J^*_k(i)$: Optimal cost-to-go from stage $k$ to $N$ starting from state $i$

The value iteration algorithm is a direct consequence of the optimality equation:

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N}$$

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$

$$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \left[ C_k(i, u) + J^*_{k+1}(f_k(i, u)) \right] \quad \forall i \in \Omega_{X_k}$$

$u$: Decision variable
$U^*_k(i)$: Optimal decision action at stage $k$ for state $i$

The algorithm goes backwards, starting from the last stage. It stops when $k = 0$.
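The backward recursion above can be written compactly in code. The following is a minimal Python sketch of the deterministic value iteration algorithm; the interfaces (a state-space dictionary, a decision-space function, a dynamic function and a cost function) are illustrative assumptions, not notation from the thesis:

```python
def value_iteration(N, Omega_X, Omega_U, f, C, C_terminal):
    """Backward value iteration for a deterministic finite horizon DP.

    Omega_X[k]    : iterable of states at stage k (k = 0..N)
    Omega_U(k, i) : iterable of admissible decisions in state i at stage k
    f(k, i, u)    : next state when decision u is taken in state i at stage k
    C(k, i, u)    : transition cost
    C_terminal(i) : terminal cost at stage N
    Returns the cost-to-go J[k][i] and the optimal decisions U[k][i].
    """
    J = {N: {i: C_terminal(i) for i in Omega_X[N]}}
    U = {}
    for k in range(N - 1, -1, -1):          # go backwards, stop when k = 0
        J[k], U[k] = {}, {}
        for i in Omega_X[k]:
            # evaluate every admissible decision and keep the best one
            best = min(Omega_U(k, i),
                       key=lambda u: C(k, i, u) + J[k + 1][f(k, i, u)])
            U[k][i] = best
            J[k][i] = C(k, i, best) + J[k + 1][f(k, i, best)]
    return J, U
```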


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: Shortest path network. Stage 0: node A; Stage 1: nodes B, C, D; Stage 2: nodes E, F, G; Stage 3: nodes H, I, J; Stage 4: node K. Each arc is labelled with its cost (distance).]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: $n = 5$, $k = 0, 1, 2, 3, 4$.

State Space
The state space is defined for each stage:

$$\Omega_{X_0} = \{A\} = \{0\}, \quad \Omega_{X_1} = \{B, C, D\} = \{0, 1, 2\}, \quad \Omega_{X_2} = \{E, F, G\} = \{0, 1, 2\}$$
$$\Omega_{X_3} = \{H, I, J\} = \{0, 1, 2\}, \quad \Omega_{X_4} = \{K\} = \{0\}$$


Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:

$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega_{U_0}(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, $\Omega_{U_1}(0) = \Omega_U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ and $U_1(0) = 1$ for the transition $B \Rightarrow F$.

Another example: $\Omega_{U_1}(2) = \Omega_U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ and $u_1(2) = 2$ for the transition $D \Rightarrow G$.

A sequence $\pi = \{\mu_0, \mu_1, \dots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state $i$ at stage $k$ to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \dots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: $f_k(i, u) = u$.

The transition costs are defined as the distance from one state to the resulting state of the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k)$, $k = 0, 1, \dots, N-1$.

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the following path: $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example, $\mu_1(1) = 2$, $\mu_1(2) = 2$).
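For completeness, here is a sketch of how this example could be encoded for the value_iteration function given in Section 4.2.2. The graph structure follows the figure; the arc costs below are illustrative placeholders (only a few of them, such as the costs on A⇒B, B⇒E, B⇒F, F⇒J and J⇒K, are stated explicitly in the text), chosen so that the code reproduces the stated optimum $J^*_0(0) = 8$ along $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$, but they are not necessarily the exact values of the thesis figure:

```python
# Stage-wise state spaces: {A}; {B, C, D}; {E, F, G}; {H, I, J}; {K}.
Omega_X = {0: [0], 1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2], 4: [0]}

# Admissible decisions; the decision is the index of the next node (f_k(i, u) = u).
def Omega_U(k, i):
    if k == 0:
        return [0, 1, 2]
    if k == 3:
        return [0]            # every node of stage 3 connects only to K
    return {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}[i]

def f(k, i, u):
    return u if k < 3 else 0  # at stage 3 the only successor is K

# Illustrative arc costs dist[k][(i, u)]; replace with the values from the figure.
dist = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 3, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

def C(k, i, u):
    return dist[k][(i, u)]

J, U = value_iteration(4, Omega_X, Omega_U, f, C, lambda i: 0)
print(J[0][0])   # optimal cost-to-go from node A (8 with these illustrative costs)
```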


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable $k \in \{0, \dots, N\}$ represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on $k$: $X_k \in \Omega_{X_k}$.

Decision Space

At each decision epoch, the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.

Dynamic of the System and Transition Probability

In contrast with the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \dots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage $k+1$ is $j$ if the state and control at stage $k$ are $i$ and $u$. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function $f$ does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case, one refers to a Markov decision process. If a control $u$ is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition $(i, j)$ and action $u$. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(x_{k+1} = j, u_k = u, x_k = i)$$

If the transition $(i, j)$ occurs at stage $k$ when the decision is $u$, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, then the notation is simplified to $C(i, u, j)$.

A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k))$, $k = 0, 1, \dots, N-1$.

$N$: Number of stages
$k$: Stage
$i$: State at the current stage
$j$: State at the next stage
$X_k$: State at stage $k$
$U_k$: Decision action at stage $k$
$\omega_k(i, u)$: Probabilistic function of the disturbance
$C_k(i, u, j)$: Cost function
$C_N(i)$: Terminal cost for state $i$
$f_k(i, u, \omega)$: Dynamic function
$J^*_0(i)$: Optimal cost-to-go starting from state $i$

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for the cost-to-go function of a state $i$ at stage $k$ to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad (5.2)$$

$\Omega_{X_k}$: State space at stage $k$
$\Omega_{U_k}(i)$: Decision space at stage $k$ for state $i$
$P_k(j, u, i)$: Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N} \quad \text{(initialisation)}$$

While $k \ge 0$ do:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$k \leftarrow k - 1$$

$u$: Decision variable
$U^*_k(i)$: Optimal decision action at stage $k$ for state $i$

The recursion finishes when the first stage is reached.
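A compact Python sketch of this backward recursion is given below, assuming the transition probabilities and costs are supplied as functions P(k, i, u, j) and C(k, i, u, j) (illustrative interfaces; any representation carrying the same information would do):

```python
def stochastic_value_iteration(N, Omega_X, Omega_U, P, C, C_terminal):
    """Backward value iteration for a finite horizon stochastic DP.

    Omega_X[k]    : states at stage k (k = 0..N)
    Omega_U(k, i) : admissible decisions in state i at stage k
    P(k, i, u, j) : probability of moving to state j at stage k+1
    C(k, i, u, j) : cost of that transition
    C_terminal(i) : terminal cost at stage N
    """
    J = {N: {i: C_terminal(i) for i in Omega_X[N]}}
    policy = {}
    for k in range(N - 1, -1, -1):                 # backward recursion
        J[k], policy[k] = {}, {}
        for i in Omega_X[k]:
            def expected_cost(u):
                # expectation over the possible next states j
                return sum(P(k, i, u, j) * (C(k, i, u, j) + J[k + 1][j])
                           for j in Omega_X[k + 1])
            u_best = min(Omega_U(k, i), key=expected_cost)
            policy[k][i] = u_best
            J[k][i] = expected_cost(u_best)
    return J, policy
```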

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• $N$ stages

• $N_X$ state variables; the size of the set for each state variable is $S$

• $N_U$ control variables; the size of the set for each control variable is $A$

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
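As a purely hypothetical illustration of this growth (the numbers are not taken from the thesis): a model with $N = 52$ weekly stages, $N_X = 5$ components each with $S = 10$ states, and $N_U = 5$ binary maintenance decisions ($A = 2$) already requires on the order of $52 \cdot 10^{10} \cdot 2^5 \approx 1.7 \cdot 10^{13}$ elementary operations. Adding one more component multiplies this by a further factor of $S^2 = 100$.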

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable can not reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximate methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space. For $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state $i$: $\mu(i) \in \Omega_U(i)$.

The objective is to find the optimal $\mu^*$. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in this state and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \rightarrow \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.

$\mu$: Decision policy
$J^*(i)$: Optimal cost-to-go function for state $i$

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost function for a discounted IHSDP has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \rightarrow \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.

$\alpha$: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k)))$, $k = 0, 1, \dots, N-1$.


6.2 Optimality Equations

The optimality equations are formulated using the probability function $P(i, u, j)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^{\mu}(i) = \min_{\mu(i) \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^{\mu}(j) \right] \quad \forall i \in \Omega_X$$

$J^{\mu}(i)$: Cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: Optimal cost-to-go function for state $i$

For an IHSDP discounted problem, the optimality equation is:

$$J^{\mu}(i) = \min_{\mu(i) \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^{\mu}(j) \right] \quad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. Then it can be described by the following steps:

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J^{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J^{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + J^{\mu^q}(j) \right]$$

$q$: Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^{\mu^q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm always improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge quickly to the optimal solution.
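A minimal Python sketch of the two-step algorithm is given below for the discounted, stationary case of Section 6.2 (the shortest path variant is analogous with $\alpha = 1$ and a terminal state). The state and action sets, the functions P and C, the discount factor and the initial policy are assumed given; names are illustrative:

```python
import numpy as np

def policy_iteration(states, actions, P, C, alpha, mu0):
    """Policy iteration for a discounted infinite horizon MDP.

    states     : list of states
    actions(i) : admissible actions in state i
    P(j, u, i) : transition probability from i to j under action u
    C(j, u, i) : transition cost
    alpha      : discount factor, 0 < alpha < 1
    mu0        : initial policy, dict state -> action
    """
    n = len(states)
    idx = {i: s for s, i in enumerate(states)}
    mu = dict(mu0)
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = C_mu
        P_mu = np.zeros((n, n))
        C_mu = np.zeros(n)
        for i in states:
            for j in states:
                p = P(j, mu[i], i)
                P_mu[idx[i], idx[j]] = p
                C_mu[idx[i]] += p * C(j, mu[i], i)
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, C_mu)
        # Step 2: policy improvement
        mu_new = {}
        for i in states:
            mu_new[i] = min(actions(i),
                            key=lambda u: sum(P(j, u, i) * (C(j, u, i) + alpha * J[idx[j]])
                                              for j in states))
        if mu_new == mu:      # the policy is a solution of its own improvement
            return mu, J
        mu = mu_new
```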

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations $M$ to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J^{\mu^k}(i)$.


While $m \ge 0$ do:

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot \left[ C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j) \right] \quad \forall i \in \Omega_X$$

$$m \leftarrow m - 1$$

$m$: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J^{\mu^k}$ is approximated by $J^0_{\mu^k}$.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a state $X \in \Omega_X$, there is a unique $\lambda^{\mu}$ and vector $h^{\mu}$ such that:

$$h^{\mu}(X) = 0$$

$$\lambda^{\mu} + h^{\mu}(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h^{\mu}(j) \right] \quad \forall i \in \Omega_X$$

This $\lambda^{\mu}$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. $X$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot \left[ C(j, u, X) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \quad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \quad \forall i \in \Omega_X$$

The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: $X$ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega_X$, stop the algorithm.

Else, solve the system of equations:

$$h^q(X) = 0$$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot \left[ C(j, \mu^q(i), i) + h^q(j) \right] \quad \forall i \in \Omega_X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^q(j) \right] \quad \forall i \in \Omega_X$$

$$q = q + 1$$

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case:

$$J^{\mu}(i) = \min_{\mu(i) \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^{\mu}(j) \right] \quad \forall i \in \Omega_X$$

$J^{\mu}(i)$ is a solution of the following linear programming model:

Maximize $\sum_{i \in \Omega_X} J^{\mu}(i)$

Subject to $\; J^{\mu}(i) - \sum_{j \in \Omega_X} \alpha \cdot P(j, u, i) \cdot J^{\mu}(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i) \quad \forall u, i$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If $n$ and $m$ denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of $n$ and $m$. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory $(X_0, \dots, X_N)$ has been generated according to the policy $\mu$, and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory starting from the state $X_k$ is:

$$V(X_k) = \sum_{n=k}^{N} C(X_n, X_{n+1})$$

$V(X_k)$: Cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories have been generated and the state $i$ has been visited $K$ times in these trajectories, $J(i)$ can be estimated by:

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$: Cost-to-go of the trajectory starting from state $i$ at its $m$th visit

A recursive form of the method can be formulated:

$$J(i) := J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \quad \text{with } \gamma = \frac{1}{m}, \text{ where } m \text{ is the number of the trajectory}$$

From a trajectory point of view:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

$\gamma_{X_k}$ corresponds to $\frac{1}{m}$, where $m$ is the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, it is necessary that $V(X_k)$ is calculated from the whole trajectory, and it can only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1})$.

At each transition of the trajectory, the cost-to-go function $J(X_k)$ of a state of the trajectory is updated. Assume that the $l$th transition is being generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \dots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant $\lambda < 1$ is introduced:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \dots, l$$

Note that TD(1) is the same as policy evaluation by simulation. Another special case is when $\lambda = 0$. The TD(0) algorithm is:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \left[ C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k) \right]$$
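As an illustration, here is a minimal Python sketch of TD(0) policy evaluation on sampled trajectories. The trajectory generator simulate_trajectory, returning a list of (state, next_state, cost) tuples obtained by following the fixed policy until the terminal state, is an assumed helper, not something defined in the thesis:

```python
from collections import defaultdict

def td0_policy_evaluation(simulate_trajectory, n_trajectories):
    """TD(0) estimation of the cost-to-go of a fixed policy.

    simulate_trajectory() : assumed helper, returns a list of
                            (state, next_state, cost) transitions generated
                            by following the policy until the terminal state.
    """
    J = defaultdict(float)       # estimated cost-to-go, 0 by default
    visits = defaultdict(int)    # visit counts, used for the step size gamma
    for _ in range(n_trajectories):
        for x, x_next, cost in simulate_trajectory():
            visits[x] += 1
            gamma = 1.0 / visits[x]                      # gamma = 1/m
            J[x] += gamma * (cost + J[x_next] - J[x])    # TD(0) update
        # the terminal state keeps J = 0, as in a shortest path problem
    return dict(J)
```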

Q-factors
Once $J^{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

$$Q^{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^{\mu^k}(j) \right]$$

Note that $C(j, u, i)$ must be known.

The improved policy is:

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q^{\mu^k}(i, u)$$

It is in fact an approximate version of the policy iteration algorithm, since $J^{\mu^k}$ and $Q^{\mu^k}$ have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (7.1)$$

The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (7.2)$$

By combining the two equations, we obtain:

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \qquad (7.3)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

$Q(i, u)$ can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$ do:

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with $\gamma$ defined as for TD.
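The update above translates almost directly into code. The sketch below uses an ε-greedy rule for the exploration/exploitation trade-off discussed next, and an environment function step(x, u) returning the next state and cost; both names, as well as the helper is_terminal, are illustrative assumptions, not notation from the thesis:

```python
import random
from collections import defaultdict

def q_learning(actions, step, is_terminal, n_episodes, start_state, epsilon=0.1):
    """Tabular Q-learning sketch for a stochastic shortest path MDP.

    step(x, u)     : assumed environment helper, returns (next_state, cost)
    actions(x)     : admissible controls in state x
    is_terminal(x) : True when the cost-free terminal state is reached
    """
    Q = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(n_episodes):
        x = start_state
        while not is_terminal(x):
            # epsilon-greedy choice: explore with probability epsilon
            if random.random() < epsilon:
                u = random.choice(list(actions(x)))
            else:
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost = step(x, u)
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]                 # step size, as for TD
            target = cost + (0.0 if is_terminal(x_next)
                             else min(Q[(x_next, v)] for v in actions(x_next)))
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
            x = x_next
    return dict(Q)
```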

The trade-off between exploration and exploitation. The convergence of the algorithm to the optimal solution would require that all pairs $(x, u)$ are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line the model of the transition probabilities and the cost function, and then using this model for off-line training of the system, through simulation, using direct learning.


7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in a tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function $J^{\mu}(i)$. It will be replaced by a suitable approximation $J(i, r)$, where $r$ is a vector that has to be optimized based on the available samples of $J^{\mu}$. In the tabular representation previously investigated, $J^{\mu}(i)$ was stored for every value of $i$. With an approximation structure, only the vector $r$ is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J^{\mu}(i) - J(i, r)$.

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics, for example.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  - Characteristics: the model can be non-stationary
  - Possible application in maintenance optimization: short-term maintenance scheduling
  - Method: value iteration
  - Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model, classical methods; possible approaches for MDP):
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
  - Shortest path: linear programming allows possible additional constraints, but the state space is more limited than with VI and PI

Approximate Dynamic Programming for MDP
  - Characteristics: can handle large state spaces
  - Possible application in maintenance optimization: same as MDP, for larger systems
  - Methods: TD-learning, Q-learning
  - Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  - Characteristics: can optimize the inspection interval
  - Possible application in maintenance optimization: optimization for inspection-based maintenance
  - Methods: same as MDP (average cost-to-go approach)
  - Advantages/disadvantages: complex


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component to make its principle easier to understand.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model, and the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries a large part of the electricity is based on hydro-power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space


x1k   Component state at stage k
x2k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2      (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time beyond which λ(t) exceeds a threshold (λ(t) > 50 % for t > Tmax). The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.


Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0, dashed lines: u = 1. (Diagram: states W0, ..., W4, PM1, CM1, CM2; from a working state Wi the component moves to CM1 with probability Ts·λ(i) and otherwise to the next working state with probability 1 − Ts·λ(i); PM and CM states lead with probability 1 to the next maintenance state and finally back to W0.)

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. (Plot: electricity prices in SEK/MWh, ranging from about 200 to 500, for Scenarios 1, 2 and 3 over stages k−1, k, k+1.)


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0   no preventive maintenance
Uk = 1   preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}   if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅        otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E. i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                           u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0    CM1      λ(Wq)
WNW                          0    WNW      1 − λ(WNW)
WNW                          0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1    PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1    1
PMNPM−1                      ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1    1
CMNCM−1                      ∅    W0       1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0  ;  0    1    0  ;  0    0    1  ]
P2E = [ 1/3  1/3  1/3;  1/3  1/3  1/3;  1/3  1/3  1/3 ]
P3E = [ 0.6  0.2  0.2;  0.2  0.6  0.2;  0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
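As an illustration, the three matrices and the stage-to-matrix mapping of Tables 9.2 and 9.3 can be stored directly. The following is a minimal sketch in Python; the variable names are only illustrative and not part of the thesis.

```python
import numpy as np

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2).
P1E = np.eye(3)                          # scenarios frozen (stable part of the year)
P2E = np.full((3, 3), 1 / 3)             # complete mixing
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])        # scenarios likely to persist

# Stage-dependent choice of matrix for the 12-stage horizon of Table 9.3.
P_elec = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

# Example: probability that scenario 1 at stage 5 becomes scenario 2 at stage 6.
print(P_elec[5][0, 1])                   # 1/3
```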

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                           u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0    CM1      CI + CCM
WNW                          0    WNW      G · Ts · CE(i2, k)
WNW                          0    CM1      CI + CCM
Wq                           1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1    CI + CPM
PMNPM−1                      ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1    CI + CCM
CMNCM−1                      ∅    W0       CI + CCM
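To make the model concrete, the sketch below applies backward value iteration (Chapter 5) to the one-component model defined above. All numerical values (failure probabilities, costs, prices, stage length and horizon) are assumptions chosen for illustration only; the state encoding, transition rules and costs follow Figure 9.1 and Tables 9.1-9.4, and the generation reward is counted as a negative cost.

```python
import numpy as np

N = 12                                   # planning horizon in stages (assumed)
NW, NPM, NCM = 4, 2, 3                   # component state-space sizes, as in Figure 9.1
G, Ts = 10_000.0, 730.0                  # assumed average output (kW) and stage length (h)
C_I, C_PM, C_CM = 4e5, 1e5, 3e5          # assumed interruption / PM / CM costs per stage
lam = lambda q: 0.03 * (q + 1)           # assumed per-stage failure probability in W_q
price = lambda s, k: [0.60, 0.40, 0.25][s]   # assumed price (SEK/kWh) per scenario

comp_states = [f"W{q}" for q in range(NW + 1)] \
            + [f"PM{q}" for q in range(1, NPM)] \
            + [f"CM{q}" for q in range(1, NCM)]

P1, P2 = np.eye(3), np.full((3, 3), 1 / 3)
P3 = 0.4 * np.eye(3) + 0.2 * np.ones((3, 3))
P_elec = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]   # as in Table 9.3

def comp_step(i, u):
    """{next component state: probability}, following Table 9.1."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:
            return {"PM1": 1.0}
        return {f"W{min(q + 1, NW)}": 1.0 - lam(q), "CM1": lam(q)}
    fam, q = i[:2], int(i[2:])
    last = (NPM if fam == "PM" else NCM) - 1
    return {f"{fam}{q + 1}" if q < last else "W0": 1.0}

def cost(i, u, j, s, k):
    """Transition cost, following Table 9.4 (generation reward as negative cost)."""
    if i.startswith("W") and u == 0 and j.startswith("W"):
        return -G * Ts * price(s, k)
    if j == "CM1" or i.startswith("CM"):
        return C_I + C_CM
    return C_I + C_PM

J = {(i, s): 0.0 for i in comp_states for s in range(3)}    # zero terminal cost
policy = []
for k in range(N - 1, -1, -1):                              # backward recursion
    Jk, Uk = {}, {}
    for i in comp_states:
        for s in range(3):
            options = [0, 1] if (i.startswith("W") and i != "W0") else [0]
            best_u, best_v = None, float("inf")
            for u in options:
                v = sum(p1 * P_elec[k][s, s2] * (cost(i, u, j, s, k) + J[(j, s2)])
                        for j, p1 in comp_step(i, u).items() for s2 in range(3))
                if v < best_v:
                    best_u, best_v = u, v
            Jk[(i, s)], Uk[(i, s)] = best_v, best_u
    J, policy = Jk, [Uk] + policy

print("J*_0(W0, scenario 1) =", round(J[("W0", 0)]))
print("u*_0(W2, scenario 1) =", policy[0][("W2", 0)])
```

Running the loop once per stage keeps the computation linear in the horizon; the exponential growth discussed in Section 5.4 only appears when more state or decision variables are added.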

92 Multi-Component model

In this section the model presented in Section 9.1 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c


Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is carried out on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)      (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 9.1.

9242 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or do nothing, depending on the state of the system:


uck = 0   no preventive maintenance on component c
uck = 1   preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)      (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:   Ωuc(ic) = {0, 1}   if ic ∈ {W0, ..., WNWc}
                     Ωuc(ic) = ∅        otherwise

9243 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)                                               (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ick ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If at least one component is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∉ {W0, ..., WNWc−1} and ic = jc
P^c = 0              otherwise
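One possible reading of the two cases is sketched below. This is an interpretation for illustration, not the thesis implementation; `comp_step` is the hypothetical one-component transition function from the sketch in Section 9.1, and component states are the string labels used there.

```python
def joint_transition(jcs, ucs, ics):
    """P((j1,...,jNC) | (u1,...,uNC), (i1,...,iNC)) as a product of per-component factors."""
    all_working = all(i.startswith("W") for i in ics) and not any(ucs)
    p = 1.0
    for ic, uc, jc in zip(ics, ucs, jcs):
        if uc == 1 or not ic.startswith("W"):
            p *= comp_step(ic, uc).get(jc, 0.0)   # component maintained or already in maintenance
        elif all_working:
            p *= comp_step(ic, 0).get(jc, 0.0)    # Case 1: normal ageing / possible failure
        else:
            p *= 1.0 if jc == ic else 0.0         # Case 2: system down, component does not age
    return p

# Example with two components: component 2 is sent to preventive maintenance,
# component 1 keeps its state because the system is not producing.
print(joint_transition(("W2", "PM1"), (0, 1), ("W2", "W3")))   # 1.0
```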

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ick ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



is important. Many articles consider an infinite time horizon. More focus should be put on finite horizon models, since they are more practical. Another characteristic of a model is the time representation: whether discrete or continuous time is considered. One distinction can be made between models with deterministic and stochastic lifetimes of components. Among stochastic approaches it can be interesting to consider which kind of lifetime distribution can be used.

The method used for solving the problem has an influence on the solution. A model that cannot be solved is of no interest. For some models exact solutions are possible. For complex models it is either necessary to simplify the model or to use heuristic methods to find approximate solutions.


Chapter 3

Introduction to the Power

System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

31 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

311 Power System Description

A simple description of the power system includes the following main parts:

1. Generation. The generation units produce the power. They can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission. The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.

3. Distribution. The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets around the world. The rules and organization are different for each marketplace. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

312 Maintenance in Power System

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at the KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost. Cost for the maintenance team that performs maintenance actions.

• Spare part cost. The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost. Special equipment may be needed for undertaking the maintenance; a helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production. The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost. If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost. Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower. The size and availability of the maintenance staff is limited.

• Maintenance equipment. The equipment needed for undertaking the maintenance must be available.

• Weather. The weather can cause certain maintenance actions to be postponed; e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts. If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed; the transportation has a price and takes time.

• Maintenance contracts. Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information. If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible input for an optimization model.

• Statistical data. Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the future evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action made.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the current state and the chosen action. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the length of the interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem: the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

subject to  Xk+1 = fk(Xk, Uk),  k = 0, ..., N−1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*0(i)     Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }      (4.1)

J*k(i)   Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

J*N(i) = CN(i)   ∀i ∈ ΩXN

J*k(i) = min_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

U*k(i) = argmin_{u ∈ ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

(Figure: a five-stage network. Node A at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; node K at stage 4. Each arc is labelled with its cost; the arc costs are the values C(k, i, u) used in the solution in Appendix A.)
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths; for example the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: N = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

For k = 1, 2, 3:
ΩUk(i) = {0, 1}      for i = 0
ΩUk(i) = {0, 1, 2}   for i = 1
ΩUk(i) = {1, 2}      for i = 2

For k = 0:
ΩU0(0) = {0, 1, 2}

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*0, µ*1, ..., µ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple, thanks to the notation used: fk(i, u) = u.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*0(0) = min_{Uk ∈ ΩUk(Xk)} [ Σ_{k=0}^{4} Ck(Xk, Uk) + CN(XN) ]

subject to  Xk+1 = fk(Xk, Uk),  k = 0, 1, ..., N−1

4232 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ0, µ1, µ2, µ3, µ4} with µk(i) = u*k(i) (for example µ1(1) = 2, µ1(2) = 2).
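As a check, the backward recursion can be reproduced in a few lines of Python. The arc costs below are the C(k, i, u) values used in Appendix A; this is only an illustrative sketch with the graph hard-coded.

```python
import math

# arcs[k][(i, u)] = cost of the arc from state i at stage k to state u at stage k+1
arcs = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                              # A -> B, C, D
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},  # B, C, D -> E, F, G
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},  # E, F, G -> H, I, J
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                                              # H, I, J -> K
]

J = {0: 0.0}                             # terminal cost at node K
for k in range(3, -1, -1):               # backward recursion over the stages
    Jk = {}
    for (i, u), c in arcs[k].items():
        Jk[i] = min(Jk.get(i, math.inf), c + J[u])
    J = Jk

print(J[0])   # optimal cost-to-go from node A: 8
```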


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω),  k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(xk+1 = j, uk = u, xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(i, u, j).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk ∈ ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

subject to  Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)),  k = 0, 1, ..., N−1


N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
Xk           State at stage k
Uk           Decision action at stage k
ωk(i, u)     Probabilistic function of the disturbance
Ck(i, u, j)  Cost function
CN(i)        Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*0(i)       Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*k(i) = min_{u ∈ ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]      (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(i, u, j) · [ Ck(i, u, j) + J*k+1(j) ]      (5.2)

ΩXk          State space at stage k
ΩUk(i)       Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation 5.2. The algorithm starts from the last stage. By backward recursion it determines, at each stage, the optimal decision for each state of the system.

J*N(i) = CN(i)   ∀i ∈ ΩXN   (initialisation)

While k ≥ 0 do
    J*k(i) = min_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(i, u, j) · [ Ck(i, u, j) + J*k+1(j) ]   ∀i ∈ ΩXk
    U*k(i) = argmin_{u ∈ ΩUk(i)} Σ_{j ∈ ΩXk+1} Pk(i, u, j) · [ Ck(i, u, j) + J*k+1(j) ]   ∀i ∈ ΩXk
    k ← k − 1


u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
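For completeness, a compact sketch of this backward recursion is given below, assuming the transition probabilities and costs are available as arrays P[k][i, u, j] and C[k][i, u, j] (all actions assumed admissible in every state; the array layout and names are illustrative assumptions, not from the thesis).

```python
import numpy as np

def value_iteration(P, C, C_N):
    """Finite horizon stochastic value iteration.

    P, C : lists of length N of arrays with shape (S, U, S)
    C_N  : terminal cost vector of length S
    Returns the optimal cost-to-go J*_0 and the policy [U*_0, ..., U*_{N-1}]."""
    J = np.asarray(C_N, dtype=float)             # J*_N
    policy = []
    for k in range(len(P) - 1, -1, -1):
        # Q[i, u] = sum_j P_k(i, u, j) * (C_k(i, u, j) + J*_{k+1}(j))
        Q = np.einsum('iuj,iuj->iu', P[k], C[k] + J[None, None, :])
        policy.insert(0, Q.argmin(axis=1))       # U*_k(i)
        J = Q.min(axis=1)                        # J*_k(i)
    return J, policy
```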

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages

• NX state variables, where the size of the set for each state variable is S

• NU control variables, where the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
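As an illustration (with assumed numbers, not taken from the thesis): a weekly model over one year (N = 52) with NX = 3 state variables of S = 10 values each and NU = 2 binary decision variables already requires on the order of 52 · 10^6 · 2^2 ≈ 2 · 10^8 elementary operations, and adding two more state variables of the same size multiplies this figure by 10^4.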

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be taken into account to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course maintenance states should be considered in both cases It could be possibleto have different types of failure states as major failure and minor failures Minorfailures could be cleared by repair while for a major failure a component should bereplace

5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons wind forecasting could be interesting for optimizing maintenance actions in offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would otherwise lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space. For i ∈ ΩX, µ(i) is an admissible control for the state i: µ(i) ∈ ΩU(i).

The objective is to find the optimal policy µ*, i.e. the policy that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided indefinitely. When this state is reached, the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$

µ       Decision policy
J*(i)   Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is the discount factor (0 < α < 1). The cost incurred at stage k thus has the form α^k · Cij(u).

Since Cij(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression):

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$

α   Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or a discounted cost function.

To keep the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots$

6.2 Optimality Equations

The optimality equations are formulated using the transition probability function Pij(u).

The stationary policy µ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \qquad \forall i \in \Omega_X$$

Jµ(i)   Cost-to-go function of policy µ starting from state i
J*(i)   Optimal cost-to-go function for state i

For a discounted IHSDP problem the optimality equation is

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.
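For the discounted case, the value iteration of Chapter 5 becomes a fixed-point iteration with a stopping criterion. The sketch below is only an illustration under hypothetical array conventions (stationary P[i, u, j] and C[i, u, j]); it is not the thesis' implementation.

```python
import numpy as np

def discounted_value_iteration(P, C, alpha, tol=1e-6):
    """Value iteration for a discounted infinite horizon MDP (sketch).

    P[i, u, j]: stationary transition probabilities
    C[i, u, j]: stationary transition costs
    alpha:      discount factor, 0 < alpha < 1
    Iterates J <- T(J) until the sup-norm change is below tol.
    """
    n_states, n_actions, _ = P.shape
    J = np.zeros(n_states)
    while True:
        Q = np.einsum('iuj,iuj->iu', P, C + alpha * J[None, None, :])
        J_new = Q.min(axis=1)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=1)   # cost-to-go and a greedy policy
        J = J_new
```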

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ0. It can then be described by the following steps.

Step 1: Policy Evaluation

If µq+1 = µq, stop the algorithm. Else, Jµq(i) is calculated as the solution of the following linear system:

$$J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J_{\mu_q}(j) \right] \qquad \forall i \in \Omega_X$$

q   Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µq.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_q}(j) \right] \qquad \forall i \in \Omega_X$$

Go back to the policy evaluation step.

The process stops when µq+1 = µq.

At each iteration the algorithm improves the policy. If the initial policy µ0 is already good, then the algorithm converges fast to the optimal solution.
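A compact sketch of the two-step procedure is given below. It is an illustration only, assuming a discounted stationary model given by hypothetical arrays P[i, u, j] and C[i, u, j] (the discount factor keeps the evaluation linear system well posed); for a shortest path model the same structure applies with a cost-free terminal state.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted stationary MDP (sketch).

    P[i, u, j]: transition probabilities, C[i, u, j]: transition costs,
    alpha: discount factor (0 < alpha < 1).
    """
    n_states, n_actions, _ = P.shape
    mu = np.zeros(n_states, dtype=int)          # arbitrary initial policy mu_0
    while True:
        # Step 1: policy evaluation -- solve (I - alpha * P_mu) J = c_mu
        P_mu = P[np.arange(n_states), mu]                       # (n_states, n_states)
        c_mu = np.einsum('ij,ij->i', P_mu, C[np.arange(n_states), mu])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('iuj,iuj->iu', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):          # policy is a solution of its own improvement
            return J, mu
        mu = mu_new
```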

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_µk(i) that must be chosen higher than the real value Jµk(i).

While m ≥ 0 do

$$J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \qquad \forall i \in \Omega_X$$

m ← m − 1

m   Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jµk is approximated by J^0_µk.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the algorithms to converge. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and an arbitrary state X̄ ∈ ΩX, there exist a unique λµ and a vector hµ such that

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \qquad \forall i \in \Omega_X$$

This λµ is the average cost-to-go of the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and an optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \qquad \forall i \in \Omega_X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X̄ is an arbitrary reference state and h0(i) is chosen arbitrarily.

$$H^k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h^k(j) \right]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] - H^k \qquad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^k(j) \right] \qquad \forall i \in \Omega_X$$

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is however infinite in theory.
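A sketch of the relative value iteration under the same hypothetical array conventions is given below; in practice the iteration is stopped when the updates become small, which plays the role of the relative stopping criterion mentioned for non-discounted models.

```python
import numpy as np

def relative_value_iteration(P, C, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration for a unichain average cost-to-go MDP (sketch).

    P[i, u, j]: transition probabilities, C[i, u, j]: transition costs,
    ref_state:  the arbitrary reference state X-bar.
    Returns the estimated average cost, the relative values h, and a greedy policy.
    """
    n_states, n_actions, _ = P.shape
    h = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.einsum('iuj,iuj->iu', P, C + h[None, None, :])
        H = Q[ref_state].min()               # offset taken at the reference state
        h_new = Q.min(axis=1) - H
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return H, h, Q.argmin(axis=1)
```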

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i) ∀ i ∈ ΩX, stop the algorithm.

Else, solve the system of equations

$$h_q(\bar{X}) = 0$$

$$\lambda_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + h_q(j) \right] \qquad \forall i \in \Omega_X$$

Step 2: Policy improvement

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h_q(j) \right] \qquad \forall i \in \Omega_X$$

q ← q + 1, and go back to Step 1.

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case the optimal cost-to-go satisfies

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J^*(j) \right] \qquad \forall i \in \Omega_X$$

and J*(i) is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega_X} J(i)$

Subject to $J(i) \leq \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \qquad \forall i \in \Omega_X, \; \forall u \in \Omega_U(i)$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
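For completeness, the sketch below builds this LP for a small discounted problem with SciPy's linprog; it is a toy illustration under the same hypothetical array conventions, not an efficient large-scale implementation.

```python
import numpy as np
from scipy.optimize import linprog

def lp_discounted_mdp(P, C, alpha):
    """Solve a small discounted MDP by linear programming (sketch).

    maximize   sum_i J(i)
    subject to J(i) - alpha * sum_j P[i,u,j] * J(j) <= sum_j P[i,u,j] * C[i,u,j]   for all i, u
    """
    n_states, n_actions, _ = P.shape
    A_ub, b_ub = [], []
    for i in range(n_states):
        for u in range(n_actions):
            row = -alpha * P[i, u, :].astype(float)
            row[i] += 1.0                       # coefficient of J(i)
            A_ub.append(row)
            b_ub.append(float(np.dot(P[i, u, :], C[i, u, :])))
    # linprog minimizes, so maximize sum J by minimizing -sum J
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states, method="highs")
    return res.x
```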

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy µ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Processes

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible future input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space; they are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy µ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

V(Xk)   Cost-to-go of the trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

V(im)   Cost-to-go of the trajectory starting from state i after its m-th visit

A recursive form of the method can be formulated:

$$J(i) = J(i) + \gamma \cdot \left[ V(i_m) - J(i) \right], \qquad \gamma = 1/m,$$

where m is the number of the trajectory. From a trajectory point of view,

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

where γXk corresponds to 1/m, m being the number of times Xk has already been visited by trajectories.

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \qquad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0, for which only the current state is updated; the TD(0) algorithm is

$$J(X_l) = J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]$$

Q-factors
Once Jµk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_k}(j) \right]$$

Note that the transition probabilities and costs must be known for this step. The improved policy is

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since Jµk and Qµk have been estimated from samples.
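The TD(0) evaluation step can be sketched as follows. The simulator simulate_step and the policy dictionary are hypothetical place-holders, and the step size 1/m follows the text above.

```python
import random

def td0_policy_evaluation(simulate_step, policy, states, terminal_states, n_trajectories=1000):
    """TD(0) evaluation of a fixed policy on a stochastic shortest path problem (sketch).

    simulate_step(state, action) -> (next_state, cost) is a hypothetical simulator;
    policy[state] gives the action of the evaluated policy mu.
    """
    J = {s: 0.0 for s in states}
    visits = {s: 0 for s in states}
    for _ in range(n_trajectories):
        state = random.choice([s for s in states if s not in terminal_states])
        while state not in terminal_states:
            next_state, cost = simulate_step(state, policy[state])
            visits[state] += 1
            gamma = 1.0 / visits[state]          # step size 1/m as in the text
            # TD(0) update: move J(x) toward the one-step bootstrapped target
            J[state] += gamma * (cost + J[next_state] - J[state])
            state = next_state
    return J
```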

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (7.1)$$

The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (7.2)$$

By combining the two equations we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \qquad (7.3)$$

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck) do

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with γ defined as for TD.

The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic in practice.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
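A tabular Q-learning sketch is shown below. The simulator is again a hypothetical place-holder, and an ε-greedy rule (not prescribed by the thesis, which only states that a trade-off is needed) is used as one simple way to alternate between exploitation and exploration.

```python
import random

def q_learning(simulate_step, states, actions, terminal_states,
               n_trajectories=5000, epsilon=0.1):
    """Tabular Q-learning on a stochastic shortest path problem (sketch).

    simulate_step(state, action) -> (next_state, cost) is a hypothetical simulator.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    visits = {(s, a): 0 for s in states for a in actions}
    for _ in range(n_trajectories):
        state = random.choice([s for s in states if s not in terminal_states])
        while state not in terminal_states:
            if random.random() < epsilon:                          # exploration
                action = random.choice(actions)
            else:                                                  # exploitation (greedy)
                action = min(actions, key=lambda a: Q[(state, a)])
            next_state, cost = simulate_step(state, action)
            visits[(state, action)] += 1
            gamma = 1.0 / visits[(state, action)]
            target = cost if next_state in terminal_states else \
                cost + min(Q[(next_state, a)] for a in actions)
            Q[(state, action)] = (1 - gamma) * Q[(state, action)] + gamma * target
            state = next_state
    return Q
```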

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; however, for large state and control spaces this becomes too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that is optimized based on the available samples of Jµ. In the tabular representation investigated previously, Jµ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, the error between the true function and the approximated one, Jµ(i) − J̃(i, r), should be minimized.

There are many possible methods for function approximation. This field is related to supervised learning methods; possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem

• Decide on a training algorithm

• Gather a training set

• Train the function with the training set. The function can then be validated using a subset of the training set

• Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
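As a minimal illustration of the approximation idea, the sketch below fits a linear approximation J̃(i, r) = r · φ(i) to sampled cost-to-go values by least squares; the feature map φ and the samples are hypothetical, and any of the supervised learning methods listed above could be used instead.

```python
import numpy as np

def fit_linear_cost_to_go(features, sampled_costs):
    """Fit J~(i, r) = r . phi(i) to sampled cost-to-go values by least squares (sketch).

    features[m]:      feature vector phi(i_m) of the state visited in sample m
    sampled_costs[m]: observed cost-to-go V(i_m) for that sample
    Returns the parameter vector r (only r is stored, not a table over all states).
    """
    Phi = np.asarray(features, dtype=float)        # (n_samples, n_features)
    y = np.asarray(sampled_costs, dtype=float)
    r, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return r

def approx_cost_to_go(r, phi_state):
    """Evaluate the approximation J~(i, r) for one state's feature vector."""
    return float(np.dot(r, phi_state))
```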


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDPs. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lags (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. These models consider an average cost-to-go criterion, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; they are also more complex. The models found in the literature consider only single components with a single state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring, approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Methods: value iteration
  Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models; possible approaches: average cost-to-go, discounted, shortest path)
  Possible applications in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted)
  Methods: classical MDP methods
    - Value Iteration (VI): can be preferable for high discount factors
    - Policy Iteration (PI): faster in general
    - Linear Programming: possible additional constraints; state space more limited than for VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; more complex (average cost-to-go approach)
  Possible application in maintenance optimization: inspection-based maintenance
  Methods: same as MDP

Approximate Dynamic Programming
  Characteristics: can handle larger state spaces than classical MDP methods; can work without an explicit model
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could instead be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for one component
NCM    Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity price at stage k in electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1    Component state at the current stage
i2    Electricity state at the current stage
j1    Possible component state for the next stage
j2    Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• Manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \qquad x^1_k \in \Omega_{x^1},\; x^2_k \in \Omega_{x^2} \qquad (9.1)$$

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}

Electricity scenario state

The electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity price scenarios (SEK/MWh) over the stages, NE = 3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{otherwise} \end{cases}$$

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)$$
$$= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2)$$
$$= P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                            u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0    CM1       λ(Wq)
WNW                           0    WNW       1 − λ(WNW)
WNW                           0    CM1       λ(WNW)
Wq, q ∈ {0, ..., NW}          1    PM1       1
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1     1
PM(NPM−1)                     ∅    W0        1
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1     1
CM(NCM−1)                     ∅    W0        1

Table 9.2: Example of transition matrices for the electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \qquad P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
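To make the construction of Table 9.1 concrete, the following sketch builds the stationary component transition matrices P(j1, u, i1). It is only an illustration: the state ordering, the handling of the special cases NPM = 1 or NCM = 1, and the failure rate values are hypothetical choices, not implementation details specified in the thesis.

```python
import numpy as np

def component_transition_matrices(failure_rate, n_w, n_pm, n_cm):
    """Build P[u][i, j] for the one-component model of Table 9.1 (sketch).

    failure_rate[q]: failure probability lambda(W_q) during one stage, q = 0..n_w
    States are ordered W0..W_{n_w}, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1}.
    Decision u = 0: no preventive maintenance, u = 1: preventive maintenance.
    """
    states = [f"W{q}" for q in range(n_w + 1)] \
        + [f"PM{q}" for q in range(1, n_pm)] \
        + [f"CM{q}" for q in range(1, n_cm)]
    idx = {s: n for n, s in enumerate(states)}
    n = len(states)
    P = [np.zeros((n, n)), np.zeros((n, n))]
    for q in range(n_w + 1):
        lam = failure_rate[q]
        nxt = f"W{q + 1}" if q < n_w else f"W{n_w}"      # ageing, or staying in W_NW
        P[0][idx[f"W{q}"], idx[nxt]] = 1 - lam
        P[0][idx[f"W{q}"], idx["CM1"] if n_cm > 1 else idx["W0"]] = lam
        P[1][idx[f"W{q}"], idx["PM1"] if n_pm > 1 else idx["W0"]] = 1.0
    for q in range(1, n_pm):                              # maintenance states: forced transition
        nxt = f"PM{q + 1}" if q < n_pm - 1 else "W0"
        P[0][idx[f"PM{q}"], idx[nxt]] = P[1][idx[f"PM{q}"], idx[nxt]] = 1.0
    for q in range(1, n_cm):
        nxt = f"CM{q + 1}" if q < n_cm - 1 else "W0"
        P[0][idx[f"CM{q}"], idx[nxt]] = P[1][idx[f"CM{q}"], idx[nxt]] = 1.0
    return states, P
```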

9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• a reward for electricity generation, G · Ts · CE(i2, k) (it depends on the electricity scenario state i2 and on the stage k);

• a cost for maintenance, CCM or CPM;

• a cost for interruption, CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.

Table 9.4: Transition costs

i1                            u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}        0    CM1       CI + CCM
WNW                           0    WNW       G · Ts · CE(i2, k)
WNW                           0    CM1       CI + CCM
Wq                            1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1     CI + CPM
PM(NPM−1)                     ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1     CI + CCM
CM(NCM−1)                     ∅    W0        CI + CCM

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  Electricity state at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  Electricity state at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series; if one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance (of any kind) is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \qquad (9.2)$$

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
The same as in Section 9.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

$$U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \qquad (9.3)$$

The decision space for each decision variable is defined by

$$\forall c \in \{1, \ldots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \ldots, W_{N_{W_c}}\} \\ \emptyset & \text{otherwise} \end{cases}$$

9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)$$
$$= P\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) \cdot P_k(j^{N_C+1}, i^{N_C+1}) \qquad (9.5)$$

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model; they can be defined at each stage k by a transition matrix as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

$$P\big((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})\big) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component, then

$$P\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \ldots, W_{N_{W_c}}\} \\ 1 & \text{if } u^c = 0,\; i^c \in \{W_1, \ldots, W_{N_{W_c}}\} \text{ and } j^c = i^c \\ 0 & \text{otherwise} \end{cases}$$

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

$$C\big((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})\big) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

$$C\big((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})\big) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM_c} & \text{if } i^c \in \{CM_1, \ldots, CM_{N_{CM_c}-1}\} \text{ or } j^c = CM_1 \\ C_{PM_c} & \text{if } i^c \in \{PM_1, \ldots, PM_{N_{PM_c}-1}\} \text{ or } j^c = PM_1 \\ 0 & \text{otherwise} \end{cases}$$

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. As noted, no application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts over a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of the finite horizon model and requires the problem to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
  • Maintenance
    • Types of Maintenance
    • Maintenance Optimization Models
  • Introduction to the Power System
    • Power System Presentation
    • Costs
    • Main Constraints
  • Introduction to Dynamic Programming
    • Introduction
    • Deterministic Dynamic Programming
  • Finite Horizon Models
    • Problem Formulation
    • Optimality Equation
    • Value Iteration Method
    • The Curse of Dimensionality
    • Ideas for a Maintenance Optimization Model
  • Infinite Horizon Models - Markov Decision Processes
    • Problem Formulation
    • Optimality Equations
    • Value Iteration
    • The Policy Iteration Algorithm
    • Modified Policy Iteration
    • Average Cost-to-go Problems
    • Linear Programming
    • Efficiency of the Algorithms
    • Semi-Markov Decision Process
  • Approximate Methods for Markov Decision Process - Reinforcement Learning
    • Introduction
    • Direct Learning
    • Indirect Learning
    • Supervised Learning
  • Review of Models for Maintenance Optimization
    • Finite Horizon Dynamic Programming
    • Infinite Horizon Stochastic Models
    • Reinforcement Learning
    • Conclusions
  • A Proposed Finite Horizon Replacement Model
    • One-Component Model
    • Multi-Component model
    • Possible Extensions
  • Conclusions and Future Work
  • Solution of the Shortest Path Example
  • Reference List

Chapter 3

Introduction to the Power System

This chapter gives a brief description of electrical power systems. Some costs and constraints for a maintenance model are proposed.

3.1 Power System Presentation

Power systems are very complex. They are composed of thousands of components linked through a complex mesh of lines and cables that have limited capacities. With the deregulation of power systems, the generation, distribution and transmission systems are separated. Even considered independently, each part of the power system is complex, with many components and subcomponents.

3.1.1 Power System Description

A simple description of the power system includes the following main parts:

1. Generation: The generation units that produce the power. These can be e.g. hydro-power units, nuclear power plants, wind farms, etc. The total power consumed is always equal to the power generated.

2. Transmission: The transmission system is composed of high voltage and high power lines. This part of the system is in general meshed. The transmission system connects distribution systems with generation units.


3. Distribution: The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption: The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs will also depend on the time of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids of electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real-time, both automatically (automatic control and protection devices) and manually (with the help of the system operator to coordinate the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure, it can induce failures of other components. Cascading failures can have drastic consequences such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective Maintenance and Preventive Maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] defined in detail the approach and its different steps. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs the maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: If special equipment is needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance Equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of Spare Parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance Contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of Condition Monitoring Information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical Data: Available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions have an effect on the state of the system only directly after their accomplishment. They do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the current state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be for example to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that a system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization assumes implicitly that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuous set of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ Ω^U_k(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1

N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
X_k  State at stage k
U_k  Decision action at stage k
C_k(i, u)  Cost function
C_N(i)  Terminal cost for state i
f_k(i, u)  Dynamic function
J*_0(i)  Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]     (4.1)

J*_k(i)  Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i)   ∀ i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   ∀ i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} [ C_k(i, u) + J*_{k+1}(f_k(i, u)) ]   ∀ i ∈ Ω^X_k

u  Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: Shortest path example. Node A at stage 0; nodes B, C, D at stage 1; nodes E, F, G at stage 2; nodes H, I, J at stage 3; node K at stage 4. Each arc carries a cost (distance); the arc costs C(k, i, u) used below are listed with the solution in Appendix A.]

The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}


Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

Ω^U_k(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{U_k∈Ω^U_k(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., 3

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example μ_1(1) = 2, μ_1(2) = 2).
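To make the backward recursion concrete, the following minimal Python sketch reproduces the value iteration solution of this example. The arc costs are taken from the figure and Appendix A; the names (costs, value_iteration) and the data layout are only illustrative assumptions, not part of the thesis.

# Arc costs C[k][(i, u)] of the shortest path example; the decision u is the
# index of the destination node at stage k+1 (A is stage 0, ..., K is stage 4).
costs = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},          # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                      # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,           # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                     # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                      # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,           # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                     # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},          # H, I, J -> K
}

def value_iteration(costs, n_stages=4):
    """Backward recursion J_k(i) = min_u [C_k(i,u) + J_{k+1}(f_k(i,u))] with f_k(i,u) = u."""
    J = {n_stages: {0: 0.0}}                       # terminal cost at node K
    policy = {}
    for k in range(n_stages - 1, -1, -1):
        J[k], policy[k] = {}, {}
        for i in {s for (s, _) in costs[k]}:
            options = {u: c + J[k + 1][u] for (s, u), c in costs[k].items() if s == i}
            policy[k][i] = min(options, key=options.get)
            J[k][i] = options[policy[k][i]]
    return J, policy

J, policy = value_iteration(costs)
print(J[0][0], policy[0][0])                       # 8.0 2, i.e. A => D is the first optimal decision

Running the sketch gives the same optimal cost-to-go, J*_0(0) = 8, and the same first decision as the solution in Appendix A.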


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below:

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k∈Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N-1


N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
X_k  State at stage k
U_k  Decision action at stage k
ω_k(i, u)  Probabilistic function of the disturbance
C_k(j, u, i)  Cost function
C_N(i)  Terminal cost for state i
f_k(i, u, ω)  Dynamic function
J*_0(i)  Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]     (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]     (5.2)

Ω^X_k  State space at stage k
Ω^U_k(i)  Decision space at stage k for state i
P_k(j, u, i)  Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀ i ∈ Ω^X_N   (Initialisation)

While k ≥ 0 do
   J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀ i ∈ Ω^X_k
   U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [C_k(j, u, i) + J*_{k+1}(j)]   ∀ i ∈ Ω^X_k
   k ← k − 1


u  Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
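As an illustration, here is a minimal Python sketch of this backward recursion. The function name and the data layout (a dictionary P[k][i][u] holding lists of (j, probability, cost) triples) are assumptions made for the example only.

def stochastic_value_iteration(P, terminal_cost, N):
    """Backward value iteration for a finite horizon SDP.
    P[k][i][u] is a list of (j, prob, cost) triples describing the possible
    transitions from state i under decision u at stage k."""
    J = {N: dict(terminal_cost)}       # J_N(i) = C_N(i)
    policy = {}
    for k in range(N - 1, -1, -1):
        J[k], policy[k] = {}, {}
        for i, actions in P[k].items():
            # expected cost-to-go of each admissible decision u in state i
            expected = {
                u: sum(prob * (cost + J[k + 1][j]) for (j, prob, cost) in outcomes)
                for u, outcomes in actions.items()
            }
            policy[k][i] = min(expected, key=expected.get)
            J[k][i] = expected[policy[k][i]]
    return J, policy

The sketch returns both the optimal cost-to-go J and the optimal decisions U* for every stage and state, exactly as in the recursion above.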

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, it can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing the maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ Ω^X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω^U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

μ  Decision policy
J*(i)  Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost incurred at stage k for a discounted IHSDP has the form α^k · C_ij(u).

As C_ij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + J*(j)]   ∀ i ∈ Ω^X

J_μ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [C_ij(u) + α · J*(j)]   ∀ i ∈ Ω^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite and a stopping criterion must be chosen for the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μ_q}(j)]   ∀ i ∈ Ω^X

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_q}(j)]   ∀ i ∈ Ω^X

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, then the algorithm will converge quickly to the optimal solution.
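The following Python sketch shows the two-step structure for the discounted variant of the problem, where the evaluation system is always non-singular. The data layout (numpy arrays P[u, i, j] and C[u, i, j]) and the discounting are assumptions for this illustration, not the exact formulation above.

import numpy as np

def policy_iteration(P, C, alpha=0.9):
    """Policy iteration for a discounted MDP.
    P[u, i, j]: probability of moving from state i to state j under action u.
    C[u, i, j]: cost of that transition.  alpha: discount factor."""
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                          # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_states), :]
        c_mu = np.sum(P_mu * C[mu, np.arange(n_states), :], axis=1)
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.sum(P * (C + alpha * J[None, None, :]), axis=2)
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):                          # mu_{q+1} = mu_q: stop
            return mu, J
        mu = mu_new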

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).

While m ≥ 0 do
   J^m_{μ_k}(i) = Σ_{j∈Ω^X} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j)]   ∀ i ∈ Ω^X
   m ← m − 1

m  Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
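A minimal sketch of this approximate evaluation step is given below, under the same assumptions on the data layout as the policy iteration sketch above; the helper name evaluate_policy_approx is hypothetical.

import numpy as np

def evaluate_policy_approx(P, C, mu, J_init, M, alpha=1.0):
    """Approximate policy evaluation for modified policy iteration:
    apply M backups J <- c_mu + alpha * P_mu J instead of solving the linear system."""
    n_states = P.shape[1]
    P_mu = P[mu, np.arange(n_states), :]
    c_mu = np.sum(P_mu * C[mu, np.arange(n_states), :], axis=1)
    J = J_init.copy()                     # should be chosen above the true value
    for _ in range(M):
        J = c_mu + alpha * (P_mu @ J)     # one backup under the fixed policy mu
    return J

The resulting approximation of J_{μ_k} is then used in the policy improvement step, as in Section 6.4.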

6.6 Average Cost-to-go Problems

The methods presented in Sections 6.2-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω^X, there are a unique λ_μ and a vector h_μ such that:

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j∈Ω^X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)]   ∀ i ∈ Ω^X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀ i ∈ Ω^X

μ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀ i ∈ Ω^X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h_0(i) is chosen arbitrarily:

H_k = min_{u∈Ω^U(X)} Σ_{j∈Ω^X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k   ∀ i ∈ Ω^X

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_k(j)]   ∀ i ∈ Ω^X

The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀ i ∈ Ω^X, stop the algorithm.

Else, solve the following system of equations:

h_q(X) = 0
λ_q + h_q(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)]   ∀ i ∈ Ω^X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + h_q(j)]   ∀ i ∈ Ω^X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + α · J*(j)]   ∀ i ∈ Ω^X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω^X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + α · J(j)]   ∀ i ∈ Ω^X, u ∈ Ω^U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
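For illustration, a small Python sketch of this LP formulation is given below, using scipy.optimize.linprog. The data layout (arrays P[u, i, j], C[u, i, j]) is an assumption, and since linprog minimizes its objective, the maximization is written as minimizing the negated sum.

import numpy as np
from scipy.optimize import linprog

def lp_discounted_mdp(P, C, alpha=0.9):
    """Solve a discounted cost-minimizing MDP by linear programming:
    maximize sum_i J(i) subject to J(i) <= sum_j P[u,i,j]*(C[u,i,j] + alpha*J(j)) for all i, u."""
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            # constraint: J(i) - alpha * sum_j P[u,i,j] * J(j) <= sum_j P[u,i,j] * C[u,i,j]
            row = -alpha * P[u, i, :]
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i, :], C[u, i, :]))
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x        # the optimal cost-to-go J*(i)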

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k). X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is:

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k)  Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)  Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, m being the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) has to be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates J(X_k) of the states of the trajectory are updated. Assume that the l-th transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀ k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]   ∀ k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(X_l) := J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
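As a small illustration, the TD(0) update can be written in a few lines of Python. The function name and the use of a constant step size gamma (instead of the state-dependent γ_{X_k} above) are simplifying assumptions.

def td0_policy_evaluation(samples, J, gamma=0.1):
    """TD(0) update of the cost-to-go estimate J (a dict state -> value) of a fixed
    policy, from observed transitions (x_k, x_k1, cost) generated under that policy."""
    for (x_k, x_k1, cost) in samples:
        j_k = J.get(x_k, 0.0)
        target = cost + J.get(x_k1, 0.0)          # one-step cost plus current estimate
        J[x_k] = j_k + gamma * (target - j_k)     # move the estimate towards the target
    return J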

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ_k}(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that C(j, u, i) must be known.

The improved policy is:

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + J*(j)]     (7.1)


The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω^U(i)} Q*(i, u)     (7.2)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω^U(j)} Q*(j, v)]     (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do:

U_k = argmin_{u∈Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω^U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
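A minimal Python sketch of the tabular Q-learning update, together with an ε-greedy rule, is shown below. The ε-greedy rule is one common way to implement the exploration/exploitation trade-off; it is not prescribed by the thesis, and the function names are illustrative.

import random

def q_learning_step(Q, x_k, x_k1, u_k, cost, actions, gamma=0.1):
    """One tabular Q-learning update from the sample (x_k, x_k1, u_k, cost).
    Q is a dict (state, action) -> value; actions(state) lists the admissible controls."""
    best_next = min(Q.get((x_k1, v), 0.0) for v in actions(x_k1))
    old = Q.get((x_k, u_k), 0.0)
    Q[(x_k, u_k)] = (1 - gamma) * old + gamma * (cost + best_next)

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore a random control, otherwise exploit the
    greedy (cost-minimizing) control of the current Q estimate."""
    if random.random() < epsilon:
        return random.choice(actions(state))
    return min(actions(state), key=lambda u: Q.get((state, u), 0.0))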

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation using direct learning.


74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
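As a minimal illustration (not from the thesis), the following Python sketch fits a linear approximation J̃(i, r) = r · φ(i) by least squares to sampled cost-to-go values; the feature map `phi` and the sampled targets are assumptions obtained, for example, from simulation.

```python
import numpy as np

def fit_cost_to_go(samples, phi):
    """Least-squares fit of a linear cost-to-go approximation J~(i, r) = r . phi(i).

    samples : list of (state, sampled cost-to-go) pairs, e.g. from simulation
    phi     : feature map, phi(state) -> 1-D numpy array of features
    """
    A = np.array([phi(i) for i, _ in samples])        # design matrix of features
    b = np.array([target for _, target in samples])   # sampled J values
    r, *_ = np.linalg.lstsq(A, b, rcond=None)         # parameter vector r
    return r

def approx_J(i, r, phi):
    """Evaluate the approximation J~(i, r) for state i."""
    return float(np.dot(r, phi(i)))
```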


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates


are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can only be done on one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence


of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model; classical methods, with the possible approaches below)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
- Shortest path: linear programming allows possible additional constraints; state space limited for VI & PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization for inspection based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Advantages/disadvantages: more complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered to be an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to


do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for one component
NCM    Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1k   Component state at stage k
x2k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario. NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: working states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


[Figure 9.1 (state transition diagram) appears here in the original document.]

Figure 9.1: Example of the Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0. Dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 (electricity price in SEK/MWh versus stage, for three scenarios) appears here in the original document.]

Figure 9.2: Example of electricity scenarios, NE = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 (respectively NCM = 1), then PM1 (respectively CM1) corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1     λ(Wq)
WNW                          0   WNW     1 − λ(WNW)
WNW                          0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1     1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   1
PMNPM−1                      ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   1
CMNCM−1                      ∅   W0      1
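To illustrate how Table 9.1 translates into numbers, the following Python sketch (an illustration under the stated assumptions, not code from the thesis) builds the component transition probabilities P(j1, u, i1) for given NW, NPM, NCM and a per-stage failure probability function; the state labels, dictionary layout and example failure probabilities are choices made here for convenience.

```python
def component_transition(nw, npm, ncm, lam):
    """Transition probabilities of Table 9.1 as nested dicts P[u][i][j].

    States are labelled 'W0'..'W{nw}', 'PM1'..'PM{npm-1}', 'CM1'..'CM{ncm-1}'.
    lam(q) is the failure probability of state Wq during one stage.
    """
    P = {0: {}, 1: {}, None: {}}
    for q in range(nw + 1):
        w, nxt = f"W{q}", f"W{min(q + 1, nw)}"          # W{nw} stays in W{nw} if no failure
        P[0][w] = {nxt: 1.0 - lam(q), "CM1" if ncm > 1 else "W0": lam(q)}
        P[1][w] = {"PM1" if npm > 1 else "W0": 1.0}      # preventive replacement decision
    for q in range(1, npm - 1):
        P[None][f"PM{q}"] = {f"PM{q + 1}": 1.0}          # PM chain advances deterministically
    if npm > 1:
        P[None][f"PM{npm - 1}"] = {"W0": 1.0}
    for q in range(1, ncm - 1):
        P[None][f"CM{q}"] = {f"CM{q + 1}": 1.0}          # CM chain advances deterministically
    if ncm > 1:
        P[None][f"CM{ncm - 1}"] = {"W0": 1.0}
    return P

# Example with the state counts of Figure 9.1 (NW = 4, NPM = 2, NCM = 3);
# the failure probabilities are arbitrary illustration values.
P = component_transition(4, 2, 3, lambda q: 0.05 + 0.02 * q)
```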

Table 9.2: Example of transition matrices for the electricity scenarios

       | 1    0    0   |          | 1/3  1/3  1/3 |          | 0.6  0.2  0.2 |
P1E =  | 0    1    0   |   P2E =  | 1/3  1/3  1/3 |   P3E =  | 0.2  0.6  0.2 |
       | 0    0    1   |          | 1/3  1/3  1/3 |          | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E

9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                           u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1     CI + CCM
WNW                          0   WNW     G · Ts · CE(i2, k)
WNW                          0   CM1     CI + CCM
Wq                           1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   CI + CPM
PMNPM−1                      ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   CI + CCM
CMNCM−1                      ∅   W0      CI + CCM
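The one-component model can be solved with the value iteration algorithm of Chapter 4. The sketch below is a hypothetical illustration (not code from the thesis) of the backward recursion over the combined state (component state, electricity scenario); the names `P_comp`, `P_elec`, `stage_cost`, `decisions` and `terminal_cost`, as well as how the model data are stored, are assumptions (e.g. `P_comp` could be built as in the previous sketch).

```python
def value_iteration(N, comp_states, elec_states, decisions,
                    P_comp, P_elec, stage_cost, terminal_cost):
    """Backward value iteration for the two-variable model (sketch).

    decisions(i1)       -> admissible decisions for component state i1, e.g. [0, 1] or [None]
    P_comp[u][i1]       -> dict {j1: probability}  (component transitions, Table 9.1)
    P_elec(k)[i2]       -> dict {j2: probability}  (electricity transitions, Tables 9.2-9.3)
    stage_cost(k, i1, i2, u, j1) -> cost Ck(j, u, i); electricity income passed as negative cost
    terminal_cost(i1)   -> terminal cost CN(i1)
    """
    J = {N: {(i1, i2): terminal_cost(i1) for i1 in comp_states for i2 in elec_states}}
    policy = {}
    for k in range(N - 1, -1, -1):                      # go backwards from stage N-1 to 0
        J[k], policy[k] = {}, {}
        for i1 in comp_states:
            for i2 in elec_states:
                best_u, best_val = None, float("inf")
                for u in decisions(i1):
                    val = 0.0
                    for j1, p1 in P_comp[u][i1].items():
                        for j2, p2 in P_elec(k)[i2].items():
                            val += p1 * p2 * (stage_cost(k, i1, i2, u, j1) + J[k + 1][(j1, j2)])
                    if val < best_val:                   # keep the cost-minimizing decision
                        best_u, best_val = u, val
                J[k][(i1, i2)] = best_val
                policy[k][(i1, i2)] = best_u
    return J, policy
```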

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 9.1.4.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2


If one of the components is in maintenance, or the decision of preventive maintenance is made for some component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
P^c = 0              otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
Cc = 0      otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = CN(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


3. Distribution. The distribution system is at a voltage level below transmission and is connected to customers. It connects the transmission system with the consumers. Distribution systems are in general operated radially (one connection point to the transmission system).

4. Consumption. The consumers can be divided into different categories: industry, commercial, household, office, agriculture, etc. The costs for interruption are in general different for the different categories of consumers. These costs also depend on the duration of the outage.

The trade of electricity between producers and consumers is made through different specific markets in the world. The rules and organization are different for each market place. The bids for electricity trades are declared in advance to the system operator. This is necessary to check that the power system can withstand the operational conditions.

The power system is controlled in real time, both automatically (automatic control and protection devices) and manually (with the help of the system operator, who coordinates the necessary actions to avoid dangerous situations). Each component of the system influences the others. If a component has a functional failure it can induce failures of other components. Cascading failures can have drastic consequences, such as black-outs.

3.1.2 Maintenance in Power Systems

The objective is to find the right way to do maintenance. Corrective maintenance and preventive maintenance should be balanced for each component of a system, and the optimal PM approaches should be determined.

Reliability Centered Maintenance (RCM) is being introduced in power companies (see [47] for an example in hydropower). RCM is a structured approach to find a balance between corrective and preventive maintenance. Research on Reliability Centered Asset Maintenance (RCAM), a quantitative approach to RCM, is being carried out in the RCAM group at KTH School of Electrical Engineering. Bertling et al. [12] define the approach and its different steps in detail. An important step is the maintenance optimization. In Hilber et al. [20], a method based on a monetary importance index is proposed to define the importance of individual components in a network. Ongoing research focuses for example on wind power (see [39], [32]).

Research about power generation typically focuses on predictive maintenance using condition based monitoring systems (see for example [18] or [44]). The problem of maintenance for transmission and distribution systems has received more


attention since the deregulation of the electricity market (see for example [12], [27] for distribution systems and [22], [30] for transmission systems).

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

3.2 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: the cost of the maintenance team that performs the maintenance actions.

• Spare part cost: the cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an off-shore wind turbine.

• Energy production: the electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: if there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

3.3 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:


• Manpower: the size and availability of the maintenance staff is limited.

• Maintenance equipment: the equipment needed for undertaking the maintenance must be available.

• Weather: the weather can cause certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: if the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation has a price and takes time.

• Maintenance contracts: power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: if condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: available monitoring information has value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.


Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

4.1 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of the system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and will influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

4.1.1 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems this means that maintenance actions only have an effect on the state of the system directly after their accomplishment; they do not influence the deterioration process after they have been completed.

4.1.2 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action taken.

If a system is subject to probabilistic events it will evolve according to a probability distribution depending on the actual state and the action chosen. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are of interest.

4.1.3 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would for example be to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, i.e. that it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. This can be a good approximation if the lifetime of the system is indeed very long.


4.1.4 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 4.2).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost is associated with the terminal state (the state at stage N), CN(XN).

Objective Function
The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*_0(X0) = min_{Uk} Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN)

subject to Xk+1 = fk(Xk, Uk),   k = 0, ..., N − 1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*_0(i)    Optimal cost-to-go starting from state i


4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈ΩUk(i)} {Ck(i, u) + J*_{k+1}(fk(i, u))}   (4.1)

J*_k(i)   Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = CN(i)   ∀i ∈ ΩXN

J*_k(i) = min_{u∈ΩUk(i)} {Ck(i, u) + J*_{k+1}(fk(i, u))}   ∀i ∈ ΩXk

U*_k(i) = argmin_{u∈ΩUk(i)} {Ck(i, u) + J*_{k+1}(fk(i, u))}   ∀i ∈ ΩXk

u         Decision variable
U*_k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: a five-stage directed graph. Stage 0: node A; stage 1: nodes B, C, D; stage 2: nodes E, F, G; stage 3: nodes H, I, J; stage 4: node K. Each arc is labelled with its cost; the arc costs are those used in the calculations of Appendix A.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of every possible path. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}

Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision SpaceThe set of decisions possible must be defined for each state at each stage In theexample the choice is which way should I take from this node to go to the nextstage The following notations are used

ΩUk (i) =

0 1 for i = 00 1 2 for i = 11 2 for i = 2

for k=123

ΩU0 (0) = 0 1 2 for k=0

For example ΩU1 (0) = ΩU (B) = 0 1 with U1(0) = 0 for the transition B rArr E orU1(0) = 1 for the transition B rArr F

Another example ΩU1 (2) = ΩU (D) = 1 2 with u1(2) = 2 for the transitionD rArr For u1(2) = 2 for the transition D rArr G

A sequence π = micro0 micro1 microN where microk(i) is a function mapping the state i atstage k with an admissible control for this state is called a policy The value itera-tion algorithm determine the optimal policy of the problem πlowast = microlowast0 micro

lowast1 micro

lowastN

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{Uk ∈ Ω_{Uk}(X_k)} [ Σ_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., N−1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example μ_1(1) = 2, μ_1(2) = 2).
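To make the backward recursion concrete, a minimal Python sketch of the value iteration for a staged shortest path problem of this form is given below. The individual arc costs of the figure could not be recovered here, so the numbers are assumptions chosen only to be consistent with the two facts stated in the text (the path A-B-F-J-K costs 17 and the optimum is 8 along A-D-G-I-K); states and decisions are encoded with the integer indices introduced above.

# Minimal sketch of backward value iteration for a staged shortest path problem.
# The arc costs below are illustrative placeholders, not the thesis figure data.
# stage_costs[k][i][u] = cost of leaving state i at stage k with decision u,
# where u is interpreted as the index of the next state.
stage_costs = [
    {0: {0: 2, 1: 4, 2: 3}},                                    # stage 0: A -> B, C, D
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # stage 1: B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # stage 2: E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # stage 3: H, I, J -> K
]
N = len(stage_costs)

J = [dict() for _ in range(N + 1)]
policy = [dict() for _ in range(N)]
J[N][0] = 0.0                                  # terminal cost C_N(K) = 0

for k in range(N - 1, -1, -1):                 # backward recursion
    for i, controls in stage_costs[k].items():
        best_u = min(controls, key=lambda u: controls[u] + J[k + 1][u])
        policy[k][i] = best_u
        J[k][i] = controls[best_u] + J[k + 1][best_u]

print("optimal cost-to-go from the initial state:", J[0][0])

With the assumed costs the script prints 8, and following policy[k] forward from state A reproduces the path A-D-G-I-K.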


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4; it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, that is, the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{Xk}.

Decision Space

At each decision epoch the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{Uk}(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{Uk ∈ Ω_{Uk}(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N−1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N−1

N  Number of stages
k  Stage
i  State at the current stage
j  State at the next stage
X_k  State at stage k
U_k  Decision action at stage k
ω_k(i, u)  Probabilistic function of the disturbance
C_k(j, u, i)  Cost function
C_N(i)  Terminal cost for state i
f_k(i, u, ω)  Dynamic function
J*_0(i)  Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u ∈ Ω_{Uk}(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ Ω_{Uk}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    (5.2)

Ω_{Xk}  State space at stage k
Ω_{Uk}(i)  Decision space at stage k for state i
P_k(j, u, i)  Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion it determines, at each stage, the optimal decision for each state of the system.

J*_N(i) = C_N(i)   ∀i ∈ Ω_{X_N}   (Initialisation)

While k ≥ 0 do

J*_k(i) = min_{u ∈ Ω_{Uk}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]   ∀i ∈ Ω_{Xk}

U*_k(i) = argmin_{u ∈ Ω_{Uk}(i)} Σ_{j ∈ Ω_{X_{k+1}}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]   ∀i ∈ Ω_{Xk}

k ← k − 1


u  Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
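As an illustration, a minimal Python sketch of this backward recursion is given below. It assumes the model is supplied as NumPy arrays P[k, i, u, j] = P_k(j, u, i), C[k, i, u, j] = C_k(j, u, i) and CN[i] = C_N(i); these array names, and the simplification that every decision is admissible in every state, are conventions of the sketch, not part of the thesis model.

import numpy as np

def finite_horizon_vi(P, C, CN):
    # P[k, i, u, j]: transition probabilities, C[k, i, u, j]: transition costs,
    # CN[i]: terminal cost. Returns the optimal cost-to-go J and policy U.
    N, nS, nU, _ = P.shape
    J = np.empty((N + 1, nS))
    U = np.empty((N, nS), dtype=int)
    J[N] = CN                                   # initialisation with the terminal cost
    for k in range(N - 1, -1, -1):              # backward recursion over the stages
        # expected cost of each (state, decision) pair at stage k
        Q = np.einsum('iuj,iuj->iu', P[k], C[k] + J[k + 1][None, None, :])
        U[k] = Q.argmin(axis=1)                 # optimal decision U*_k(i)
        J[k] = Q.min(axis=1)                    # optimal cost-to-go J*_k(i)
    return J, U

State-dependent decision spaces can be handled in this layout by assigning a prohibitively large cost to inadmissible (state, decision) pairs.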

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• N_X state variables, the size of the set for each state variable being S,

• N_U control variables, the size of the set for each control variable being A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
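As an illustration (the figures here are chosen only for the sake of the example, they are not taken from a particular application): with N = 52 weekly stages, N_X = 5 state variables with S = 10 values each and N_U = 5 binary decision variables, the backward recursion requires on the order of 52 · 10^(2·5) · 2^5 ≈ 1.7 · 10^13 elementary operations, which is already intractable in practice.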

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in the form of an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If the consumption is low, some generation units are stopped, and this time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage, since it would give information about the dynamic of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP, in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ Ω_X, μ(i) is an admissible control for the state i: μ(i) ∈ Ω_U(i).

The objective is to find the optimal μ*, that is, the policy that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

μ  Decision policy
J*(i)  Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred k stages ahead has the form α^k · C_ij(u).

Since C_ij(u) is bounded, the infinite sum converges (decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

α  Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, U_k, ω(X_k, μ(X_k))), k = 0, 1, ..., N−1


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [ C_ij(u) + J*(j) ]   ∀i ∈ Ω_X

J^μ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P_ij(u) · [ C_ij(u) + α · J*(j) ]   ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models the theoretical number of iterations needed is infinite, and a stopping criterion must be defined to terminate the algorithm.

An alternative is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step improves the policy based on this expected cost-to-go function. This two-step procedure is applied iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. It can then be described by the following steps.

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, J^{μ^q}(i) is calculated as the solution of the following linear system:

J^{μ^q}(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [ C(j, μ^q(i), i) + J^{μ^q}(j) ]

q  Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.

Step 2: Policy Improvement

A new policy is obtained using one value iteration step:

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J^{μ^q}(j) ]

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.

At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, the algorithm will converge quickly to the optimal solution.
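A minimal Python sketch of the two steps is given below, written for the discounted case so that the evaluation system is always solvable. The array layout P[i, u, j] = P(j, u, i) and C[i, u, j] = C(j, u, i), and the choice of discount factor, are assumptions of the sketch.

import numpy as np

def policy_iteration(P, C, alpha=0.95):
    # P[i, u, j]: transition probabilities, C[i, u, j]: transition costs.
    nS, nU, _ = P.shape
    mu = np.zeros(nS, dtype=int)                      # initial policy mu^0
    while True:
        # Step 1: policy evaluation, solve the linear system for J_mu
        P_mu = P[np.arange(nS), mu]                   # transition matrix under mu
        c_mu = np.einsum('ij,ij->i', P_mu, C[np.arange(nS), mu])
        J = np.linalg.solve(np.eye(nS) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('iuj,iuj->iu', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):                # policy is its own improvement
            return mu, J
        mu = mu_new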

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the real value J_{μ^k}(i).


While m ≥ 0 do

J^m_{μ^k}(i) = Σ_{j ∈ Ω_X} P(j, μ^k(i), i) · [ C(j, μ^k(i), i) + J^{m+1}_{μ^k}(j) ]   ∀i ∈ Ω_X

m ← m − 1

m  Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and impose conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X̄ ∈ Ω_X, there is a unique scalar λ^μ and vector h^μ such that

h^μ(X̄) = 0

λ^μ + h^μ(i) = Σ_{j ∈ Ω_X} P(j, μ(i), i) · [ C(j, μ(i), i) + h^μ(j) ]   ∀i ∈ Ω_X

This λ^μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for every starting state.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h*(j) ]   ∀i ∈ Ω_X

μ*(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h*(j) ]   ∀i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u ∈ Ω_U(X̄)} Σ_{j ∈ Ω_X} P(j, u, X̄) · [ C(j, u, X̄) + h^k(j) ]

h^{k+1}(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^k(j) ] − H^k   ∀i ∈ Ω_X

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^k(j) ]   ∀i ∈ Ω_X

The sequence h^k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X̄ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h^q(X̄) = 0

λ^q + h^q(i) = Σ_{j ∈ Ω_X} P(j, μ^q(i), i) · [ C(j, μ^q(i), i) + h^q(j) ]   ∀i ∈ Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + h^q(j) ]   ∀i ∈ Ω_X

q ← q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP the optimality equation is

J*(i) = min_{u ∈ Ω_U(i)} Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ]   ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximise  Σ_{i ∈ Ω_X} J(i)

Subject to  J(i) ≤ Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + α · J(j) ]   ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; it is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, it converges quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated; then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{Xk} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]   ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{Xk} · λ^{l−k} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm only updates the state of the current transition:

J(X_l) := J(X_l) + γ_{Xl} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]
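The following minimal Python sketch illustrates the TD(0) update on sampled trajectories. The trajectory format (a list of (state, next state, cost) tuples obtained by simulating a fixed policy μ) and the step size rule γ = 1/(number of visits) are assumptions of the sketch.

from collections import defaultdict

def td0(trajectories):
    # trajectories: list of lists of (x, x_next, cost) tuples under a fixed policy
    J = defaultdict(float)                  # estimated cost-to-go, initialised to 0
    visits = defaultdict(int)
    for trajectory in trajectories:
        for (x, x_next, cost) in trajectory:
            visits[x] += 1
            gamma = 1.0 / visits[x]         # step size gamma_X
            # temporal-difference update towards the one-step target
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J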

Q-factors
Once J^{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q^{μ^k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J^{μ^k}(j) ]

Note that C(j, u, i) must be known. The improved policy is

μ^{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q^{μ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{μ^k} and Q^{μ^k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + J*(j) ]    (7.1)


The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [ C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v) ]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [ C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.
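A minimal Python sketch of this update with an ε-greedy rule (anticipating the exploration/exploitation trade-off discussed next) is given below. env.step and actions are hypothetical placeholders for a simulator of the system and its admissible decisions; all names are conventions of the sketch.

import random
from collections import defaultdict

def q_learning(env, actions, x0, n_steps=10000, epsilon=0.1):
    # env.step(x, u) is assumed to return (next_state, cost) for the sampled transition
    Q = defaultdict(float)
    visits = defaultdict(int)
    x = x0
    for _ in range(n_steps):
        if random.random() < epsilon:                 # exploration: try a random control
            u = random.choice(actions(x))
        else:                                         # exploitation: greedy control
            u = min(actions(x), key=lambda a: Q[(x, a)])
        x_next, cost = env.step(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]
        target = cost + min(Q[(x_next, v)] for v in actions(x_next))
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = x_next
    return Q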

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning


7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. For large state and control spaces, however, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J^μ. In the tabular representation investigated previously, J^μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
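As an illustration of such an approximation structure, the sketch below fits a linear architecture J̃(i, r) = φ(i)ᵀ r by least squares on sampled returns. The feature map φ and the sample format are assumptions of the sketch, and any of the supervised learning methods mentioned above could be used instead of the linear model.

import numpy as np

def fit_linear_approximation(phi, states, returns):
    # phi(i): feature vector of state i; states/returns: visited states and the
    # observed cost-to-go samples collected from simulated trajectories.
    Phi = np.array([phi(i) for i in states])       # design matrix
    targets = np.array(returns)
    r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return r                                        # only the vector r is stored

def approx_J(phi, r, i):
    # approximated cost-to-go J~(i, r) = phi(i)^T r
    return float(phi(i) @ r)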


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality: the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods (characteristics, possible application in maintenance optimization, methods, advantages/disadvantages)

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application: short-term maintenance scheduling optimization
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models, classical methods; possible approaches for MDP)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
- Shortest path: Linear Programming allows additional constraints, but the state space is more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval, but more complex
- Possible application: optimization of inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages: can work without an explicit model


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of a year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E  Number of electricity scenarios
N_W  Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component

Costs

C_E(s, k)  Electricity cost at stage k for the electricity state s
C_I  Cost per stage for interruption
C_PM  Cost per stage of preventive maintenance
C_CM  Cost per stage of corrective maintenance
C_N(i)  Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_x1  Component state space
Ω_x2  Electricity state space
Ω_U(i)  Decision space for state i

States notations

W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases


[Figure: one-component Markov decision process with states W0 to W4, PM1, CM1 and CM2. Under u = 0, each working state Wq moves to Wq+1 (or stays in W4) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); maintenance states progress with probability 1.]

Figure 9.1: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}


Electricity scenario state

Electricity scenarios are associated with one state variable, x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, roughly between 200 and 500) for scenarios 1, 2 and 3 plotted over stages k−1, k and k+1.]

Figure 9.2: Example of electricity scenarios, N_E = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · P_k(j2, i2)

Component state transition probabilities

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1 respectively CM1 correspond to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., N_W − 1} | 0 | W_{q+1} | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W − 1} | 0 | CM1 | λ(Wq)
W_NW | 0 | W_NW | 1 − λ(W_NW)
W_NW | 0 | CM1 | λ(W_NW)
Wq, q ∈ {0, ..., N_W} | 1 | PM1 | 1
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PM_{q+1} | 1
PM_{N_PM−1} | ∅ | W0 | 1
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CM_{q+1} | 1
CM_{N_CM−1} | ∅ | W0 | 1
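A minimal Python sketch of how the transition probabilities of Table 9.1 could be assembled into matrices is given below. The state indexing, the helper names, and the convention that lam[q] already contains the per-stage failure probability of state Wq (Ts · λ(q · Ts)) are choices of this sketch, not of the thesis.

import numpy as np

def component_transitions(lam, NW, NPM, NCM):
    # lam[q]: per-stage failure probability in state W_q, q = 0..NW
    W = list(range(NW + 1))                       # indices of W_0 ... W_NW
    PM = [NW + 1 + q for q in range(NPM - 1)]     # indices of PM_1 ... PM_{NPM-1}
    CM = [NW + NPM + q for q in range(NCM - 1)]   # indices of CM_1 ... CM_{NCM-1}
    n = NW + NPM + NCM - 1
    P0 = np.zeros((n, n))                         # u = 0 (no preventive maintenance)
    P1 = np.zeros((n, n))                         # u = 1 (preventive maintenance)
    for q in range(NW + 1):
        ageing = W[min(q + 1, NW)]                # W_{q+1}, or W_NW stays put
        P0[W[q], ageing] = 1 - lam[q]
        P0[W[q], CM[0] if NCM > 1 else W[0]] = lam[q]   # CM_1 is W_0 when NCM = 1
        P1[W[q], PM[0] if NPM > 1 else W[0]] = 1.0      # PM_1 is W_0 when NPM = 1
    for seq in (PM, CM):                          # maintenance progresses one state
        for a, b in zip(seq, seq[1:] + [W[0]]):   # ... per stage and ends in W_0
            P0[a, b] = P1[a, b] = 1.0             # (decision irrelevant in maintenance)
    return P0, P1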

Table 9.2: Example of transition matrices for the electricity scenarios

P1_E =
[ 1   0   0  ]
[ 0   1   0  ]
[ 0   0   1  ]

P2_E =
[ 1/3 1/3 1/3 ]
[ 1/3 1/3 1/3 ]
[ 1/3 1/3 1/3 ]

P3_E =
[ 0.6 0.2 0.2 ]
[ 0.2 0.6 0.2 ]
[ 0.2 0.2 0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):     0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2):   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9.1.4.4 Cost Function

The costs associated to the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost of maintenance: C_CM or C_PM

• Cost of interruption: C_I

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1 | u | j1 | C_k(j, u, i)
Wq, q ∈ {0, ..., N_W − 1} | 0 | W_{q+1} | G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W − 1} | 0 | CM1 | C_I + C_CM
W_NW | 0 | W_NW | G · Ts · C_E(i2, k)
W_NW | 0 | CM1 | C_I + C_CM
Wq | 1 | PM1 | C_I + C_PM
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PM_{q+1} | C_I + C_PM
PM_{N_PM−1} | ∅ | W0 | C_I + C_PM
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CM_{q+1} | C_I + C_CM
CM_{N_CM−1} | ∅ | W0 | C_I + C_CM

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C  Number of components
N_Wc  Number of working states for component c
N_PMc  Number of preventive maintenance states for component c
N_CMc  Number of corrective maintenance states for component c


Costs

C_PMc  Cost per stage of preventive maintenance for component c
C_CMc  Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., N_C}  State of component c at the current stage
i_{NC+1}  State of the electricity at the current stage
jc, c ∈ {1, ..., N_C}  State of component c at the next stage
j_{NC+1}  State of the electricity at the next stage
uc, c ∈ {1, ..., N_C}  Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., N_C}  State of component c at stage k
xc  A component state
x_{NC+1,k}  Electricity state at stage k
uc_k  Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ω_xc  State space for component c
Ω_x_{NC+1}  Electricity state space
Ω_uc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages with a cost of C_PMc per stage.


• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., x_{NC,k}, x_{NC+1,k})    (9.2)

xc_k, c ∈ {1, ..., N_C}, represents the state of component c, and x_{NC+1,k} represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as for the one-component model in Section 9.1.4.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:


uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., u_{NC,k})    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., N_C}:  Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, and ∅ otherwise.

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state x_{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
= P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · P(j_{NC+1}, i_{NC+1})    (9.5)

The transition probabilities of the electricity state, P(j_{NC+1}, i_{NC+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component:

P((j1, . . . , jNC), (u1, . . . , uNC), (i1, . . . , iNC)) = ∏_{c=1}^{NC} P^c

with P^c =
P(jc, 1, ic) if uc = 1 or ic ∉ {W1, . . . , WNWc}
1 if ic ∉ {W0, . . . , WNWc−1} and ic = jc
0 else

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, . . . , NC}, ic ∈ {W1, . . . , WNWc}:

C((j1, . . . , jNC), 0, (i1, . . . , iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, . . . , jNC), (u1, . . . , uNC), (i1, . . . , iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc =
CCMc if ic ∈ {CM1, . . . , CMNCMc} or jc = CM1
CPMc if ic ∈ {PM1, . . . , PMNPMc} or jc = PM1
0 else
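The two-case structure of the transition probabilities can be sketched in code. The following Python fragment is a minimal illustration only (not part of the original model implementation); p_component(c, jc, uc, ic), standing in for the one-component transition probabilities of Section 8.1, and working_states(c), standing in for the sets {W0, . . . , WNWc}, are hypothetical helpers.

def joint_transition_probability(i, u, j, p_component, working_states):
    # i, j: tuples of component states; u: tuple of decisions (1 = replace, 0 = do nothing)
    all_working = all(ic in working_states(c) for c, ic in enumerate(i))
    no_maintenance = all(uc == 0 for uc in u)
    prob = 1.0
    if all_working and no_maintenance:
        # Case 1: the system runs, components age independently
        for c, (ic, jc) in enumerate(zip(i, j)):
            prob *= p_component(c, jc, 0, ic)
    else:
        # Case 2: the system is stopped during the stage
        for c, (ic, jc, uc) in enumerate(zip(i, j, u)):
            if uc == 1 or ic not in working_states(c):
                prob *= p_component(c, jc, 1, ic)   # component maintained or already down
            elif ic != jc:
                return 0.0                          # a stopped working component does not age
    return prob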

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states: It could be interesting to add other forecast state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest. However, for a high discount rate, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of the finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
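For reference, the backward recursion above can be checked with a few lines of code. The following Python sketch (not part of the original thesis) encodes the arc costs of the shortest path example of Chapter 4 and reproduces the optimal cost-to-go J*0(0) = 8 and the optimal path A ⇒ D ⇒ G ⇒ I ⇒ K.

costs = {
    # costs[k][(i, u)] = cost of choosing u in state i at stage k (next state is u)
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                     # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                                 # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                      # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                                # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                                 # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                      # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                                # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                     # H, I, J -> K
}

J = {4: {0: 0}}                          # terminal cost phi(K) = 0
policy = {}
for k in range(3, -1, -1):               # backward recursion
    J[k], policy[k] = {}, {}
    for (i, u), c in costs[k].items():
        value = c + J[k + 1][u]          # dynamic of the example: next state = u
        if i not in J[k] or value < J[k][i]:
            J[k][i], policy[k][i] = value, u

print(J[0][0])                           # optimal cost-to-go: 8
state, path = 0, ["A"]
names = {1: "BCD", 2: "EFG", 3: "HIJ"}
for k in range(0, 4):                    # follow the optimal policy forward from A
    state = policy[k][state]
    path.append(names[k + 1][state] if k + 1 in names else "K")
print(" -> ".join(path))                 # A -> D -> G -> I -> K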


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance using Markov decision processes Reliability and Maintainability Symposium 2006 RAMS'06 Annual pages 464ndash469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on "Machine learning applications in the electric power industry" Chania Greece 1999


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line maintenance scheduling in a deregulated system Power Industry Computer Applications 1999 PICA'99 Proceedings of the 21st 1999 IEEE International Conference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effect analysis of condition monitoring systems Master's thesis Royal Institute of Technology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic Dynamic Programming John Wiley & Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft Master's thesis Royal Institute of Technology (KTH) May 2005


Page 20: Models

attention since the deregulation of the electricity market (See for example [12][27] for distribution systems [22] [30] for transmission systems)

The emergence of new condition based monitoring systems is changing the approach to maintenance in power systems. There is a need for new models and methods to optimize the use of condition based monitoring systems.

32 Costs

Possible costs/incomes related to maintenance in power systems have been identified (non-exhaustively) as follows:

• Manpower cost: Cost for the maintenance team that performs maintenance actions.

• Spare part cost: The cost of a new component is an important part of the maintenance cost.

• Maintenance equipment cost: Special equipment may be needed for undertaking the maintenance. A helicopter can sometimes be necessary for the maintenance of some parts of an offshore wind turbine.

• Energy production: The electricity produced is sold to consumers on the electricity market. The price of electricity can fluctuate. At the same time, the power produced by a generating unit can fluctuate depending on factors like the weather (for renewable energy). The condition of the unit can also influence its efficiency.

• Unserved energy/interruption cost: If there is an agreement to produce/deliver energy to a consumer at some specific time, unserved energy must be paid for. The cost depends on the contract, and the cost per unit time depends on the duration of the failure.

• Inspection/monitoring cost: Inspection or monitoring systems have a cost that must be considered. The cost can be an initial investment (for continuous monitoring systems) or discrete costs (each time an inspection, measurement or test is done on an asset).

33 Main Constraints

Possible constraints for the maintenance of power systems have been identified as follows:

• Manpower: The size and availability of the maintenance staff is limited.

• Maintenance equipment: The equipment needed for undertaking the maintenance must be available.

• Weather: The weather can force certain maintenance actions to be postponed, e.g. in very windy conditions it is not possible to carry out maintenance on offshore wind farms.

• Availability of spare parts: If the needed spare parts are not available, maintenance cannot be done. It can also happen that a spare part is available but far away from the location where it is needed. The transportation then has a cost and takes time.

• Maintenance contracts: Power companies can subscribe to maintenance services from the manufacturer of a system. This is a typical option for wind turbines [33]. The time span of a contract can be a constraint for an optimization model.

• Availability of condition monitoring information: If condition monitoring systems are installed on a system, the information gathered by the monitoring devices is not always available to non-manufacturer companies. The availability of monitoring information has an important impact on the possible inputs for an optimization model.

• Statistical data: Available monitoring information has a value only if conclusions about the deterioration or failure state of a system can be drawn from it. Statistical data are necessary to create a probabilistic model.

Chapter 4

Introduction to Dynamic Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch, the decision maker (also called agent or controller in different contexts) observes the state of a system (it is assumed in this thesis that the system is perfectly observable). An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have an influence on the actual evolution of the system and the possible actions.

Basically, in maintenance problems it would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the actual state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probability distribution depending on the actual state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance, the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 5, 6 and 7). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum set of decision epochs implies that the decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The two last possibilities will be briefly investigated in Chapter 6. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

J*0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

Subject to: Xk+1 = fk(Xk, Uk), k = 0, . . . , N − 1

N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
Ck(i, u) Cost function
CN(i) Terminal cost for state i
fk(i, u) Dynamic function
J*0(i) Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) } (4.1)

J*k(i) Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

J*N(i) = CN(i) ∀i ∈ ΩXN

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) } ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) } ∀i ∈ ΩXk

u Decision variable
U*k(i) Optimal decision action at stage k for state i


The algorithm goes backwards starting from the last stage It stops when k=0


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network with nodes A (stage 0); B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4). Each arc is labelled with its cost; the arc costs can be read from the solution in Appendix A.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to go to the next stage. The following notations are used:

ΩUk(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

ΩU0(0) = {0, 1, 2} for k = 0

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E and U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F and u1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ0, μ1, . . . , μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, . . . , μ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*0(0) = min_{Uk∈ΩUk(Xk)} [ Σ_{k=0}^{4} Ck(Xk, Uk) + CN(XN) ]

Subject to: Xk+1 = fk(Xk, Uk), k = 0, 1, . . . , N − 1

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*k(i) (for example, μ1(1) = 2, μ1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter, a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, . . . , N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, . . . , N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk∈ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to: Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, . . . , N − 1

N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
ωk(i, u) Probabilistic function of the disturbance
Ck(j, u, i) Cost function
CN(i) Terminal cost for state i
fk(i, u, ω) Dynamic function
J*0(i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*k(i) = min_{u∈ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ] (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)] (5.2)

ΩXk State space at stage k
ΩUk(i) Decision space at stage k for state i
Pk(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i) ∀i ∈ ΩXN (Initialisation)

While k ≥ 0 do:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)] ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)] ∀i ∈ ΩXk

k ← k − 1

u Decision variable
U*k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
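As an illustration, a direct transcription of this backward recursion is sketched below in Python. This is a generic sketch, not taken from the thesis; states, actions, P, C and C_N are hypothetical callables describing a particular problem instance.

def value_iteration(N, states, actions, P, C, C_N):
    # states(k): states at stage k;  actions(k, i): admissible actions
    # P(k, j, u, i): transition probability;  C(k, j, u, i): stage cost;  C_N(i): terminal cost
    J = {N: {i: C_N(i) for i in states(N)}}
    U = {}
    for k in range(N - 1, -1, -1):                 # backward recursion, stage N-1 down to 0
        J[k], U[k] = {}, {}
        for i in states(k):
            best_u, best_value = None, float("inf")
            for u in actions(k, i):
                value = sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                            for j in states(k + 1))
                if value < best_value:
                    best_u, best_value = u, value
            J[k][i], U[k][i] = best_value, best_u  # optimal cost-to-go and decision
    return J, U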

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, it can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It would be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. It will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the actual state of the system (and possibly on the time, if the system dynamic is not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few previous states) to overcome this assumption. Variables are added in the DP model to keep in memory the previous states that have been visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamic of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice, one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, . . . , μ}. μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in this state and no more costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N − 1

μ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N − 1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, . . . , N − 1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

Jμ(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + Jμ(j)] ∀i ∈ ΩX

Jμ(i) Cost-to-go function of policy μ starting from state i
J*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

Jμ(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · Jμ(j)] ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, then the method can be fast. The time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to stop the algorithm.
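A minimal sketch of value iteration for the discounted case is given below (not from the thesis); states, actions(i), P(j, u, i) and C(j, u, i) form a hypothetical model interface, and the change of the value function between iterations is used as a stopping criterion.

def discounted_value_iteration(states, actions, P, C, alpha, tol=1e-8):
    # alpha: discount factor, 0 < alpha < 1
    J = {i: 0.0 for i in states}
    while True:
        J_new = {}
        for i in states:
            J_new[i] = min(sum(P(j, u, i) * (C(j, u, i) + alpha * J[j]) for j in states)
                           for u in actions(i))
        if max(abs(J_new[i] - J[i]) for i in states) < tol:
            break
        J = J_new
    # greedy (stationary) policy with respect to the converged value function
    mu = {i: min(actions(i),
                 key=lambda u: sum(P(j, u, i) * (C(j, u, i) + alpha * J[j]) for j in states))
          for i in states}
    return J, mu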

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. Then it can be described by the following steps.

Step 1 Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)] ∀i ∈ ΩX

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)] ∀i ∈ ΩX

Go back to policy evaluation step

The process stops when μq+1 = μq.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
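The two steps can be sketched as follows for the discounted variant of the problem (a sketch only, not the thesis's implementation, with a hypothetical model interface; the evaluation step solves the linear system with numpy).

import numpy as np

def policy_iteration(states, actions, P, C, alpha):
    # states: list of states; actions(i): list of actions; P(j, u, i); C(j, u, i); alpha < 1
    n = len(states)
    index = {s: k for k, s in enumerate(states)}
    mu = {i: actions(i)[0] for i in states}           # arbitrary initial policy mu0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = np.zeros((n, n))
        c_mu = np.zeros(n)
        for i in states:
            for j in states:
                p = P(j, mu[i], i)
                P_mu[index[i], index[j]] = p
                c_mu[index[i]] += p * C(j, mu[i], i)
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        new_mu = {i: min(actions(i),
                         key=lambda u: sum(P(j, u, i) * (C(j, u, i) + alpha * J[index[j]])
                                           for j in states))
                  for i in states}
        if new_mu == mu:                               # the policy improves itself: stop
            return {i: J[index[i]] for i in states}, mu
        mu = new_mu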

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).


While m ≥ 0 do:

J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)] ∀i ∈ ΩX

m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
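A sketch of this approximate evaluation step for the discounted variant is given below (hypothetical interface, not from the thesis); J_init should be chosen above the true cost-to-go of the policy, as stated above.

def approximate_policy_evaluation(states, mu, P, C, alpha, J_init, M):
    # M applications of the Bellman operator of the fixed policy mu
    J = dict(J_init)
    for _ in range(M):
        J = {i: sum(P(j, mu[i], i) * (C(j, mu[i], i) + alpha * J[j]) for j in states)
             for i in states}
    return J    # approximation of J_mu used in the policy improvement step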

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there is a unique λμ and a vector hμ such that:

hμ(X) = 0
λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)] ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)] ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)] ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the following system of equations:

hq(X) = 0
λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)] ∀i ∈ ΩX

Step 2 Policy improvement

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)] ∀i ∈ ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case:

Jμ(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · Jμ(j)] ∀i ∈ ΩX

Jμ(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} Jμ(i)

Subject to: Jμ(i) − α · Σ_{j∈ΩX} P(j, u, i) · Jμ(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i) ∀u, i
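As an illustration, the LP above can be written down for a small discounted MDP with scipy.optimize.linprog. The two-state instance below is made up purely for illustration; maximizing Σ Jμ(i) is done by minimizing its negative.

import numpy as np
from scipy.optimize import linprog

states = [0, 1]
actions = [0, 1]
alpha = 0.9
# P[u][i][j] = P(j, u, i), C[u][i][j] = C(j, u, i)  (illustrative numbers only)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
C = np.array([[[1.0, 4.0], [2.0, 3.0]],
              [[0.5, 2.0], [1.0, 6.0]]])

c_obj = -np.ones(len(states))                    # maximize sum_i J(i)
A_ub, b_ub = [], []
for u in actions:
    for i in states:
        row = np.zeros(len(states))
        row[i] = 1.0
        row -= alpha * P[u][i]                   # J(i) - alpha * sum_j P(j,u,i) J(j)
        A_ub.append(row)
        b_ub.append(float(P[u][i] @ C[u][i]))    # expected one-step cost
res = linprog(c_obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * len(states))
print(res.x)                                     # optimal discounted cost-to-go J*(i)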

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck). Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X0, . . . , XN) has been generated according to the policy μ and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is:

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(im)

V(im) Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated

J(i) := J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of visits to state i.

From a trajectory point of view

J(Xk) := J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.


With the preceding algorithm, V(Xk) has to be calculated from the whole trajectory, so it can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory, J(Xk), is updated. Assume that the l-th transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)] ∀k = 0, . . . , l

TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation described above. Another special case is λ = 0. The TD(0) algorithm is

J(X_l) = J(X_l) + γ_{X_l} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)]
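The TD(0) update can be sketched in a few lines of Python; the sampled transitions and the step-size schedule below are hypothetical illustrations.

    import numpy as np

    def td0_update(J, visits, x, x_next, cost):
        """Apply one TD(0) update after observing the transition (x, x_next, cost)."""
        visits[x] += 1
        gamma = 1.0 / visits[x]                      # decreasing step size, as for TD
        J[x] += gamma * (cost + J[x_next] - J[x])    # temporal-difference correction
        return J

    J = np.zeros(4)                                  # state 3 is terminal, its value stays 0
    visits = np.zeros(4, dtype=int)
    trajectory = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 4.0)]   # (x, x_next, cost) samples
    for x, x_next, cost in trajectory:
        td0_update(J, visits, x, x_next, cost)
    print(J)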

Q-factors. Once J^{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q^{µ_k}(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J^{µ_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

µ_{k+1}(i) = argmin_{u ∈ Ω_U(i)} Q^{µ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^{µ_k} and Q^{µ_k} have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j, u, i) · [C(j, u, i) + min_{v ∈ Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

U_k = argmin_{u ∈ Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
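A minimal Python sketch of tabular Q-learning with an ε-greedy exploration rule is given below; the environment model, costs, ε and the number of steps are illustrative assumptions and not part of the thesis.

    import numpy as np

    TERMINAL = 2   # absorbing, zero-cost terminal state of this hypothetical example

    def sample_next(i, u, rng):
        """Hypothetical stochastic shortest path model: returns (next_state, cost)."""
        if i == 0:
            if u == 1:
                return TERMINAL, 4.0                           # safe but expensive control
            return (TERMINAL, 2.0) if rng.random() < 0.7 else (1, 2.0)
        return (TERMINAL, 5.0) if u == 0 else (0, 1.0)         # state 1

    def q_learning(n_states=3, n_actions=2, n_steps=20000, epsilon=0.1, seed=0):
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions), dtype=int)
        x = 0
        for _ in range(n_steps):
            # exploration / exploitation trade-off (epsilon-greedy rule)
            u = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmin(Q[x]))
            x_next, cost = sample_next(x, u, rng)
            counts[x, u] += 1
            gamma = 1.0 / counts[x, u]                         # decreasing step size, as for TD
            Q[x, u] = (1 - gamma) * Q[x, u] + gamma * (cost + Q[x_next].min())
            x = 0 if x_next == TERMINAL else x_next            # restart after the terminal state
        return Q

    print(q_learning())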

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in a tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J^µ(i). It is replaced by a suitable approximation J(i, r), where r is a parameter vector that has to be optimized based on the available samples of J^µ. In the tabular representation investigated previously, J^µ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J^µ(i) − J(i, r).
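As a simple illustration, the sketch below fits a linear approximation J(i, r) by least squares to sampled cost-to-go values; the feature map and the sample values are hypothetical, and the thesis does not prescribe this particular structure.

    import numpy as np

    def features(i):
        """Hypothetical feature vector phi(i) for state i (bias, age, age squared)."""
        return np.array([1.0, i, i ** 2])

    # Hypothetical samples (state, observed cost-to-go V) obtained by simulation.
    samples = [(0, 1.2), (1, 2.1), (2, 3.9), (3, 6.8), (4, 10.5)]
    Phi = np.array([features(i) for i, _ in samples])
    V = np.array([v for _, v in samples])

    # Least-squares fit of r: minimizes the squared error between V and phi(i)^T r.
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)

    def J_approx(i):
        return features(i) @ r

    print(J_approx(2.5))   # generalizes to states not present in the samples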

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no real training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.

Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1 Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes (stationary models, classical MDP methods)
  Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI); can converge fast for a high discount factor
  Discounted: short-term maintenance optimization; policy iteration (PI); faster in general
  Shortest path: linear programming; possible additional constraints; state space limited, as for VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; more complex (average cost-to-go approach)
  Possible application in maintenance optimization: inspection-based maintenance
  Method: same as MDP

Approximate Dynamic Programming
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Method: TD-learning, Q-learning
  Advantages / disadvantages: can work without an explicit model

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e., if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for one component
NCM    Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity cost at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1_k   Component state at stage k
x2_k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ω_x1     Component state space
Ω_x2     Electricity state space
Ω_U(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2   (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when the age Tmax is reached; in this case, Tmax can correspond, for example, to the time when λ(t) > 50 % if t > Tmax. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure 9.1 (diagram): Example of the Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. The working states W0–W4 move to CM1 with probability Ts·λ(q) and otherwise age to the next working state; the decision u = 1 (dashed lines) leads to PM1, while u = 0 corresponds to the solid lines.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}

Electricity scenario state

The electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 (plot): Example of electricity scenarios, NE = 3. Electricity prices in SEK/MWh (roughly 200–500) are shown over the stages k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW};  ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1 Transition probabilities

i1                              u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}          0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}          0    CM1       λ(Wq)
W_NW                            0    W_NW      1 − λ(W_NW)
W_NW                            0    CM1       λ(W_NW)
Wq, q ∈ {0, ..., NW}            1    PM1       1
PMq, q ∈ {1, ..., NPM−2}        ∅    PMq+1     1
PM_{NPM−1}                      ∅    W0        1
CMq, q ∈ {1, ..., NCM−2}        ∅    CMq+1     1
CM_{NCM−1}                      ∅    W0        1
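As an illustration of Table 9.1, the Python sketch below builds the stationary component transition matrices for u = 0 and u = 1; the helper function, its name and the numerical failure probabilities are assumptions made for the example only.

    import numpy as np

    def component_transition_matrices(failure_prob, n_w, n_pm, n_cm):
        """Build P0 (u = 0) and P1 (u = 1) following Table 9.1.
        failure_prob[q] is the probability of failure during a stage spent in Wq."""
        states = [f"W{q}" for q in range(n_w + 1)] \
               + [f"PM{q}" for q in range(1, n_pm)] \
               + [f"CM{q}" for q in range(1, n_cm)]
        idx = {s: n for n, s in enumerate(states)}
        P0 = np.zeros((len(states), len(states)))
        P1 = np.zeros_like(P0)
        first_cm = "CM1" if n_cm > 1 else "W0"              # CM1 collapses to W0 if NCM = 1
        first_pm = "PM1" if n_pm > 1 else "W0"              # PM1 collapses to W0 if NPM = 1
        for q in range(n_w + 1):
            survive = f"W{min(q + 1, n_w)}"                 # W_NW stays in W_NW if no failure
            P0[idx[f"W{q}"], idx[survive]] += 1 - failure_prob[q]
            P0[idx[f"W{q}"], idx[first_cm]] += failure_prob[q]
            P1[idx[f"W{q}"], idx[first_pm]] = 1.0           # preventive replacement decided
        # Maintenance states progress deterministically (no decision is available there,
        # so both matrices are given the same row for convenience).
        for prefix, n in (("PM", n_pm), ("CM", n_cm)):
            for q in range(1, n):
                nxt = f"{prefix}{q + 1}" if q + 1 < n else "W0"
                P0[idx[f"{prefix}{q}"], idx[nxt]] = P1[idx[f"{prefix}{q}"], idx[nxt]] = 1.0
        return states, P0, P1

    # Example of Figure 9.1: NW = 4, NPM = 2, NCM = 3, with arbitrary per-stage failure probabilities.
    states, P0, P1 = component_transition_matrices([0.05, 0.1, 0.15, 0.2, 0.3], 4, 2, 3)
    print(states)
    print(P0.sum(axis=1))   # every row sums to 1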

Table 9.2 Example of transition matrices for the electricity scenarios

P1_E =
  1    0    0
  0    1    0
  0    0    1

P2_E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3_E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3 Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2):   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
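The example of Tables 9.2 and 9.3 can be written down directly in code; the short sketch below only collects the matrices and the stage-dependent schedule (the variable names are hypothetical).

    import numpy as np

    P1_E = np.eye(3)                                  # scenarios never change (stable part of the year)
    P2_E = np.full((3, 3), 1.0 / 3.0)                 # complete mixing between scenarios
    P3_E = np.array([[0.6, 0.2, 0.2],
                     [0.2, 0.6, 0.2],
                     [0.2, 0.2, 0.6]])                # moderate persistence

    # Stage-dependent schedule of Table 9.3 (12-stage horizon).
    schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

    def P_elec(k, i2, j2):
        """Transition probability of the electricity state from i2 to j2 at stage k."""
        return schedule[k][i2, j2]

    print(P_elec(3, 0, 1))   # 0.2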

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4 Transition costs

i1                              u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}          0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}          0    CM1       CI + CCM
W_NW                            0    W_NW      G · Ts · CE(i2, k)
W_NW                            0    CM1       CI + CCM
Wq                              1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM−2}        ∅    PMq+1     CI + CPM
PM_{NPM−1}                      ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM−2}        ∅    CMq+1     CI + CCM
CM_{NCM−1}                      ∅    W0        CI + CCM
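To make the use of the model concrete, the sketch below runs backward value iteration (Chapter 4) on a small instance of the one-component model with NW = 3 and NPM = NCM = 2. All numerical values (failure probabilities, costs, prices, horizon) are illustrative assumptions, not values from the thesis, and the production reward is counted as a negative cost so that the total can be minimized.

    import numpy as np

    # --- Illustrative data (assumed) ---------------------------------------------
    N, Ts, G = 12, 1.0, 1.0                      # stages, stage length (h), production (kW)
    C_PM, C_CM, C_I = 2.0, 6.0, 3.0              # per-stage maintenance / interruption costs
    p_fail = [0.02, 0.05, 0.1, 0.2]              # failure probability per stage in W0..W3
    C_E = lambda s, k: [0.5, 1.5][s]             # price per kWh for scenarios S1, S2
    P_elec = np.array([[0.8, 0.2], [0.3, 0.7]])  # electricity transitions (stationary here)

    comp_states = ["W0", "W1", "W2", "W3", "PM1", "CM1"]

    def transitions(c, u, s, k):
        """(next component state, probability, cost) following Tables 9.1 and 9.4."""
        if c.startswith("W"):
            q = int(c[1])
            if u == 1:                                       # start preventive replacement
                return [("PM1", 1.0, C_I + C_PM)]
            nxt = f"W{min(q + 1, 3)}"
            reward = -G * Ts * C_E(s, k)                     # production as negative cost
            return [(nxt, 1 - p_fail[q], reward), ("CM1", p_fail[q], C_I + C_CM)]
        # maintenance finishes and the component becomes as good as new
        return [("W0", 1.0, C_I + (C_PM if c == "PM1" else C_CM))]

    J = np.zeros((N + 1, len(comp_states), 2))               # terminal cost set to zero
    policy = np.zeros((N, len(comp_states), 2), dtype=int)
    for k in range(N - 1, -1, -1):
        for ci, c in enumerate(comp_states):
            for s in range(2):
                controls = [0, 1] if c.startswith("W") else [0]
                costs = []
                for u in controls:
                    total = 0.0
                    for c2, p, cost in transitions(c, u, s, k):
                        exp_next = sum(P_elec[s, s2] * J[k + 1, comp_states.index(c2), s2]
                                       for s2 in range(2))
                        total += p * (cost + exp_next)
                    costs.append(total)
                J[k, ci, s] = min(costs)
                policy[k, ci, s] = controls[int(np.argmin(costs))]

    print(J[0, comp_states.index("W3"), 1], policy[0, comp_states.index("W3"), 1])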

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c

Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
i_{NC+1}               State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
j_{NC+1}               State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
x{NC+1}_k                Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)   Failure probability function for component c

Sets

Ω_xc        State space for component c
Ω_x{NC+1}   Electricity state space
Ω_uc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, x{NC+1}_k)   (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x{NC+1}_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or to do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., uNC_k)   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc};  ∅ otherwise

9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state x{NC+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(j_{NC+1}, i_{NC+1})   (9.5)

The transition probabilities of the electricity state, P(j_{NC+1}, i_{NC+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)
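A small Python sketch of this Case 1 product form is given below; the per-component transition dictionaries are hypothetical placeholders that would in practice be built from Table 9.1 for each component.

    from math import prod

    def joint_transition_prob(i, j, P_comp):
        """Case 1: all components working and u = 0 for every component.
        i and j are tuples of component states; P_comp[c][(ic, jc)] is the individual
        transition probability of component c under u = 0 (zero if not listed)."""
        return prod(P_comp[c].get((i[c], j[c]), 0.0) for c in range(len(i)))

    # Hypothetical two-component example with states coded as strings.
    P_comp = [
        {("W0", "W1"): 0.95, ("W0", "CM1"): 0.05},
        {("W1", "W2"): 0.90, ("W1", "CM1"): 0.10},
    ]
    print(joint_transition_prob(("W0", "W1"), ("W1", "W2"), P_comp))   # 0.95 * 0.90 = 0.855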

Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W0, ..., W_NWc}
P^c = 1              if ic ∈ {W0, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0              otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1. If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(i_{NC+1}, k)

Case 2. When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCMc   if ic ∈ {CM1, ..., CM_{NCMc−1}} or jc = CM1
C^c = CPMc   if ic ∈ {PM1, ..., PM_{NPMc−1}} or jc = PM1
C^c = 0      otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas and issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of a finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
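The backward recursion can be checked with a few lines of Python; the arc costs below are the values C(k, i, u) used in the calculation above.

    # Arc costs C[(k, i, u)]: cost of choosing u (index of the next node) in state i at stage k.
    C = {(3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
         (2, 0, 0): 2, (2, 0, 1): 5,
         (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
         (2, 2, 1): 1, (2, 2, 2): 2,
         (1, 0, 0): 4, (1, 0, 1): 6,
         (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
         (1, 2, 1): 5, (1, 2, 2): 2,
         (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3}

    J = {(4, 0): 0}                                   # terminal cost at stage 4
    policy = {}
    for k in range(3, -1, -1):
        for i in {i for (kk, i, _) in C if kk == k}:
            options = {u: c + J[(k + 1, u)]
                       for (kk, ii, u), c in C.items() if kk == k and ii == i}
            u_best = min(options, key=options.get)
            J[(k, i)], policy[(k, i)] = options[u_best], u_best

    print(J[(0, 0)])        # 8, the optimal cost-to-go from node A
    print([policy[(0, 0)], policy[(1, policy[(0, 0)])],
           policy[(2, policy[(1, policy[(0, 0)])])]])   # 2, 2, 1: the path A -> D -> G -> I -> K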


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 21: Models

bull Manpower The size and availability of the maintenance staff is limited

bull Maintenance Equipment The equipment needed for undertaking the mainte-nance must be available

bull Weather The weather can make certain maintenance actions postponed egin very windy conditions it is not possible to realize maintenance on offshorewind farms

bull Availability of the Spare Part If the needed spare parts are not availablemaintenance can not be done It can also happen that a spare part is availablebut far away from the location where it is needed The transportation has aprice and time

bull Maintenance Contracts Power companies can subscribe for maintenance ser-vices from the manufacturer of a system This is a typical option for windturbines [33] The time span of a contract can be a constraint for an opti-mization model

bull Availability of Condition Monitoring Information If condition monitoring sys-tems are installed on a system the information gathered by the monitoringdevices are not always available to non-manufacturer companies The avail-ability of monitoring information has an important impact is on the possibleinput for an optimization model

bull Statistical Data Available monitoring information have a value only if con-clusions about the deterioration or failure state in a system can be drawn fromthem Statistical data are necessary to create a probabilistic model

14

Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and somefeature of possible DP models Deterministic DP is used to introduce the basic ofDP formulation and the value iteration method a classical method for solving DPmodels

41 Introduction

Dynamic Programming deals with multi-stage or sequential decisions problems Ateach decision epoch the decision maker (also called agent or controller in differentcontexts) observes the state of a system (It is assumed in this thesis that the systemis perfectly observable) An action is decided based on this state This action willresult in an immediate cost (or reward) and influence the evolution of the system

The aim of DP is to minimize (or maximize) the cumulative cost (respectivelyincome) resulting of a sequence of decisions

In the following important ideas concerning Dynamic Programming are discussed

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems

It can be applied to any problem that observes the principle of optimality

15

An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solution of the subproblems are themselves solution of the general problemThe principle implies that at each stage the decision are based only on the currentstate of the system The previous decisions should not have influence on the actualevolution of the system and possible actions

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimizationOne distinguishs between finite and infinite time horizons

Chapter 4 focus on finite horizon stochastic dynamic programming In the contextof maintenance the objective would be for example to minimize the maintenancecosts during the time horizon considered

Chapter 5 and 6 focus on models that assume an infinite time horizon This as-sumption implies that a system is stationary that it evolves in the same manner allthe time Moreover an infinite horizon optimization assumes implicitely that thesystem is used for a infinite time It can be an good approximation if indeed thelifetime of a system is very long

16

414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) withdiscrete sets of decision epochs (Chapter 3 4 and 6) Decisions are made at eachdecision epoch The time is devided into stages or periods between these epochs Itis clear that the interval time between 2 stages will have an influence on the result

Short intervals are more realistitic and precise but the models can become heavyif the time horizon is large In practice long intervals can be used for long-termplanning while short-term planning consider shorter intervals

Continum set of decision epochs implies that the decision can be made either contin-uously at some points decided by the decision maker or when an event occur Thetwo last possibilities will be shortly investigated in Chapter 5 Continuous decisionrefers to optimal control theory and will not be discussed here

415 Exact and Approximation Methods

Dynamic Programming suffers of a complexity problem the curse of dimensionality(discussed in Section 42)

Methods for solving the dynamic programming models exactly exist and are pre-sented in Chapters 5 and 6 However large models are untractable with thesemethods

Chapter 6 provide an introduction to the field of Reinforcement Learning (RL) thatfocus on approximations for DP solutions Approximate algorithms are obtainedby combining DP and supervised learning algorithms RL is also known as neuro-dynamic programming when DP is combined with neural networks [13]

17

42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision SpacesAt each stage k the system is in a state Xk = i that belongs to a state space ΩXk Depending on the state of the system the decision maker decide of an action to dou = Uk isin ΩUk (i)

Dynamic and Cost FunctionsAs a result of this action the system state at next stage will be Xk+1 = fk(i u)Moreover the action has a cost that the decision maker has to pay Ck(i u) A pos-sible terminal cost is associated to the terminal state (state at stage N) (CN (XN )

Objective FunctionThe objective is to determine the sequence of decision that will mimimize the cu-mulative cost (also called cost-to-go function) subject to the dynamic of the system

Jlowast0 (X0) = minUk

Nminus1sumk=0Ck(Xk Uk) + CN (XN )

Subject to Xk+1 = fk(Xk Uk) k = 0 N minus 1

N Number of stagesk Stagei State at the current stagej State at the next stageXk State at stage kUk Decision action at stage kCk(i u) Cost functionCN (i) Terminal cost for state ifk(i u) Dynamic functionJlowast0 (i) Optimal cost-to-go starting from state i

18

422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellmanacutes equation) derives directly fromthe principle of optimality It states that the optimal cost-to-go function startingfrom stage k can be derived with the following formula

Jlowastk (i) = minuisinΩU

k(i)Ck(i u) + Jlowastk+1(fk(i u)) (41)

Jlowastk (i) Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

JlowastN (i) = CN (i) foralli isin XN

Jlowastk (i) = minuisinΩU

k(i)Ck(i u) + Jlowastk+1(fk(i u)) foralli isin Xk

Ulowastk (i) = argminuisinΩU

k(i)

Ck(i u) + Jlowastk+1(fk(i u)) foralli isin Xk

u Decision variableUlowastk (i) Optimal decision action at stage k for state i

lll

The algorithm goes backwards starting from the last stage It stops when k=0

19

423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrated the formulation and the value iteration algorithmThe following shortest path problem is considered

B E H

A C F I K

D G J

Stage 0 Stage 1 Stage 2 Stage 3 Stage 4

2

4

3

4

62

1

35

2

2

57

3

21

2

4

2

7

The aim of the problem is to determine the shortest way to reach the node Kstarting from the node A A cost (corresponding to a distance) is associated to eacharc One first way to solve the problem would be to calculate the cost of all thepossible path For example the path A-B-F-J-K has a cost of 2+6+2+7=17 Thenthe shortest path would be the one with the lowest cost

Dynamic programming provides a more efficient way to solve the problem Insteadof calculating all the path cost the problem will be divided in subproblems thatwill be solved recursively to determine the shortest path from each possible node tothe terminal node K

4231 Problem Formulation

The problem is divided into five stagesn=5 k=01234

State SpaceThe state space is defined for each stage

ΩX0 = A = 0ΩX1 = BCD = 0 1 2 ΩX2 = EFG = 0 1 2

ΩX3 = H I J = 0 1 2ΩX4 = K = 0

20

Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

Ω_{U_k}(i) = {0, 1} for i = 0; {0, 1, 2} for i = 1; {1, 2} for i = 2, for k = 1, 2, 3

Ω_{U_0}(0) = {0, 1, 2} for k = 0

For example, Ω_{U_1}(0) = Ω_U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E, or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω_{U_1}(2) = Ω_U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F, or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J^*_0(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \Big[ \sum_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) \Big]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., 3

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated at the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J^*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.

Decision Space

At each decision epoch the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(x_{k+1} = j, u_k = u, x_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost C_k(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J^*(X_0), where X_0 is the initial state of the system:

J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\Big[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \Big]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N-1

N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
ω_k(i, u) Probabilistic function of the disturbance
C_k(i, u, j) Cost function
C_N(i) Terminal cost for state i
f_k(i, u, ω) Dynamic function
J^*_0(i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\big[ C_k(i, u) + J^*_{k+1}(f_k(i, u, ω)) \big]    (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)]    (52)

Ω_{X_k} State space at stage k
Ω_{U_k}(i) Decision space at stage k for state i
P_k(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J^*_N(i) = C_N(i), for all i in Ω_{X_N}  (initialisation)

While k ≥ 0 do
  J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)], for all i in Ω_{X_k}
  U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)], for all i in Ω_{X_k}
  k ← k − 1

u Decision variable
U^*_k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
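A sketch of the stochastic version of the algorithm follows. Here P(k, i, u) is assumed to return a dictionary mapping each reachable next state j to its probability P_k(j, u, i), and C(k, i, u, j) the transition cost; these interfaces are illustrative assumptions, not taken from the thesis.

```python
def stochastic_value_iteration(N, states, controls, P, C, C_N):
    """Backward value iteration for a finite horizon stochastic DP.

    P(k, i, u)    : dict {j: P_k(j, u, i)} over the states of stage k+1
    C(k, i, u, j) : transition cost, C_N(i) : terminal cost
    """
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = C_N(i)
    for k in range(N - 1, -1, -1):
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls[k][i]:
                # expected cost of u: sum_j P_k(j,u,i) * [C_k(j,u,i) + J*_{k+1}(j)]
                cost = sum(p * (C(k, i, u, j) + J[k + 1][j])
                           for j, p in P(k, i, u).items())
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i] = best_cost
            U[k][i] = best_u
    return J, U
```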

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• N_X state variables; the size of the set for each state variable is S

• N_U control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). For example, with N = 52 stages, N_X = 3 state variables of size S = 10 and N_U = 2 decision variables of size A = 3, this already amounts to about 52 · 10^6 · 9 ≈ 4.7 · 10^8 elementary operations. This characteristic of SDP is called the curse of dimensionality.

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the current state of the system (and possibly on the time, if the system dynamic is not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω_X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω_U(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J^*(X_0) = \min_{μ} E\Big[ \lim_{N→∞} \sum_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) \Big]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

μ Decision policy
J^*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k of a discounted IHSDP has the form α^k · C_{ij}(u).

As C_{ij}(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J^*(X_0) = \min_{μ} E\Big[ \lim_{N→∞} \sum_{k=0}^{N-1} α^k · C(X_{k+1}, μ(X_k), X_k) \Big]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J^* = \min_{μ} E\Big[ \lim_{N→∞} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) \Big]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J_μ(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + J_μ(j)], for all i in Ω_X

J_μ(i) Cost-to-go function of policy μ starting from state i
J^*(i) Optimal cost-to-go function for state i

For a discounted IHSDP problem the optimality equation is

J_μ(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + α \cdot J_μ(j)], for all i in Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ^0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, J_{μ^q}(i), the solution of the following linear system, is calculated:

J_{μ^q}(i) = \sum_{j \in \Omega_X} P(j, μ^q(i), i) \cdot [C(j, μ^q(i), i) + J_{μ^q}(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ^q.

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

μ^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{μ^q}(j)]

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.

At each iteration the algorithm improves the policy. If the initial policy μ^0 is already good, then the algorithm will converge quickly to the optimal solution.
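As an illustration, the sketch below implements the two steps for the discounted formulation, where the evaluation step is a plain linear solve. The array layout P[u][i][j] = P(j, u, i) and C[u][i][j] = C(j, u, i) is an assumption made for the example, not the thesis notation.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    """Policy iteration for a discounted cost-minimizing MDP.

    P[u][i][j] : transition probability P(j, u, i)
    C[u][i][j] : transition cost C(j, u, i)
    alpha      : discount factor (0 < alpha < 1)
    """
    n_u, n_s, _ = P.shape
    mu = np.zeros(n_s, dtype=int)                      # initial policy mu^0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_s), :]                # row i is P(., mu(i), i)
        c_mu = np.sum(P_mu * C[mu, np.arange(n_s), :], axis=1)
        J = np.linalg.solve(np.eye(n_s) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([np.sum(P[u] * C[u], axis=1) + alpha * P[u] @ J
                      for u in range(n_u)])            # Q[u][i]
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):                 # policy is its own improvement
            return mu, J
        mu = mu_new
```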

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ^k}(i) that must be chosen higher than the real value J_{μ^k}(i).


While m ≥ 0 do
  J^m_{μ^k}(i) = \sum_{j \in \Omega_X} P(j, μ^k(i), i) \cdot [C(j, μ^k(i), i) + J^{m+1}_{μ^k}(j)], for all i in Ω_X
  m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ^k} is approximated by J^0_{μ^k}.
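A minimal sketch of this approximate evaluation step, under the same hypothetical array layout as in the policy iteration sketch above; M value-iteration sweeps replace the exact linear solve.

```python
import numpy as np

def evaluate_policy_approx(P, C, mu, J_init, M, alpha=1.0):
    """Approximate policy evaluation by M backups (modified policy iteration).

    J_init should be chosen higher than the true cost-to-go of the policy mu.
    """
    n_s = P.shape[1]
    P_mu = P[mu, np.arange(n_s), :]
    c_mu = np.sum(P_mu * C[mu, np.arange(n_s), :], axis=1)
    J = J_init.copy()
    for _ in range(M):                      # m = M-1, ..., 0
        J = c_mu + alpha * P_mu @ J
    return J                                # approximation of J_mu
```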

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω_X, there is a unique scalar λ_μ and vector h_μ such that

h_μ(X) = 0

λ_μ + h_μ(i) = \sum_{j \in \Omega_X} P(j, μ(i), i) \cdot [C(j, μ(i), i) + h_μ(j)], for all i in Ω_X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], for all i in Ω_X

μ^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], for all i in Ω_X

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The resulting method is called relative value iteration. X is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot [C(j, u, X) + h^k(j)]

h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)] − H^k, for all i in Ω_X

μ^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)], for all i in Ω_X

The sequence h^k will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
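A sketch of the relative value iteration sweep, with the same hypothetical P and C arrays as before; the reference state X is taken as state 0 and the number of sweeps is a fixed parameter rather than a convergence test.

```python
import numpy as np

def relative_value_iteration(P, C, n_iter=1000, ref_state=0):
    """Relative value iteration for an average cost-to-go MDP (unichain assumed)."""
    n_u, n_s, _ = P.shape
    expected_cost = np.array([np.sum(P[u] * C[u], axis=1) for u in range(n_u)])  # [u][i]
    h = np.zeros(n_s)                        # h^0 chosen arbitrarily
    for _ in range(n_iter):
        Q = expected_cost + np.array([P[u] @ h for u in range(n_u)])  # Q[u][i]
        H = np.min(Q[:, ref_state])          # H^k, computed from the reference state
        h = np.min(Q, axis=0) - H            # h^{k+1}(i)
    mu = np.argmin(Q, axis=0)                # greedy policy from the last sweep
    return mu, h, H                          # H approximates the optimal average cost
```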

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) for all i ∈ Ω_X, stop the algorithm.

Else, solve the system of equations

h^q(X) = 0

λ^q + h^q(i) = \sum_{j \in \Omega_X} P(j, μ^q(i), i) \cdot [C(j, μ^q(i), i) + h^q(j)], for all i in Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^q(j)], for all i in Ω_X

q ← q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case,

J_μ(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + α \cdot J_μ(j)], for all i in Ω_X

J_μ(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega_X} J_μ(i)

Subject to J_μ(i) ≤ \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + α \cdot J_μ(j)], for all i and u

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
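As an illustration, the LP above can be handed to a generic solver. The sketch below uses scipy.optimize.linprog with the same hypothetical P[u][i][j] and C[u][i][j] arrays as in the earlier sketches; maximizing the sum of J(i) is done by minimizing its negative. This is a sketch under those assumptions, not the thesis implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted cost-minimizing MDP by linear programming.

    Maximize sum_i J(i) subject to J(i) <= sum_j P(j,u,i) * [C(j,u,i) + alpha*J(j)].
    """
    n_u, n_s, _ = P.shape
    c = -np.ones(n_s)                                  # maximize sum J  <=>  minimize -sum J
    A_ub, b_ub = [], []
    for u in range(n_u):
        for i in range(n_s):
            row = -alpha * P[u, i, :]                  # coefficient -alpha*P(j,u,i) on J(j)
            row[i] += 1.0                              # coefficient +1 on J(i)
            A_ub.append(row)
            b_ub.append(P[u, i, :] @ C[u, i, :])       # expected one-step cost
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_s)
    return res.x                                       # the optimal cost-to-go J*
```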

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6: the system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed shortly.

The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k) Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

V(i_m) Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

with γ_{X_k} corresponding to 1/m, where m is the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory, so the update can only be done once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], for all k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], for all k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation described above. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
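A sketch of the TD(0) update applied to a stream of samples generated under a fixed policy; the state-indexed step sizes γ follow the 1/m visit-count rule described above. The sample format and data structures are illustrative assumptions.

```python
from collections import defaultdict

def td0_policy_evaluation(samples):
    """TD(0) evaluation of a fixed policy from observed transitions.

    samples: iterable of (x, x_next, cost) transitions observed under the policy.
    Returns the estimated cost-to-go J.
    """
    J = defaultdict(float)       # J initialised to 0 for every state
    visits = defaultdict(int)    # number of times each state has been updated
    for x, x_next, cost in samples:
        visits[x] += 1
        gamma = 1.0 / visits[x]                      # step size gamma_Xk = 1/m
        td_error = cost + J[x_next] - J[x]           # C(X_k, X_k+1) + J(X_k+1) - J(X_k)
        J[x] += gamma * td_error
    return J
```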

Q-factors
Once J_{μ^k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{μ^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{μ^k}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{μ^k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_μ and Q_{μ^k} have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)]    (71)


The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u)    (72)

By combining the two equations, we obtain

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)]    (73)

Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off: The convergence of the algorithm to the optimal solution requires that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
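The sketch below combines the Q-learning update with an ε-greedy rule that implements this exploration/exploitation trade-off. The simulator interface (a step function returning the next state and cost) and the parameter values are assumptions made for the illustration.

```python
import random
from collections import defaultdict

def q_learning(states, controls, step, n_samples, gamma=0.1, epsilon=0.1):
    """Q-learning for a cost-minimizing MDP.

    controls[i] : admissible decisions in state i (assumed non-empty)
    step(i, u)  : simulator, returns (next_state, cost) for one transition
    gamma       : learning step size, epsilon : exploration probability
    """
    Q = defaultdict(float)                     # Q(i, u), initialised arbitrarily to 0
    x = random.choice(states)
    for _ in range(n_samples):
        if random.random() < epsilon:          # exploration phase
            u = random.choice(controls[x])
        else:                                  # exploitation: greedy decision
            u = min(controls[x], key=lambda a: Q[(x, a)])
        x_next, cost = step(x, u)
        best_next = min(Q[(x_next, a)] for a in controls[x_next])
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
        x = x_next
    return Q
```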

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the optimal mean time to preventive maintenance calculated.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist: they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 81: Summary of models and methods

Model and characteristics | Possible application in maintenance optimization | Method | Advantages / Disadvantages

Finite Horizon Dynamic Programming (model can be non-stationary) | Short-term maintenance optimization, scheduling | Value Iteration | Limited state space (number of components)

Markov Decision Processes (stationary model) | | Classical methods, possible approaches for MDP below |

- Average cost-to-go | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for a high discount factor

- Discounted | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general

- Shortest path | | Linear Programming | Possible additional constraints; state space more limited than with VI and PI

Approximate Dynamic Programming for MDP (can handle large state spaces) | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model

Semi-Markov Decision Processes (can optimize the inspection interval; more complex) | Optimization for inspection-based maintenance | Same as MDP (average cost-to-go approach) |


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to keep operating the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

N_E Number of electricity scenarios
N_W Number of working states for the component
N_PM Number of preventive maintenance states for the component
N_CM Number of corrective maintenance states for the component

Costs

C_E(s, k) Electricity cost at stage k for the electricity state s
C_I Cost per stage for interruption
C_PM Cost per stage of preventive maintenance
C_CM Cost per stage of corrective maintenance
C_N(i) Terminal cost if the component is in state i

Variables

i^1 Component state at the current stage
i^2 Electricity state at the current stage
j^1 Possible component state for the next stage
j^2 Possible electricity state for the next stage

State and Control Space


x^1_k Component state at stage k
x^2_k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state W_i

Sets

Ω_{x^1} Component state space
Ω_{x^2} Electricity state space
Ω_U(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s, such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages, with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component, to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (91):

X_k = (x^1_k, x^2_k)^T, x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}    (91)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for this variable: working states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age T_max is reached; T_max can for example correspond to the time when λ(t) exceeds 50%. The latter approach was implemented. The corresponding number of W states is N_W = T_max/T_s, or the closest integer, in both cases.


[Figure 91: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. The states are W0, ..., W4, PM1, CM1 and CM2; from a working state Wq the component moves to CM1 with probability T_s·λ(q) and otherwise to the next working state with probability 1 − T_s·λ(q). Solid lines: u = 0; dashed lines: u = 1.]

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example, x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x^1} = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM − 1}, CM1, ..., CM_{N_CM − 1}}


Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios, corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden and, moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 92: Example of electricity scenarios, N_E = 3. Electricity prices (SEK/MWh, between roughly 200 and 500) over the stages k−1, k and k+1 for Scenario 1, Scenario 2 and Scenario 3.]


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

U_k = 0 no preventive maintenance
U_k = 1 preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_{N_W}}, and Ω_U(i) = ∅ otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · T_s).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary: they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.


Table 91: Transition probabilities

i^1 | u | j^1 | P(j^1, u, i^1)
Wq, q ∈ {0, ..., N_W − 1} | 0 | W_{q+1} | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W − 1} | 0 | CM1 | λ(Wq)
W_{N_W} | 0 | W_{N_W} | 1 − λ(W_{N_W})
W_{N_W} | 0 | CM1 | λ(W_{N_W})
Wq, q ∈ {0, ..., N_W} | 1 | PM1 | 1
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PM_{q+1} | 1
PM_{N_PM − 1} | ∅ | W0 | 1
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CM_{q+1} | 1
CM_{N_CM − 1} | ∅ | W0 | 1
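As an illustration of Table 91, the sketch below assembles the component transition matrices for u = 0 (no preventive maintenance) and u = 1 (preventive replacement), given per-stage failure probabilities for each working state. The state ordering and the example failure probabilities are assumptions made for the illustration, not data from the thesis.

```python
import numpy as np

def component_transition_matrices(lam, n_pm, n_cm):
    """Build the one-component transition matrices following Table 91.

    lam  : per-stage failure probability lam[q] for working state Wq (q = 0..N_W)
    n_pm : number of preventive maintenance stages (N_PM)
    n_cm : number of corrective maintenance stages (N_CM)
    States are ordered [W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}].
    """
    n_w = len(lam) - 1
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: n for n, s in enumerate(states)}
    P0 = np.zeros((len(states), len(states)))       # u = 0
    P1 = np.zeros((len(states), len(states)))       # u = 1
    for q in range(n_w + 1):
        nxt = f"W{min(q + 1, n_w)}"                 # W_NW stays in W_NW
        P0[idx[f"W{q}"], idx[nxt]] = 1 - lam[q]
        P0[idx[f"W{q}"], idx["CM1" if n_cm > 1 else "W0"]] = lam[q]
        P1[idx[f"W{q}"], idx["PM1" if n_pm > 1 else "W0"]] = 1   # replacement starts
    for q in range(1, n_pm):                        # deterministic maintenance chains
        nxt = f"PM{q + 1}" if q + 1 < n_pm else "W0"
        P0[idx[f"PM{q}"], idx[nxt]] = P1[idx[f"PM{q}"], idx[nxt]] = 1
    for q in range(1, n_cm):
        nxt = f"CM{q + 1}" if q + 1 < n_cm else "W0"
        P0[idx[f"CM{q}"], idx[nxt]] = P1[idx[f"CM{q}"], idx[nxt]] = 1
    return states, P0, P1

# Hypothetical example matching the shape of Figure 91 (N_W = 4, N_PM = 2, N_CM = 3)
states, P0, P1 = component_transition_matrices([0.01, 0.02, 0.04, 0.08, 0.15], 2, 3)
```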

Table 92: Example of transition matrices for the electricity scenarios

P^1_E =
[ 1    0    0
  0    1    0
  0    0    1 ]

P^2_E =
[ 1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3 ]

P^3_E =
[ 0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6 ]

Table 93: Example of transition probabilities on a 12-stage horizon

Stage (k):      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2):  P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E
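The stage-dependent electricity transitions of Tables 92 and 93 could be stored as a simple mapping from stage to matrix, as in the following sketch; the matrices and the stage assignment reproduce the example values above, while the function name and data layout are assumptions.

```python
import numpy as np

# Transition matrices for the electricity scenarios (Table 92)
P1_E = np.eye(3)
P2_E = np.full((3, 3), 1 / 3)
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])

# Stage -> matrix assignment over the 12-stage horizon (Table 93)
P_E_by_stage = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
                P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability that the electricity scenario moves from i2 to j2 at stage k."""
    return P_E_by_stage[k][i2, j2]
```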

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation, G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance, C_CM or C_PM

• Cost for interruption, C_I

Moreover, a terminal cost noted C_N could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i^2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.


Table 94: Transition costs

i^1 | u | j^1 | C_k(j, u, i)
Wq, q ∈ {0, ..., N_W − 1} | 0 | W_{q+1} | G · T_s · C_E(i^2, k)
Wq, q ∈ {0, ..., N_W − 1} | 0 | CM1 | C_I + C_CM
W_{N_W} | 0 | W_{N_W} | G · T_s · C_E(i^2, k)
W_{N_W} | 0 | CM1 | C_I + C_CM
Wq | 1 | PM1 | C_I + C_PM
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PM_{q+1} | C_I + C_PM
PM_{N_PM − 1} | ∅ | W0 | C_I + C_PM
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CM_{q+1} | C_I + C_CM
CM_{N_CM − 1} | ∅ | W0 | C_I + C_CM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the infrastructure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

N_C Number of components
N_Wc Number of working states for component c
N_PMc Number of preventive maintenance states for component c
N_CMc Number of corrective maintenance states for component c


Costs

C_PMc Cost per stage of preventive maintenance for component c
C_CMc Cost per stage of corrective maintenance for component c
C_Nc(i) Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C} State of component c at the current stage
i^{N_C+1} State of the electricity at the current stage
j^c, c ∈ {1, ..., N_C} State of component c at the next stage
j^{N_C+1} State of the electricity at the next stage
u^c, c ∈ {1, ..., N_C} Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C} State of component c at stage k
x^c A component state
x^{N_C+1}_k Electricity state at stage k
u^c_k Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_{x^c} State space for component c
Ω_{x^{N_C+1}} Electricity state space
Ω_{u^c}(i^c) Decision space for component c in state i^c

923 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages, with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T    (92)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc − 1}, CM1, ..., CM_{N_CMc − 1}}

Electricity Space
Same as in Section 91.

9242 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system.

u^c_k = 0 no preventive maintenance on component c
u^c_k = 1 preventive maintenance on component c

The decision variables constitute a decision vector

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T    (93)

The decision space for each decision variable can be defined by

for all c ∈ {1, ..., N_C}: Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{N_Wc}}, and Ω_{u^c}(i^c) = ∅ otherwise

9243 Transition Probability

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (94)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})    (95)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 91.

Component states transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If for all c ∈ {1, ..., N_C}, i^c ∈ {W1, ..., W_{N_Wc}}:

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = \prod_{c=1}^{N_C} P^c

with P^c =
  P(j^c, 1, i^c) if u^c = 1 or i^c ∉ {W0, ..., W_{N_Wc}}
  1 if i^c ∈ {W0, ..., W_{N_Wc}}, u^c = 0 and j^c = i^c
  0 otherwise

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ(c=1 to NC) Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0     else
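To make the two cases concrete, the following sketch computes the component part of the transition probability in Python. The data structures are illustrative assumptions, not notation from the thesis: for each component c, P[c][jc, uc, ic] is its one-component transition matrix (uc = 1 meaning maintenance) and W[c] is the set of its working states.

import numpy as np

def component_transition_prob(j, u, i, P, W):
    # j, u, i: tuples (j1..jNC), (u1..uNC), (i1..iNC)
    all_working = all(ic in W[c] for c, ic in enumerate(i))
    no_maintenance = all(uc == 0 for uc in u)
    if all_working and no_maintenance:
        # Case 1: the system is running, components age independently
        return np.prod([P[c][jc, 0, ic] for c, (jc, ic) in enumerate(zip(j, i))])
    # Case 2: the system is stopped, only maintained components change state
    prob = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if uc == 1 or ic not in W[c]:
            prob *= P[c][jc, 1, ic]              # component in (or sent to) maintenance
        else:
            prob *= 1.0 if jc == ic else 0.0     # working component does not age while the system is down
    return prob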

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin over u ∈ {0, 1} of the same terms = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin over u ∈ {0, 1, 2} of the same terms = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin over u ∈ {1, 2} of the same terms = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin over u ∈ {0, 1} of the same terms = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin over u ∈ {0, 1, 2} of the same terms = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin over u ∈ {1, 2} of the same terms = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin over u ∈ {0, 1, 2} of the same terms = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435–441, 1995.
[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I Bagai and K Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C Berenguer, C Chu, and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L Bertling, R Allan, and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R Dekker, RE Wildeman, and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng, and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y Jiang, Z Zhong, J McCalley, and TV Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L P Kaelbling, M L Littman, and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D Kalles, A Stathaki, and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L Wang, J Chu, W Mao, and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R Wildeman, R Dekker, and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] RE Wildeman, R Dekker, and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Chapter 4

Introduction to Dynamic

Programming

This chapter deals with general ideas about Dynamic Programming (DP) and some features of possible DP models. Deterministic DP is used to introduce the basics of the DP formulation and the value iteration method, a classical method for solving DP models.

41 Introduction

Dynamic Programming deals with multi-stage or sequential decision problems. At each decision epoch the decision maker (also called agent or controller in different contexts) observes the state of a system. (It is assumed in this thesis that the system is perfectly observable.) An action is decided based on this state. This action will result in an immediate cost (or reward) and influence the evolution of the system.

The aim of DP is to minimize (or maximize) the cumulative cost (respectively income) resulting from a sequence of decisions.

In the following, important ideas concerning Dynamic Programming are discussed.

411 Principle of Optimality

Dynamic programming is a way of decomposing a large problem into subproblems.

It can be applied to any problem that observes the principle of optimality:


An optimal policy has the property that whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. [8]

The solutions of the subproblems are themselves solutions of the general problem. The principle implies that at each stage the decisions are based only on the current state of the system. The previous decisions should not have influence on the actual evolution of the system and possible actions.

Basically, in maintenance problems this would mean that maintenance actions only have an effect on the state of the system directly after their accomplishment. They do not influence the deterioration process after they have been completed.

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only on the current state and the action taken.

If a system is subject to probabilistic events, it will evolve according to a probabilistic distribution depending on the current state and action choice. The system is then referred to as probabilistic or stochastic.

Functional failures are in general represented as stochastic events. In consequence, stochastic maintenance optimization models are interesting.

413 Time Horizon

The time horizon of a model is the time window considered for the optimization. One distinguishes between finite and infinite time horizons.

Chapter 5 focuses on finite horizon stochastic dynamic programming. In the context of maintenance the objective would be, for example, to minimize the maintenance costs during the time horizon considered.

Chapters 6 and 7 focus on models that assume an infinite time horizon. This assumption implies that the system is stationary, that is, it evolves in the same manner all the time. Moreover, an infinite horizon optimization implicitly assumes that the system is used for an infinite time. It can be a good approximation if the lifetime of a system is indeed very long.


414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 3, 4 and 6). Decisions are made at each decision epoch. The time is divided into stages or periods between these epochs. It is clear that the interval of time between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at some points decided by the decision maker, or when an event occurs. The last two possibilities will be briefly investigated in Chapter 5. Continuous decisions refer to optimal control theory and will not be discussed here.

415 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 54).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].


42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves for N stages.

State and Decision Spaces
At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions
As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost that the decision maker has to pay, Ck(i, u). A possible terminal cost CN(XN) is associated to the terminal state (the state at stage N).

Objective Function
The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min over Uk of { Σ(k=0 to N−1) Ck(Xk, Uk) + CN(XN) }

Subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N − 1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*0(i)     Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }    (41)

J*k(i)   Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*N(i) = CN(i), ∀i ∈ ΩXN

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }, ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }, ∀i ∈ ΩXk

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.
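As an illustration, a minimal Python sketch of this backward recursion is given below. The function name and the way the problem data are passed (states per stage, admissible controls, dynamics f and costs C) are illustrative choices, not notation from the thesis.

def value_iteration(N, states, controls, f, C, CN):
    # N: terminal stage index; states[k]: list of states at stage k;
    # controls(k, i): admissible controls; f(k, i, u): dynamics;
    # C(k, i, u): transition cost; CN(i): terminal cost
    J = {(N, i): CN(i) for i in states[N]}
    policy = {}
    for k in range(N - 1, -1, -1):            # backward recursion, stops after k = 0
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in controls(k, i):
                cost = C(k, i, u) + J[(k + 1, f(k, i, u))]
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[(k, i)], policy[(k, i)] = best_cost, best_u
    return J, policy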


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: five-stage shortest path network. Stage 0: node A; stage 1: nodes B, C, D; stage 2: nodes E, F, G; stage 3: nodes H, I, J; stage 4: node K. Each arc is labelled with its transition cost.]
The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4231 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to go to the next stage. The following notations are used:

ΩUk(i) =
  {0, 1}     for i = 0
  {0, 1, 2}  for i = 1
  {1, 2}     for i = 2
for k = 1, 2

ΩU0(0) = {0, 1, 2} for k = 0, and ΩU3(i) = {0} for all i (the only possible transition at stage 3 is to node K).

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B → E and U1(0) = 1 for the transition B → F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D → F and u1(2) = 2 for the transition D → G.

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B → E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*0(0) = min_{Uk∈ΩUk(Xk)} { Σ(k=0 to 3) Ck(Xk, Uk) + C4(X4) }

Subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., 3

4232 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forwards, by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A → D → G → I → K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*k(i) (for example, μ1(1) = 2, μ1(2) = 2).
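As a check, the example can be fed to the value_iteration sketch given earlier in this chapter, using the arc costs that appear in the Appendix A calculations (the data layout below is an illustrative choice):

C_data = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                 # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                             # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                  # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                            # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                             # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                  # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                            # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                 # H, I, J -> K
}
states = {0: [0], 1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2], 4: [0]}
J, policy = value_iteration(
    N=4,
    states=states,
    controls=lambda k, i: [u for (s, u) in C_data[k] if s == i],
    f=lambda k, i, u: u,                                  # the decision is the next-stage state
    C=lambda k, i, u: C_data[k][(i, u)],
    CN=lambda i: 0,
)
print(J[(0, 0)])                                          # 8, reached via A -> D -> G -> I -> K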


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩXk.

Decision Space

At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(Xk+1 = j, Uk = u, Xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost Ck(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk∈ΩUk(Xk)} E{ CN(XN) + Σ(k=0 to N−1) Ck(Xk+1, Uk, Xk) }

Subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1


N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
Xk           State at stage k
Uk           Decision action at stage k
ωk(i, u)     Probabilistic function of the disturbance
Ck(j, u, i)  Cost function
CN(i)        Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*0(i)       Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*k(i) = min_{u∈ΩUk(i)} E{ Ck(i, u) + J*k+1(fk(i, u, ω)) }    (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]    (52)

ΩXk          State space at stage k
ΩUk(i)       Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (52). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i), ∀i ∈ ΩXN (initialisation)

While k ≥ 0 do:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)], ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)], ∀i ∈ ΩXk

k ← k − 1

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
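A compact Python sketch of this backward recursion is given below for the stationary case. The array layout (P[u, i, j] for the transition probabilities, C[u, i, j] for the costs, CN for the terminal costs) is an illustrative assumption, not notation from the thesis.

import numpy as np

def stochastic_value_iteration(P, C, CN, N):
    # P[u, i, j] = P(X_{k+1} = j | X_k = i, U_k = u); C[u, i, j]: transition cost; CN: terminal costs
    J = CN.copy()                                     # J*_N
    n_states = P.shape[1]
    policy = np.zeros((N, n_states), dtype=int)
    for k in range(N - 1, -1, -1):
        Q = np.einsum('uij,uij->ui', P, C) + P @ J    # Q[u, i] = sum_j P[u,i,j] * (C[u,i,j] + J[j])
        policy[k] = Q.argmin(axis=0)                  # U*_k(i)
        J = Q.min(axis=0)                             # J*_k(i)
    return J, policy                                  # J holds J*_0 for every initial state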

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• NX state variables, where the size of the set for each state variable is S,

• NU control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
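As an illustrative order of magnitude (the numbers are chosen only for this example): with N = 52 stages, NX = 3 state variables of size S = 10 and NU = 3 control variables of size A = 2, the bound is N · S^(2·NX) · A^NU = 52 · 10^6 · 8 ≈ 4 · 10^8 elementary operations; adding a single further state variable of the same size multiplies this figure by S^2 = 100.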

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for a component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no further costs are paid.

J*(X0) = min_μ lim_{N→∞} E{ Σ(k=0 to N−1) C(Xk+1, μ(Xk), Xk) }

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

μ       Decision policy
J*(i)   Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost incurred at stage k for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ lim_{N→∞} E{ Σ(k=0 to N−1) α^k · C(Xk+1, μ(Xk), Xk) }

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1

α   Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ lim_{N→∞} E{ (1/N) · Σ(k=0 to N−1) C(Xk+1, μ(Xk), Xk) }

Subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N − 1


62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)], ∀i ∈ ΩX

Jμ(i)   Cost-to-go function of policy μ starting from state i
J*(i)   Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)], ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps.

Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)], ∀i ∈ ΩX

q   Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)], ∀i ∈ ΩX

Go back to the policy evaluation step.

The process stops when μq+1 = μq.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
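A Python sketch of the two steps for a discounted MDP follows. The array layout (P[u, i, j], C[u, i, j]) and the use of a direct linear solver for the evaluation step are illustrative assumptions, not prescriptions from the thesis.

import numpy as np

def policy_iteration(P, C, alpha):
    # P[u, i, j]: transition probabilities; C[u, i, j]: costs; alpha: discount factor
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                    # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = expected one-stage cost under mu
        P_mu = P[mu, np.arange(n_states)]                 # P_mu[i, j] = P(j | i, mu(i))
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('uij,uij->ui', P, C) + alpha * (P @ J)
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                    # the policy is a solution of its own improvement
            return J, mu
        mu = mu_new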

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).


While m ≥ 0 do:

J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^(m+1)_μk(j)], ∀i ∈ ΩX

m ← m − 1

m   Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and an arbitrary state X ∈ ΩX, there is a unique scalar λμ and vector hμ such that:

hμ(X) = 0

λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)], ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk, ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)], ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
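A Python sketch of relative value iteration, under the same illustrative array layout as the earlier sketches and assuming a unichain model, could look as follows:

import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-9, max_iter=100000):
    # P[u, i, j], C[u, i, j]; ref: index of the arbitrary reference state X
    h = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = np.einsum('uij,uij->ui', P, C) + P @ h
        H = Q[:, ref].min()                               # H^k, evaluated at the reference state
        h_new = Q.min(axis=0) - H
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return H, h, Q.argmin(axis=0)                         # H approximates the optimal average cost per stage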

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i), ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations:

hq(X) = 0
λq + hq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)], ∀i ∈ ΩX

Step 2: Policy improvement

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)], ∀i ∈ ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case, the optimal cost-to-go satisfies

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ ΩX

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)

Subject to J(i) − α · Σ_{j∈ΩX} P(j, u, i) · J(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i), ∀i, u

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite, and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 72: RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample of the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ(n=k to N−1) C(Xn, Xn+1)

V(Xk)   Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ(m=1 to K) V(im)

V(im)   Cost-to-go of the trajectory starting from state i after the mth visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of the visit.

From a trajectory point of view:

J(Xk) = J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.


With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states already visited during the trajectory is updated. Assume that the l-th transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) = J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm only updates the current state Xl:

J(Xl) = J(Xl) + γXl · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]
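A small Python sketch of this incremental evaluation (TD(0)) on sampled trajectories is given below. The trajectory format and the step-size rule γ = 1/(number of visits) are illustrative assumptions.

def td0_policy_evaluation(trajectories):
    # trajectories: list of episodes, each a list of transitions (X_l, C(X_l, X_{l+1}), X_{l+1})
    J, visits = {}, {}
    for episode in trajectories:
        for (x, cost, x_next) in episode:
            J.setdefault(x, 0.0)
            J.setdefault(x_next, 0.0)                  # terminal states keep the value 0 if never updated
            visits[x] = visits.get(x, 0) + 1
            gamma = 1.0 / visits[x]                    # step size, as for the recursive average
            J[x] += gamma * (cost + J[x_next] - J[x])  # temporal-difference update
    return J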

Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Qμk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is:

μk+1(i) = argmin_{u∈ΩU(i)} Qμk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (71)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (72)

By combining the two equations, we obtain:

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (73)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (73).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
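A Python sketch of Q-learning with an epsilon-greedy exploration/exploitation trade-off is shown below. The simulator interface env.reset() / env.step(x, u), the fixed step size gamma and the value of epsilon are illustrative assumptions, not part of the thesis.

import random

def q_learning(env, n_states, n_actions, n_episodes, gamma=0.1, epsilon=0.1):
    # gamma is the step size of the Q-learning update (not a discount factor)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_episodes):
        x, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                        # exploration
                u = random.randrange(n_actions)
            else:                                                # exploitation of the greedy policy
                u = min(range(n_actions), key=lambda a: Q[x][a])
            x_next, cost, done = env.step(x, u)
            target = cost + (0.0 if done else min(Q[x_next]))
            Q[x][u] = (1 - gamma) * Q[x][u] + gamma * target     # update based on (73)
            x = x_next
    return Q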

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the previous section for each sample of experience;

• building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation, using direct learning.


74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
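As a simple illustration of the idea, a linear approximation J(i, r) = r · φ(i) can be fitted by least squares to sampled cost-to-go values; the feature map φ and the samples below are assumptions made only for the example.

import numpy as np

# Sketch: fit a linear cost-to-go approximation J(i, r) = r . phi(i)
# to sampled (state, estimated cost-to-go) pairs by least squares.
# The feature map and the samples are illustrative assumptions.

def features(i):
    # simple polynomial features of a scalar state (e.g. component age)
    return np.array([1.0, i, i ** 2])

samples = [(0, 1.0), (1, 2.5), (2, 6.0), (3, 11.5), (4, 19.0)]  # (i, estimate of J_mu(i))
Phi = np.array([features(i) for i, _ in samples])
y = np.array([j for _, j in samples])
r, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(r)                  # only the parameter vector is stored, not a full table
print(features(2.5) @ r)  # the approximation generalizes to states not in the samples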

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates

43

are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method

44

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants, the main advantage given being the automatic learning capability of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence

45

of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Model | Characteristics | Possible Application in Maintenance Optimization | Method | Advantages / Disadvantages

Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance scheduling | Value Iteration | Limited state space (number of components)

Markov Decision Processes | Stationary model | | Classical methods for MDP (possible approaches below) |

- Average cost-to-go | | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for a high discount factor

- Discounted | | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general

- Shortest path | | | Linear Programming | Possible additional constraints; state space more limited than with VI & PI

Approximate Dynamic Programming | Can handle large state spaces compared with classical MDP methods | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model

Semi-Markov Decision Processes | Can optimize the inspection interval | Optimization for inspection-based maintenance | Same as MDP | More complex (average cost-to-go approach)

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and postpone maintenance until prices are lower.

If a high electricity price is expected in the near future, it could be interesting to

47

do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage k
x2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

49

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1).

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) exceeds 50%. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.

50

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4 (states W0, ..., W4, PM1, CM1, CM2; from each working state Wq the transitions are to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q)). Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
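As an illustration, the component state space can be enumerated directly from NW, NPM and NCM; the Python sketch below reproduces the example of Figure 9.1 and is not part of the thesis implementation.

# Sketch: enumerate the component state space of the one-component model.
# Parameter values reproduce the example of Figure 9.1.

def component_state_space(n_w, n_pm, n_cm):
    working = [f"W{q}" for q in range(n_w + 1)]
    pm = [f"PM{q}" for q in range(1, n_pm)]   # PM_NPM coincides with W0
    cm = [f"CM{q}" for q in range(1, n_cm)]   # CM_NCM coincides with W0
    return working + pm + cm

print(component_state_space(n_w=4, n_pm=2, n_cm=3))
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']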

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country like Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3 (electricity prices in SEK/MWh, roughly between 200 and 500, shown for scenarios 1-3 around stages k−1, k and k+1).

52

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU(i) =
  {0, 1}  if i1 ∈ {W1, ..., WNW}
  ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

53

Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PMNPM−1                     ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CMNCM−1                     ∅    W0       1
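A possible encoding of Table 9.1 as a function is sketched below; the component states are represented as tuples, and lam[q] stands for the per-stage failure probability of state Wq, with purely illustrative numerical values.

# Sketch: component state transition probabilities of Table 9.1.
# States are tuples ("W", q), ("PM", q) or ("CM", q); lam[q] is the per-stage
# failure probability of state Wq. All numerical values are illustrative assumptions.

def transition_prob(j, u, i, n_w, n_pm, n_cm, lam):
    kind, q = i
    if kind == "W" and u == 1:                       # preventive replacement decided
        target = ("PM", 1) if n_pm > 1 else ("W", 0)
        return 1.0 if j == target else 0.0
    if kind == "W" and u == 0:                       # keep operating
        fail_target = ("CM", 1) if n_cm > 1 else ("W", 0)
        stay = ("W", min(q + 1, n_w))                # age one stage, capped at W_NW
        if j == fail_target:
            return lam[q]
        return (1.0 - lam[q]) if j == stay else 0.0
    # component already in maintenance: empty decision space, forced transition
    n = n_pm if kind == "PM" else n_cm
    nxt = (kind, q + 1) if q + 1 <= n - 1 else ("W", 0)
    return 1.0 if j == nxt else 0.0

lam = {q: 0.05 + 0.02 * q for q in range(5)}         # illustrative failure probabilities
print(transition_prob(("W", 3), 0, ("W", 2), 4, 2, 3, lam))   # 1 - lambda(W2)
print(transition_prob(("CM", 1), 0, ("W", 2), 4, 2, 3, lam))  # lambda(W2)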

Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
  1 0 0
  0 1 0
  0 0 1

P2E =
  1/3 1/3 1/3
  1/3 1/3 1/3
  1/3 1/3 1/3

P3E =
  0.6 0.2 0.2
  0.2 0.6 0.2
  0.2 0.2 0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
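These non-stationary electricity transitions can, for instance, be stored as one matrix per stage; the sketch below reproduces Tables 9.2 and 9.3 and is only an illustration of the data structure.

import numpy as np

# Sketch: stage-dependent electricity-scenario transition matrices (Tables 9.2 and 9.3).
P1 = np.eye(3)                                   # scenarios frozen
P2 = np.full((3, 3), 1.0 / 3.0)                  # fully mixing (e.g. the summer period)
P3 = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])
schedule = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]   # one matrix per stage k

def electricity_transition(k, i2, j2):
    """Return Pk(j2, i2): row i2, column j2 of the stage-k matrix."""
    return schedule[k][i2, j2]

print(electricity_transition(5, 0, 2))   # 1/3 during the mixing period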

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k).

• Cost for maintenance: CCM or CPM.

• Cost for interruption: CI.

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

54

Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PMNPM−1                     ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CMNCM−1                     ∅    W0       CI + CCM
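To indicate how the model can be solved, the following is a generic backward value iteration sketch of the kind described in Chapter 5, applicable once the state space, decision space, transition probabilities and costs above are encoded; the tiny two-state model at the end is purely illustrative and is not the thesis example.

# Sketch: generic backward value iteration for a finite horizon SDP model.
# Applicable to the one-component model once Omega_X, Omega_U, P and C are encoded.
# The two-state model below is purely illustrative.

def value_iteration(states, decisions, P, C, terminal_cost, n_stages):
    """Return the optimal cost-to-go tables J[k][i] and policy mu[k][i]."""
    J = [dict() for _ in range(n_stages + 1)]
    mu = [dict() for _ in range(n_stages)]
    J[n_stages] = {i: terminal_cost(i) for i in states}
    for k in reversed(range(n_stages)):
        for i in states:
            candidates = {
                u: sum(P(j, u, i, k) * (C(j, u, i, k) + J[k + 1][j]) for j in states)
                for u in decisions(i)
            }
            mu[k][i] = min(candidates, key=candidates.get)
            J[k][i] = candidates[mu[k][i]]
    return J, mu

# Illustrative two-state example: state 0 = working, state 1 = in maintenance
states = [0, 1]
decisions = lambda i: [0, 1] if i == 0 else [0]
P = lambda j, u, i, k: ({0: [0.9, 0.1], 1: [1.0, 0.0]}[u][j] if i == 0 else [1.0, 0.0][j])
C = lambda j, u, i, k: (-10.0 if (i == 0 and u == 0 and j == 0) else (5.0 if u == 1 else 20.0))
J, mu = value_iteration(states, decisions, P, C, lambda i: 0.0, n_stages=4)
print(J[0], mu[0])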

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but that should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

55

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
jNC+1 State of the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

56

• An interruption cost CI is incurred whenever maintenance is carried out on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2).

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.4.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system.

57

uck = 0 no preventive maintenance on component c

uck = 1 preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ωuc(ic) =
  {0, 1}  if ic ∈ {W0, ..., WNWc}
  ∅       otherwise

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state Pk(jNC+1, iNC+1) are the same as in the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

58

If one of the components is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} Pc

with Pc =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
  1              if ic ∈ {W0, ..., WNWc}, uc = 0 and jc = ic
  0              otherwise
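The two cases can be combined into a single function for the joint component-state transition probability; the per-component transition function P_single and the predicate is_working are assumed to be provided (for instance, the one-component sketch given earlier), and this is only a sketch of the composition.

# Sketch: joint component-state transition probability for the multi-component model.
# P_single(j, u, i) is an assumed per-component transition function (cf. Table 9.1),
# e.g. the transition_prob sketch above; is_working(i) tells whether a component
# state is one of the working states W1..WNW.

def joint_transition_prob(j_vec, u_vec, i_vec, P_single, is_working):
    """Joint transition probability of the component states (electricity handled separately)."""
    system_up = all(is_working(i) for i in i_vec) and not any(u_vec)
    prob = 1.0
    if system_up:                              # case 1: every component ages independently
        for j, u, i in zip(j_vec, u_vec, i_vec):
            prob *= P_single(j, u, i)
        return prob
    for j, u, i in zip(j_vec, u_vec, i_vec):   # case 2: system down during the stage
        if u == 1 or not is_working(i):
            prob *= P_single(j, u, i)          # maintained components follow their own dynamics
        else:
            prob *= 1.0 if j == i else 0.0     # working, non-maintained components do not age
    return prob

# Example wiring (assuming the one-component sketch above):
# P_single = lambda j, u, i: transition_prob(j, u, i, n_w=4, n_pm=2, n_cm=3, lam=lam)
# is_working = lambda s: s[0] == "W" and s[1] >= 1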

9.2.4.4 Cost Function

As for the transition probabilities there are 2 cases

Case 1: If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2: When the system is in maintenance or fails during the stage, an interruption cost CI is incurred, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0     otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

59

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

61

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effect analysis of condition monitoring systems Master's thesis Royal Institute of Technology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft Master's thesis Royal Institute of Technology (KTH) May 2005

68

Page 23: Models

An optimal policy has the property that whatever the initial state andoptimal first decision may be the remaining decisions constitute an op-timal policy with regard to the state resulting from the first decision[8]

The solution of the subproblems are themselves solution of the general problemThe principle implies that at each stage the decision are based only on the currentstate of the system The previous decisions should not have influence on the actualevolution of the system and possible actions

Basically in maintenance problems it would mean that maintenance actions haveonly an effect on the state of the system directly after their accomplishment Theydo not influence the deterioration process after they have been completed

412 Deterministic and Stochastic Models

A system is said to be deterministic if the state at the next epoch depends only onthe actual state and action made

If a system is subject to probabilistic events it will evolve according to a proba-bilistic distribution depending on the actual state and action choice The system isthen refered to as probabilistic or stochastic

Functional failures are in general represented as stochastic events In consequencestochastic maintenance optimization models are interesting

413 Time Horizon

The time horizon of a model is the time window considered for the optimizationOne distinguishs between finite and infinite time horizons

Chapter 4 focus on finite horizon stochastic dynamic programming In the contextof maintenance the objective would be for example to minimize the maintenancecosts during the time horizon considered

Chapter 5 and 6 focus on models that assume an infinite time horizon This as-sumption implies that a system is stationary that it evolves in the same manner allthe time Moreover an infinite horizon optimization assumes implicitely that thesystem is used for a infinite time It can be an good approximation if indeed thelifetime of a system is very long

16

414 Decision Time

In this thesis we focus mainly on Stochastic Dynamic Programming (SDP) withdiscrete sets of decision epochs (Chapter 3 4 and 6) Decisions are made at eachdecision epoch The time is devided into stages or periods between these epochs Itis clear that the interval time between 2 stages will have an influence on the result

Short intervals are more realistitic and precise but the models can become heavyif the time horizon is large In practice long intervals can be used for long-termplanning while short-term planning consider shorter intervals

Continum set of decision epochs implies that the decision can be made either contin-uously at some points decided by the decision maker or when an event occur Thetwo last possibilities will be shortly investigated in Chapter 5 Continuous decisionrefers to optimal control theory and will not be discussed here

415 Exact and Approximation Methods

Dynamic Programming suffers of a complexity problem the curse of dimensionality(discussed in Section 42)

Methods for solving the dynamic programming models exactly exist and are pre-sented in Chapters 5 and 6 However large models are untractable with thesemethods

Chapter 6 provide an introduction to the field of Reinforcement Learning (RL) thatfocus on approximations for DP solutions Approximate algorithms are obtainedby combining DP and supervised learning algorithms RL is also known as neuro-dynamic programming when DP is combined with neural networks [13]

17

42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision SpacesAt each stage k the system is in a state Xk = i that belongs to a state space ΩXk Depending on the state of the system the decision maker decide of an action to dou = Uk isin ΩUk (i)

Dynamic and Cost FunctionsAs a result of this action the system state at next stage will be Xk+1 = fk(i u)Moreover the action has a cost that the decision maker has to pay Ck(i u) A pos-sible terminal cost is associated to the terminal state (state at stage N) (CN (XN )

Objective FunctionThe objective is to determine the sequence of decision that will mimimize the cu-mulative cost (also called cost-to-go function) subject to the dynamic of the system

Jlowast0 (X0) = minUk

Nminus1sumk=0Ck(Xk Uk) + CN (XN )

Subject to Xk+1 = fk(Xk Uk) k = 0 N minus 1

N Number of stagesk Stagei State at the current stagej State at the next stageXk State at stage kUk Decision action at stage kCk(i u) Cost functionCN (i) Terminal cost for state ifk(i u) Dynamic functionJlowast0 (i) Optimal cost-to-go starting from state i

18

422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellmanacutes equation) derives directly fromthe principle of optimality It states that the optimal cost-to-go function startingfrom stage k can be derived with the following formula

Jlowastk (i) = minuisinΩU

k(i)Ck(i u) + Jlowastk+1(fk(i u)) (41)

Jlowastk (i) Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

JlowastN (i) = CN (i) foralli isin XN

Jlowastk (i) = minuisinΩU

k(i)Ck(i u) + Jlowastk+1(fk(i u)) foralli isin Xk

Ulowastk (i) = argminuisinΩU

k(i)

Ck(i u) + Jlowastk+1(fk(i u)) foralli isin Xk

u Decision variableUlowastk (i) Optimal decision action at stage k for state i

lll

The algorithm goes backwards starting from the last stage It stops when k=0

19

423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrated the formulation and the value iteration algorithmThe following shortest path problem is considered

B E H

A C F I K

D G J

Stage 0 Stage 1 Stage 2 Stage 3 Stage 4

2

4

3

4

62

1

35

2

2

57

3

21

2

4

2

7

The aim of the problem is to determine the shortest way to reach the node Kstarting from the node A A cost (corresponding to a distance) is associated to eacharc One first way to solve the problem would be to calculate the cost of all thepossible path For example the path A-B-F-J-K has a cost of 2+6+2+7=17 Thenthe shortest path would be the one with the lowest cost

Dynamic programming provides a more efficient way to solve the problem Insteadof calculating all the path cost the problem will be divided in subproblems thatwill be solved recursively to determine the shortest path from each possible node tothe terminal node K

4231 Problem Formulation

The problem is divided into five stagesn=5 k=01234

State SpaceThe state space is defined for each stage

ΩX0 = A = 0ΩX1 = BCD = 0 1 2 ΩX2 = EFG = 0 1 2

ΩX3 = H I J = 0 1 2ΩX4 = K = 0

20

Each node of the problem is defined by a stateXk For example X2 = 1 correspondsto the node F In this problem the state space is defined by one variable It is alsopossible to have multi-variable space for which Xk would be a vector

Decision SpaceThe set of decisions possible must be defined for each state at each stage In theexample the choice is which way should I take from this node to go to the nextstage The following notations are used

ΩUk (i) =

0 1 for i = 00 1 2 for i = 11 2 for i = 2

for k=123

ΩU0 (0) = 0 1 2 for k=0

For example ΩU1 (0) = ΩU (B) = 0 1 with U1(0) = 0 for the transition B rArr E orU1(0) = 1 for the transition B rArr F

Another example ΩU1 (2) = ΩU (D) = 1 2 with u1(2) = 2 for the transitionD rArr For u1(2) = 2 for the transition D rArr G

A sequence π = micro0 micro1 microN where microk(i) is a function mapping the state i atstage k with an admissible control for this state is called a policy The value itera-tion algorithm determine the optimal policy of the problem πlowast = microlowast0 micro

lowast1 micro

lowastN

Dynamic and Cost FunctionsThe dynamic function of the example is simple thanks to the notations usedfk(i u) = u

The transition costs are defined equal to the distance from one state to the resultingstate of the decision For example C1(0 0) = C(B rArr E) = 4 The cost function isdefined in the same way for the others stages and states

Objective Function

Jlowast0 (0) = minUkisinΩU

k(Xk)

4sumk=0Ck(Xk Uk) + CN (XN )

Subject to Xk+1 = fk(Xk Uk) k = 0 1 N minus 1

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until

21

the initial state is reached The optimal decision sequence is then obtained forwardby using the optimal solution determined by the DP algorithm for the sequence ofstates that will be visited

The solution of the algorithm are given in Appendix A

The optimal cost-to-go is Jlowast0 (0) = 8 It corresponds to the following path ArArr D rArrG rArr I rArr K The optimal policy of the problem is πlowast = micro0 micro1 micro2 micro3 micro4 withmicrok(i) = ulowastk(i) (for example micro1(1) = 2 micro1(2) = 2)

22

Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model in Chapter3 is presented The section introduces the theory for the proposed model in Chapter9 For more details and examples the book Markov Decision Processes DiscreteStochastic Dynamic Programming [36] is recommended

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic isprobabilistic (or subject to disturbances) The state of the system at the next stageis not deterministic as in Chapter 5 It depends on the current state and decision butalso on a stochastic variable that describes the disturbance the stochastic behaviorof the system

A stochastic dynamic programming model can be formulated as below

State Space

A variable k isin 0 N represents the different stages of the problem In generalit corresponds to a time variable

The state of the system is characterized by a variable i = Xk The possible statesare represented by a set of admissible states that can depends on k Xk isin ΩXk

Decision Space

At each decision epoch the decision maker must choose an action u = Uk amonga set of admissible actions This set can depend on the state of the system and on

23

the stage u isin ΩUk (i)

Dynamic of the System and Transition Probability

On the contrary with the deterministic case the state transition does not dependonly on the control used but also on a disturbance ω = ωk(i u)

Xk+1 = fk(Xk Uk ω) k = 0 1 N minus 1

The effect of the disturbance can be expressed with transition probabilities Thetransition probabilities define the probability that the state of the system at stagek+1 is j if the state and control are i and u at the stage k These probabilities candepend also on the stage

Pk(j u i) = P (Xk+1 = j | Xk = i Uk = u)

If the system is stationary (time-invariant) the dynamic function f does not dependson time and the notation for the probability function can be simplified

P (j u i) = P (Xk+1 = j | Xk = i Uk = u)

In this case one refers to a Markov decision process If a control u is fixed for eachpossible state of the model then the probability transition can be represented by aMarkov model (See Chapter 9 for an example)

Cost Function

A cost is associated to each possible transition (ij) and action u The costs can alsodepend on the stage

Ck(j u i) = Ck(xk+1 = j uk = u xk = i)

If the transition (ij) occurs at stage k when the decision is u then a cost Ck(j u i) isgiven If the cost function is stationary then the notation is simplified by C(i u j)

A terminal cost CN (i) can be used to penalize deviation from a desired terminalstate

Objective Function

The objective is to determine the sequence of decision that optimize the expectedcumulative cost (cost-to-go function) Jlowast(X0) where X0 is the initial state of thesystem

Jlowast(X0) = minUkisinΩU

k(Xk)ECN (XN ) +

Nminus1sumk=0Ck(Xk+1 Uk Xk)

Subject to Xk+1 = fk(Xk Uk ωk(Xk Uk)) k = 0 1 N minus 1

24

N Number of stagesk Stagei State at the current stagej State at the next stageXk State at stage kUk Decision action at stage kωk(i u) Probabilistic function of the disturbanceCk(i u j) Cost functionCN (i) Terminal cost for state ifk(i u ω) Dynamic functionJlowast0 (i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

Jlowastk (i) = minuisinΩU

k(i)ECk(i u) + Jlowastk+1(fk(i u ω)) (51)

This equation define a condition for a cost-to-go function of a state i in stage k tobe optimal The equation can be re-written using the probability transitions

Jlowastk (i) = minuisinΩU

k(i)

sum

jisinΩXk+1

Pk(i u j) middot [Ck(i u j) + Jlowastk+1(j)] (52)

ΩXk State space at stage kΩUk (i) Decision Space at stage k for state iPk(j u i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation52 The algorithm starts from the last stage By backward-recursions it determinesat each stage the optimal decision for each state of the system

JlowastN (i) = CN (i) foralli isin ΩXN (Initialisation)

While k ge 0 doJlowastk (i) = min

uisinUk(i)

sumjisinΩX

k+1

Pk(i u j) middot [Ck(i u j) + Jlowastk+1(j)] foralli isin ΩXk

Ulowastk (i) = argminuisinUk(i)

sumjisinΩX

k+1

Pk(i u j) middot [Ck(i u j) + Jlowastk+1(j)] foralli isin ΩXN

k larr k minus 1

25

u Decision variable U lowastk (i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
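As an illustration, a minimal Python sketch (not part of the original thesis material) of this backward recursion, assuming stationary hypothetical arrays P[u][i][j] and C[u][i][j] and a terminal cost vector CN; in the general case these arrays may also depend on the stage k:

import numpy as np

def finite_horizon_value_iteration(P, C, CN, N):
    """Backward value iteration for a finite horizon SDP (sketch).

    P[u][i, j] : transition probability from state i to j under decision u
    C[u][i, j] : transition cost from i to j under decision u
    CN[i]      : terminal cost of ending in state i
    N          : number of stages
    Returns the cost-to-go J[k, i] and optimal decisions U[k, i].
    """
    n_states = len(CN)
    n_actions = len(P)
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = CN                                     # initialisation with the terminal cost
    for k in range(N - 1, -1, -1):                # backward recursion
        # expected cost of each decision u in each state i
        Q = np.array([(P[u] * (C[u] + J[k + 1])).sum(axis=1) for u in range(n_actions)])
        J[k] = Q.min(axis=0)
        U[k] = Q.argmin(axis=0)
    return J, U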

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
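As a hypothetical illustration (figures not from the thesis): with N = 52 stages, NX = 3 state variables of size S = 10 each, and NU = 2 decision variables of size A = 2 each, the backward recursion requires on the order of 52 · 10^(2·3) · 2^2 ≈ 2·10^8 elementary operations; adding one more state variable of the same size multiplies this figure by S^2 = 100.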

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is thus its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through electricity price scenarios.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If the consumption is low, some generation units are stopped, and this time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be useful for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few previous states) to overcome this assumption: variables are added to the DP model to keep the previously visited states in memory. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the finite horizon value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for state i, μ(i) ∈ ΩU(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is reached with certainty. When this state is reached, the system remains in it and no further costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to  Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))),   k = 0, 1, ..., N − 1

μ        Decision policy
J*(i)    Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost at stage k of a discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

subject to  Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))),   k = 0, 1, ..., N − 1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ lim_{N→∞} E[ (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to  Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))),   k = 0, 1, ..., N − 1


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u)·[Cij(u) + J*(j)]   ∀i ∈ ΩX

Jμ(i)    Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u)·[Cij(u) + α·J*(j)]   ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite and a stopping criterion must be determined to terminate the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
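For illustration, a minimal Python sketch (not from the thesis) of value iteration for a discounted model, assuming hypothetical arrays P[u][i][j] and C[u][i][j] and using the size of the update as stopping criterion:

import numpy as np

def discounted_value_iteration(P, C, alpha, tol=1e-6):
    """Value iteration for a discounted infinite horizon MDP (sketch).

    P[u][i, j] : stationary transition probabilities
    C[u][i, j] : stationary transition costs
    alpha      : discount factor, 0 < alpha < 1
    Iterates J <- min_u sum_j P(j,u,i)[C(j,u,i) + alpha*J(j)] until the
    update is smaller than the tolerance tol.
    """
    n_states = P[0].shape[0]
    J = np.zeros(n_states)
    while True:
        Q = np.array([(P[u] * (C[u] + alpha * J)).sum(axis=1) for u in range(len(P))])
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)
        J = J_new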

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps.

Step 1: Policy Evaluation

If μ^{q+1} = μ^q, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i)·[C(j, μq(i), i) + Jμq(j)]   ∀i ∈ ΩX

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ^{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + Jμq(j)]   ∀i ∈ ΩX

Go back to the policy evaluation step.

The process stops when μ^{q+1} = μ^q.

At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
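A minimal Python sketch (not from the thesis) of the two-step algorithm, shown here for the discounted variant and assuming the same hypothetical arrays P and C as before:

import numpy as np

def policy_iteration(P, C, alpha, mu0):
    """Policy iteration for a discounted MDP (sketch).

    Step 1 solves the linear system J = c_mu + alpha * P_mu J for the current
    policy; Step 2 improves the policy greedily.  Stops when the policy is unchanged.
    """
    n_states = P[0].shape[0]
    mu = np.array(mu0, dtype=int)
    while True:
        # Step 1: policy evaluation (linear system for the current policy)
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        c_mu = np.array([(P[mu[i]][i] * C[mu[i]][i]).sum() for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([(P[u] * (C[u] + alpha * J)).sum(axis=1) for u in range(len(P))])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):
            return mu, J
        mu = mu_new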

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).


While m ≥ 0 do
  J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i)·[C(j, μk(i), i) + J^{m+1}_μk(j)]   ∀i ∈ ΩX
  m ← m − 1

m   Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0 and Jμk is approximated by J^0_μk.
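A minimal sketch of this approximate evaluation step, under the same assumptions on the hypothetical arrays P and C as the previous sketches:

import numpy as np

def modified_policy_evaluation(P, C, mu, J_init, M):
    """Approximate policy evaluation by M value-iteration sweeps (sketch).

    mu      : current policy, mu[i] is the decision in state i
    J_init  : initial estimate, chosen above the true cost-to-go of the policy
    """
    J = np.array(J_init, dtype=float)
    for _ in range(M):
        # one sweep: J(i) <- sum_j P(j, mu(i), i) [C(j, mu(i), i) + J(j)]
        J = np.array([(P[mu[i]][i] * (C[mu[i]][i] + J)).sum() for i in range(len(J))])
    return J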

6.6 Average Cost-to-go Problems

The methods presented in Sections 6.3-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and an arbitrary state X̄ ∈ ΩX, there is a unique scalar λμ and vector hμ such that

hμ(X̄) = 0

λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i)·[C(j, μ(i), i) + hμ(j)]   ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + h*(j)]   ∀i ∈ ΩX

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X̄ is an arbitrary reference state and h^0(i) is chosen arbitrarily.

H^k = min_{u∈ΩU(X̄)} Σ_{j∈ΩX} P(j, u, X̄)·[C(j, u, X̄) + h^k(j)]

h^{k+1}(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + h^k(j)] − H^k   ∀i ∈ ΩX

μ^{k+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + h^k(j)]   ∀i ∈ ΩX

The sequence h^k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
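A minimal Python sketch (not from the thesis) of relative value iteration under a unichain assumption, with the same hypothetical arrays P and C and an arbitrary reference state:

import numpy as np

def relative_value_iteration(P, C, ref_state=0, tol=1e-6, max_iter=10000):
    """Relative value iteration for an average cost-to-go MDP (sketch).

    The differential cost h is kept bounded by subtracting, at every
    iteration, the quantity H computed at the arbitrary reference state.
    Returns the estimated average cost H and a greedy policy.
    """
    n_states = P[0].shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.array([(P[u] * (C[u] + h)).sum(axis=1) for u in range(len(P))])
        H = Q[:, ref_state].min()              # offset taken at the reference state
        h_new = Q.min(axis=0) - H
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return H, Q.argmin(axis=0)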

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X̄ can be chosen arbitrarily.

Step 1: Policy Evaluation
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations

h^q(X̄) = 0
λ^q + h^q(i) = Σ_{j∈ΩX} P(j, μ^q(i), i)·[C(j, μ^q(i), i) + h^q(j)]   ∀i ∈ ΩX

Step 2: Policy Improvement

μ^{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + h^q(j)]   ∀i ∈ ΩX

q ← q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case,

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + α·J*(j)]   ∀i ∈ ΩX

and J*(i) is the solution of the following linear programming model:

Maximize  Σ_{i∈ΩX} J(i)

Subject to  J(i) ≤ Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + α·J(j)]   ∀i, u

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
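As an illustration, a sketch of this LP formulation (in the maximization form given above) using scipy, with the same hypothetical arrays P and C:

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Linear programming formulation of a discounted MDP (sketch).

    maximize   sum_i J(i)
    subject to J(i) <= sum_j P(j,u,i)[C(j,u,i) + alpha*J(j)]  for all i, u,
    which linprog solves as the minimization of -sum_i J(i).
    """
    n_states = P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(len(P)):
        for i in range(n_states):
            row = np.zeros(n_states)
            row[i] = 1.0
            row -= alpha * P[u][i]              # J(i) - alpha * sum_j P(j,u,i) J(j)
            A_ub.append(row)
            b_ub.append((P[u][i] * C[u][i]).sum())   # expected one-stage cost
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                # optimal cost-to-go J*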

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions refer to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach from machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further reading, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through reinforcement [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck); Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk+1, Uk, Xk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, for each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ and the sequence of transition costs C(Xk, Xk+1) = C(Xk+1, μ(Xk), Xk) has been observed.

The cost-to-go resulting from the trajectory starting from state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) := J(i) + γ·[V(i_m) − J(i)],  with γ = 1/m, where m is the number of the current visit to state i.

From a trajectory point of view

J(Xk) := J(Xk) + γ_Xk·[V(Xk) − J(Xk)]

γ_Xk corresponds to 1/m, where m is the number of times Xk has already been visited by trajectories.


With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the lth transition has just been generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γ_Xk·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ ≤ 1 is introduced:

J(Xk) := J(Xk) + γ_Xk·λ^(l−k)·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)]   ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0, for which only the currently visited state is updated. The TD(0) algorithm is

J(Xl) := J(Xl) + γ_Xl·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)]
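As an illustration, a one-step TD(0) update in Python (the names J, gamma and the sample are hypothetical):

def td0_update(J, gamma, x, x_next, cost):
    """One TD(0) update of the tabular cost-to-go estimate J (sketch).

    J     : dict or array mapping states to estimated cost-to-go
    gamma : step size, e.g. 1/m where m is the number of visits to x
    """
    J[x] = J[x] + gamma * (cost + J[x_next] - J[x])
    return J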

Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Qμk(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + Jμk(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

μ^{k+1}(i) = argmin_{u∈ΩU(i)} Qμk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since Jμ and Qμk have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i)·[C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck) do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ)·Q(Xk, Uk) + γ·[C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic in practice.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
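A minimal sketch (not from the thesis) of tabular Q-learning with an ε-greedy exploration policy; samples_env is a hypothetical simulator returning (next state, cost), a common action set is assumed for all states, and a constant step size is used for simplicity:

import random

def q_learning(samples_env, states, actions, gamma=0.1, epsilon=0.1, n_steps=10000):
    """Tabular Q-learning with epsilon-greedy exploration (sketch).

    samples_env(x, u) is assumed to return (x_next, cost), e.g. from a Monte
    Carlo simulator of the transition probabilities and costs.
    A decreasing step size (e.g. 1/number of visits) could replace gamma.
    """
    Q = {(x, u): 0.0 for x in states for u in actions}
    x = random.choice(states)
    for _ in range(n_steps):
        # exploration / exploitation trade-off
        if random.random() < epsilon:
            u = random.choice(actions)                     # explore
        else:
            u = min(actions, key=lambda a: Q[(x, a)])      # greedy decision
        x_next, cost = samples_env(x, u)
        target = cost + min(Q[(x_next, a)] for a in actions)
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = x_next
    return Q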

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by

- using the direct learning approach presented in the previous section for each sample of experience, or

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the previous section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces, this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a parameter vector that has to be optimized based on the available samples of Jμ. In the tabular representation previously investigated, Jμ(i) was stored for all values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a true training set does not exist. The training sets are obtained either from simulation or from real-time samples. This is already an approximation of the real function.
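As an illustration, a minimal sketch of a linear approximation structure J̃(i, r) = φ(i)ᵀr fitted by least squares from such samples (the feature function and the samples are hypothetical):

import numpy as np

def fit_linear_cost_to_go(features, samples):
    """Least-squares fit of an approximate cost-to-go J(i, r) = phi(i)^T r (sketch).

    features(i) : feature vector phi(i), chosen from insight about the problem
    samples     : list of pairs (state, observed cost-to-go V), e.g. from
                  simulated trajectories
    """
    Phi = np.array([features(i) for i, _ in samples])
    V = np.array([v for _, v in samples])
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return lambda i: features(i) @ r            # approximated cost-to-go J(i, r)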


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance and the possible failure, during the stage, of a unit not in maintenance. The failure rates are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from the normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming: model can be non-stationary; application: short-term maintenance scheduling; method: Value Iteration; limited state space (number of components).

Markov Decision Processes: stationary model; classical methods for MDP:
  Average cost-to-go: application: continuous-time condition monitoring maintenance optimization; method: Value Iteration (VI); can converge fast for a high discount factor.
  Discounted: application: short-term maintenance optimization; method: Policy Iteration (PI); faster in general.
  Shortest path: method: Linear Programming; possible additional constraints; state space more limited than with VI & PI.

Approximate Dynamic Programming: can handle state spaces too large for classical MDP methods; application: same as MDP, for larger systems; methods: TD-learning, Q-learning; can work without an explicit model.

Semi-Markov Decision Processes: can optimize the inspection interval; application: optimization of inspection-based maintenance; method: same as MDP (average cost-to-go approach); more complex.


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE     Number of electricity scenarios
NW     Number of working states for the component
NPM    Number of preventive maintenance states for one component
NCM    Number of corrective maintenance states for one component

Costs

CE(s, k)   Electricity price at stage k for the electricity state s
CI         Cost per stage for interruption
CPM        Cost per stage of preventive maintenance
CCM        Cost per stage of corrective maintenance
CN(i)      Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space


x1k   Component state at stage k
x2k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i

States notations

W     Working state
PM    Preventive maintenance state
CM    Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh at stage k is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can for example correspond to the time after which λ(t) exceeds 50%. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


Figure 9.1: Example of a Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4 (solid lines: u = 0, dashed lines: u = 1). The diagram shows the states W0, ..., W4, PM1, CM1, CM2, with transitions Wq → Wq+1 with probability 1 − Ts·λ(q), Wq → CM1 with probability Ts·λ(q), and deterministic transitions for the maintenance states.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. The figure shows the electricity price (SEK/MWh, roughly between 200 and 500) as a function of the stage, with one curve per scenario (Scenario 1, 2 and 3).


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0:  no preventive maintenance
Uk = 1:  preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW},   ΩU(i) = ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                           u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0    CM1       λ(Wq)
WNW                          0    WNW       1 − λ(WNW)
WNW                          0    CM1       λ(WNW)
Wq, q ∈ {0, ..., NW}         1    PM1       1
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     1
PM(NPM−1)                    ∅    W0        1
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     1
CM(NCM−1)                    ∅    W0        1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = | 1   0   0 |      P2E = | 1/3  1/3  1/3 |      P3E = | 0.6  0.2  0.2 |
      | 0   1   0 |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
      | 0   0   1 |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities over a 12-stage horizon

Stage k       0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                           u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0    CM1       CI + CCM
WNW                          0    WNW       G·Ts·CE(i2, k)
WNW                          0    CM1       CI + CCM
Wq                           1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     CI + CPM
PM(NPM−1)                    ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     CI + CCM
CM(NCM−1)                    ∅    W0        CI + CCM

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the infrastructure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c


Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   Electricity state at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   Electricity state at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1,k                 Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.


• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:


uck = 0:  no preventive maintenance on component c
uck = 1:  preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc},   Ωuc(ic) = ∅ otherwise

9.2.4.3 Transition Probabilities

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc} and Uk = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = Π_{c=1}^{NC} P(jc, 0, ic)

Case 2


If one of the components is in maintenance, fails, or a preventive maintenance decision is made, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
P^c = 0              otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc} and Uk = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G·Ts·CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than individual decision spaces for each component state variable.


• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest. However, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation of the finite horizon case and requires the problem to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 2
J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


4.1.4 Decision Time

In this thesis the focus is mainly on Stochastic Dynamic Programming (SDP) with discrete sets of decision epochs (Chapters 4, 5 and 6). Decisions are made at each decision epoch. The time is divided into stages, or periods, between these epochs. It is clear that the time interval between two stages will have an influence on the result.

Short intervals are more realistic and precise, but the models can become heavy if the time horizon is large. In practice, long intervals can be used for long-term planning, while short-term planning considers shorter intervals.

A continuum of decision epochs implies that decisions can be made either continuously, at points decided by the decision maker, or when an event occurs. The two last possibilities will be shortly investigated in Chapter 6. Continuous decision making refers to optimal control theory and will not be discussed here.

4.1.5 Exact and Approximation Methods

Dynamic Programming suffers from a complexity problem, the curse of dimensionality (discussed in Section 5.4).

Methods for solving dynamic programming models exactly exist and are presented in Chapters 5 and 6. However, large models are intractable with these methods.

Chapter 7 provides an introduction to the field of Reinforcement Learning (RL), which focuses on approximations of DP solutions. Approximate algorithms are obtained by combining DP and supervised learning algorithms. RL is also known as neuro-dynamic programming when DP is combined with neural networks [13].

4.2 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming. The optimality equation is presented, together with the value iteration algorithm to solve it. The section is illustrated with a classical example of a simple shortest path problem.

4.2.1 Problem Formulation

The three main parts of a DP model are its state and decision spaces, its dynamic and cost functions, and its objective function. The finite horizon model considers a system that evolves over N stages.

State and Decision Spaces: At each stage k, the system is in a state X_k = i that belongs to a state space Ω^X_k. Depending on the state of the system, the decision maker decides on an action u = U_k ∈ Ω^U_k(i).

Dynamic and Cost Functions: As a result of this action, the state of the system at the next stage will be X_{k+1} = f_k(i, u). Moreover, the action has a cost that the decision maker has to pay, C_k(i, u). A possible terminal cost C_N(X_N) is associated with the terminal state (the state at stage N).

Objective Function: The objective is to determine the sequence of decisions that minimizes the cumulative cost (also called the cost-to-go function), subject to the dynamic of the system:

J*_0(X_0) = min_{U_k} [ Σ_{k=0}^{N-1} C_k(X_k, U_k) + C_N(X_N) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, ..., N-1

N           Number of stages
k           Stage
i           State at the current stage
j           State at the next stage
X_k         State at stage k
U_k         Decision action at stage k
C_k(i, u)   Cost function
C_N(i)      Terminal cost for state i
f_k(i, u)   Dynamic function
J*_0(i)     Optimal cost-to-go starting from state i

4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }        (4.1)

J*_k(i)    Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

J*_N(i) = C_N(i)    ∀i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }    ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} { C_k(i, u) + J*_{k+1}(f_k(i, u)) }    ∀i ∈ Ω^X_k

u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.

4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with a small state space.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: the shortest path network. Nodes: A (stage 0); B, C, D (stage 1); E, F, G (stage 2); H, I, J (stage 3); K (stage 4). Each arc is labelled with its transition cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space: The state space is defined for each stage:

Ω^X_0 = {A} = {0}
Ω^X_1 = {B, C, D} = {0, 1, 2}
Ω^X_2 = {E, F, G} = {0, 1, 2}
Ω^X_3 = {H, I, J} = {0, 1, 2}
Ω^X_4 = {K} = {0}

Each node of the problem is defined by a state X_k. For example, X_2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which X_k would be a vector.

Decision Space: The set of possible decisions must be defined for each state at each stage. In this example, the choice is which way to take from the current node to the next stage. The following notations are used:

Ω^U_k(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

Ω^U_0(0) = {0, 1, 2} for k = 0

For example, Ω^U_1(0) = Ω^U(B) = {0, 1}, with U_1(0) = 0 for the transition B ⇒ E or U_1(0) = 1 for the transition B ⇒ F.

Another example: Ω^U_1(2) = Ω^U(D) = {1, 2}, with u_1(2) = 1 for the transition D ⇒ F or u_1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ_0, μ_1, ..., μ_N}, where μ_k(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*_0, μ*_1, ..., μ*_N}.

Dynamic and Cost Functions: The dynamic function of the example is simple thanks to the notations used: f_k(i, u) = u.

The transition costs are defined to be equal to the distance from one state to the state resulting from the decision. For example, C_1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function:

J*_0(0) = min_{U_k∈Ω^U_k(X_k)} [ Σ_{k=0}^{3} C_k(X_k, U_k) + C_4(X_4) ]

Subject to X_{k+1} = f_k(X_k, U_k), k = 0, 1, ..., 3

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solutions determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4}, with μ_k(i) = u*_k(i) (for example, μ_1(1) = 2, μ_1(2) = 2).
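As an illustration only (not part of the original thesis), the backward recursion can also be written as a short program. The following minimal Python sketch uses the arc costs listed in Appendix A; as above, the decision u is the index of the state reached at the next stage, so f_k(i, u) = u.

# Arc costs C[k][(i, u)] of the shortest path example (taken from Appendix A).
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                 # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                             # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                  # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                            # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                             # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                  # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                            # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                 # H, I, J -> K
}
N = 4
J = {N: {0: 0}}          # terminal cost C_N(K) = 0
policy = {}

for k in range(N - 1, -1, -1):                            # backward recursion
    J[k], policy[k] = {}, {}
    states = {i for (i, _) in C[k]}
    for i in states:
        costs = {u: C[k][(i, u)] + J[k + 1][u] for (s, u) in C[k] if s == i}
        policy[k][i] = min(costs, key=costs.get)          # u*_k(i)
        J[k][i] = costs[policy[k][i]]                     # J*_k(i)

print(J[0][0])           # prints 8, the optimal cost-to-go from node A
i = 0
for k in range(N):       # recover the optimal path forward: D, G, I, K
    i = policy[k][i]
    print("stage", k + 1, ": state", i)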

Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4; it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω^X_k.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω^U_k(i).

Dynamic of the System and Transition Probability

In contrast with the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, given that the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is paid. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X_0), where X_0 is the initial state of the system:

J*(X_0) = min_{U_k∈Ω^U_k(X_k)} E[ C_N(X_N) + Σ_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) ]

Subject to X_{k+1} = f_k(X_k, U_k, ω_k(X_k, U_k)), k = 0, 1, ..., N-1

N              Number of stages
k              Stage
i              State at the current stage
j              State at the next stage
X_k            State at stage k
U_k            Decision action at stage k
ω_k(i, u)      Probabilistic function of the disturbance
C_k(i, u, j)   Cost function
C_N(i)         Terminal cost for state i
f_k(i, u, ω)   Dynamic function
J*_0(i)        Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*_k(i) = min_{u∈Ω^U_k(i)} E[ C_k(i, u) + J*_{k+1}(f_k(i, u, ω)) ]        (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]        (5.2)

Ω^X_k          State space at stage k
Ω^U_k(i)       Decision space at stage k for state i
P_k(j, u, i)   Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*_N(i) = C_N(i)    ∀i ∈ Ω^X_N    (initialisation)

While k ≥ 0 do:

J*_k(i) = min_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} Σ_{j∈Ω^X_{k+1}} P_k(j, u, i) · [ C_k(j, u, i) + J*_{k+1}(j) ]    ∀i ∈ Ω^X_k

k ← k − 1

u          Decision variable
U*_k(i)    Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
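A minimal Python sketch of this backward recursion is given below. It is not from the thesis and its interface is hypothetical: the transition probabilities and costs are assumed to be given as callables P(k, j, u, i) and C(k, j, u, i), states[k] lists the admissible states at stage k, decisions(k, i) lists the admissible decisions, and terminal_cost(i) returns C_N(i).

def value_iteration_fh(N, states, decisions, P, C, terminal_cost):
    """Return the optimal cost-to-go J[k][i] and decision rule U[k][i]."""
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = terminal_cost(i)            # initialisation J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):            # backward recursion over the stages
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in decisions(k, i):
                # expected one-stage cost plus cost-to-go of the next stage
                cost = sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                           for j in states[k + 1])
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U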

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
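As a purely hypothetical illustration (the numbers are not from the thesis): a model with N = 52 weekly stages, N_X = 3 state variables of size S = 10 each and N_U = 1 decision variable of size A = 5 already requires on the order of 52 · 10^{2·3} · 5 ≈ 2.6 · 10^8 elementary operations, and adding one more state variable of the same size multiplies this figure by 100.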

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.

5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if the consumption is low some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasts could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions in offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system only depends on the actual state of the system (and possibly on the time, if the system dynamic is not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamic depends on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the previously visited states in memory. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamic of the deterioration process.

Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamic of the system, the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space: for i ∈ Ω^X, μ(i) is an admissible control for the state i, μ(i) ∈ Ω^U(i).

The objective is to find the optimal policy μ*, which minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (cost-free termination) state that is inevitably reached. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

μ        Decision policy
J*(i)    Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, the discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · C_ij(u) at stage k.

As C_ij(u) is bounded, the infinite sum will converge (geometric series).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

α        Discount factor

Average cost per stage problems
Some infinite horizon problems can neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can then be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(X_{k+1}, μ(X_k), X_k) ]

Subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N-1

6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy μ* that solves an IHSDP shortest path problem satisfies Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [ C_ij(u) + J*(j) ]    ∀i ∈ Ω^X

J_μ(i)   Cost-to-go function of policy μ starting from state i
J*(i)    Optimal cost-to-go function for state i

For a discounted IHSDP problem the optimality equation is:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P_ij(u) · [ C_ij(u) + α · J*(j) ]    ∀i ∈ Ω^X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can indeed be shown that it converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be chosen to stop the algorithm.

An alternative is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ_0. It can then be described by the following steps.

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm. Else, J_{μ_q}(i), the solution of the following linear system, is calculated:

J_{μ_q}(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + J_{μ_q}(j) ]    ∀i ∈ Ω^X

q    Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μ_q}(j) ]    ∀i ∈ Ω^X

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ_0 is already good, the algorithm will converge quickly to the optimal solution.
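A minimal Python sketch of the algorithm is given below (it is not from the thesis). It assumes, for illustration, a discounted MDP in which every action is admissible in every state, with transition probabilities and costs stored as NumPy arrays P[u][i][j] = P(j, u, i) and Cost[u][i][j] = C(j, u, i).

import numpy as np

def policy_iteration(P, Cost, alpha):
    n_actions, n_states, _ = P.shape
    mu = np.zeros(n_states, dtype=int)                    # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve J = c_mu + alpha * P_mu J exactly
        P_mu = P[mu, np.arange(n_states), :]               # transition matrix under mu
        c_mu = np.sum(P_mu * Cost[mu, np.arange(n_states), :], axis=1)  # expected stage cost
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.sum(P * (Cost + alpha * J[None, None, :]), axis=2)        # Q[u][i]
        mu_new = np.argmin(Q, axis=0)
        if np.array_equal(mu_new, mu):
            return mu, J                                   # policy is a solution of its own improvement
        mu = mu_new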

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μ_k}(i) that must be chosen higher than the real value J_{μ_k}(i).

While m ≥ 0 do:

J^m_{μ_k}(i) = Σ_{j∈Ω^X} P(j, μ_k(i), i) · [ C(j, μ_k(i), i) + J^{m+1}_{μ_k}(j) ]    ∀i ∈ Ω^X

m ← m − 1

m    Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μ_k} is approximated by J^0_{μ_k}.
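For illustration, a sketch of this evaluation step (again an assumption-laden sketch, not the thesis implementation) can reuse the array conventions of the policy iteration sketch above; for the discounted case it simply applies M sweeps of the fixed-policy Bellman operator:

import numpy as np

def evaluate_policy_m_steps(P, Cost, alpha, mu, J_init, M):
    # M value-iteration sweeps under the fixed policy mu approximate J_mu
    n_states = P.shape[1]
    P_mu = P[mu, np.arange(n_states), :]                               # transition matrix under mu
    c_mu = np.sum(P_mu * Cost[mu, np.arange(n_states), :], axis=1)     # expected one-stage cost
    J = np.array(J_init, dtype=float)                                  # initial guess (chosen above J_mu)
    for _ in range(M):
        J = c_mu + alpha * P_mu @ J
    return J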

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ Ω^X, there is a unique λ_μ and a vector h_μ such that:

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j∈Ω^X} P(j, μ(i), i) · [ C(j, μ(i), i) + h_μ(j) ]    ∀i ∈ Ω^X

This λ_μ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]    ∀i ∈ Ω^X

μ*(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h*(j) ]    ∀i ∈ Ω^X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.

H_k = min_{u∈Ω^U(X)} Σ_{j∈Ω^X} P(j, u, X) · [ C(j, u, X) + h_k(j) ]

h_{k+1}(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ] − H_k    ∀i ∈ Ω^X

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_k(j) ]    ∀i ∈ Ω^X

The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω^X, stop the algorithm.

Else, solve the following system of equations:

h_q(X) = 0
λ_q + h_q(i) = Σ_{j∈Ω^X} P(j, μ_q(i), i) · [ C(j, μ_q(i), i) + h_q(j) ]    ∀i ∈ Ω^X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + h_q(j) ]    ∀i ∈ Ω^X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that cannot be included in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case the optimal cost-to-go function satisfies:

J*(i) = min_{u∈Ω^U(i)} Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + α · J*(j) ]    ∀i ∈ Ω^X

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω^X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + α · J(j) ]    ∀i ∈ Ω^X, ∀u ∈ Ω^U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
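The following minimal sketch (not from the thesis; it assumes the same hypothetical array layout as the earlier sketches and that SciPy is available) passes this formulation to a generic LP solver:

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, Cost, alpha):
    # The optimal discounted cost-to-go is the componentwise largest J satisfying
    #   J(i) <= sum_j P(j, u, i) * [C(j, u, i) + alpha * J(j)]  for all i, u,
    # so we maximize sum_i J(i), i.e. minimize -sum_i J(i), under these constraints.
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = np.zeros(n_states)
            row[i] += 1.0
            row -= alpha * P[u, i, :]                      # J(i) - alpha * sum_j P J(j)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i, :], Cost[u, i, :])) # expected one-stage cost
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states, method="highs")
    return res.x                                           # optimal cost-to-go J*(i)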

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, it will converge quite quickly if the initial policy μ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or a decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible new input. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied while learning the environment (the MDP of the system) in parallel. This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, in order to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space, and are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. If a model of the system exists, the samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i).

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD is presented here in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state X_k, is:

V(X_k) = Σ_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k)    Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ_{m=1}^{K} V(i_m)

V(i_m)    Cost-to-go of the trajectory starting from state i at its m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [ V(i_m) − J(i) ], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [ V(X_k) − J(X_k) ]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]    ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [ C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l) ]    ∀k = 0, ..., l

Note that TD(1) is the same as policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm only updates the current state:

J(X_k) = J(X_k) + γ_{X_k} · [ C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k) ]
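A minimal Python sketch of TD(0) policy evaluation is given below. It is not from the thesis, and its interface is hypothetical: simulate_transition(i, u) stands in for a simulator (or real-time measurements) returning the next state and the observed transition cost, and is_terminal(i) identifies the cost-free terminal states.

from collections import defaultdict

def td0_evaluation(mu, simulate_transition, is_terminal, start_states, n_trajectories):
    J = defaultdict(float)              # estimated cost-to-go, initially 0
    visits = defaultdict(int)
    for m in range(n_trajectories):
        i = start_states[m % len(start_states)]
        while not is_terminal(i):
            j, cost = simulate_transition(i, mu(i))     # follow the policy being evaluated
            visits[i] += 1
            gamma = 1.0 / visits[i]                     # step size = 1 / (number of visits)
            # TD(0): J(Xk) <- J(Xk) + gamma * [C(Xk, Xk+1) + J(Xk+1) - J(Xk)]
            J[i] += gamma * (cost + J[j] - J[i])
            i = j
    return J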

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Q_{μ_k}(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + J_{μ_k}(j) ]

Note that C(j, u, i) must be known. The improved policy is:

μ_{k+1}(i) = argmin_{u∈Ω^U(i)} Q_{μ_k}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_μ and Q_{μ_k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + J*(j) ]        (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω^U(i)} Q*(i, u)        (7.2)

By combining the two equations we obtain:

Q*(i, u) = Σ_{j∈Ω^X} P(j, u, i) · [ C(j, u, i) + min_{v∈Ω^U(j)} Q*(j, v) ]        (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do:

U_k = argmin_{u∈Ω^U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [ C(X_{k+1}, U_k, X_k) + min_{u∈Ω^U(X_{k+1})} Q(X_{k+1}, u) ]

with γ defined as for TD.

The trade-off between exploration and exploitation: Convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
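A minimal Python sketch of the Q-learning update with an ε-greedy exploration/exploitation trade-off is given below. It is not from the thesis; admissible(i) and simulate_transition(i, u) are hypothetical stand-ins for the decision sets and for a simulator of the system.

import random
from collections import defaultdict

def q_learning(admissible, simulate_transition, is_terminal, start_state,
               n_episodes, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(state, decision)], initially 0
    visits = defaultdict(int)
    for _ in range(n_episodes):
        i = start_state
        while not is_terminal(i):
            greedy = min(admissible(i), key=lambda u: Q[(i, u)])          # exploitation
            u = random.choice(admissible(i)) if random.random() < epsilon else greedy  # exploration
            j, cost = simulate_transition(i, u)
            visits[(i, u)] += 1
            gamma = 1.0 / visits[(i, u)]
            target = cost + (0.0 if is_terminal(j)
                             else min(Q[(j, v)] for v in admissible(j)))
            # Q(Xk, Uk) <- (1 - gamma) Q(Xk, Uk) + gamma [C + min_v Q(Xk+1, v)]
            Q[(i, u)] = (1 - gamma) * Q[(i, u)] + gamma * target
            i = j
    return Q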

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building the model of the transition probabilities and cost function on-line, and then using this model for off-line training of the system, through simulation, with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems; for large state and control spaces, however, they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the supervised learning performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
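As a minimal illustration (a sketch under stated assumptions, not the thesis implementation), a linear approximation J(i, r) = φ(i)ᵀ r can be fitted by least squares to sampled costs-to-go; here φ(i) is a hypothetical feature mapping and samples is a list of pairs (state, observed cost-to-go V) gathered from simulated trajectories.

import numpy as np

def fit_cost_to_go(phi, samples):
    Phi = np.array([phi(i) for i, _ in samples])          # feature matrix, one row per sample
    V = np.array([v for _, v in samples])                 # observed costs-to-go
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)           # minimize ||Phi r - V||^2
    return r

def approx_J(phi, r, i):
    return float(np.dot(phi(i), r))                       # J(i, r) = phi(i)^T r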

Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit that is not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time needed to carry out maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or a failure state. Two kinds of failures are considered, random failures and deterioration failures, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM, and SMDPs for inspection-based CBM; however, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Classical methods; possible approaches for MDP:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI), can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; policy iteration (PI), faster in general
  - Shortest path: linear programming; possible additional constraints, but the state space is more limited than with VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages / disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Method: same as MDP


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1   Component state at the current stage
i2   Electricity state at the current stage
j1   Possible component state for the next stage
j2   Possible electricity state for the next stage

State and Control Space

x1k   Component state at stage k
x2k   Electricity state at stage k

Probability function

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state Wi

Sets

Ωx1      Component state space
Ωx2      Electricity state space
ΩU(i)    Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. The electricity price may switch from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always carried out. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached, i.e. λ(t) = λ(Tmax) if t > Tmax; in this case Tmax can correspond, for example, to the time when λ(t) exceeds 50%. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


[Figure 9.1: Example of the Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. States: W0-W4, PM1, CM1, CM2. Solid lines (u = 0): Wq → Wq+1 with probability 1 − Ts·λ(q) and Wq → CM1 with probability Ts·λ(q). Dashed lines (u = 1): Wq → PM1 with probability 1. Maintenance states progress to the next maintenance state, and finally back to W0, with probability 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. The figure plots the electricity price (SEK/MWh, approximately 200-500) against the stages k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU(i) = {0, 1}   if i1 ∈ {W1, ..., WNW}
        ∅        otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | Uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | Uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1      λ(Wq)
WNW                         0   WNW      1 − λ(WNW)
WNW                         0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    1
PM(NPM−1)                   ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    1
CM(NCM−1)                   ∅   W0       1
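To make the table concrete, the following minimal Python sketch (hypothetical function and state names, not part of the thesis) builds the component transition probabilities of Table 9.1 from a failure rate function λ(t). It assumes, as in Figure 9.1, that the per-stage failure probability in state Wq is Ts·λ(q·Ts).

def component_transitions(lam, Ts, NW, NPM, NCM):
    """Return a dict P[(i1, u)] -> list of (j1, probability) pairs, as in Table 9.1."""
    P = {}
    pm1 = "PM1" if NPM > 1 else "W0"        # PM1 collapses to W0 when NPM = 1
    cm1 = "CM1" if NCM > 1 else "W0"        # CM1 collapses to W0 when NCM = 1
    for q in range(NW + 1):                 # working states W0 ... WNW
        w = "W%d" % q
        p_fail = Ts * lam(q * Ts)           # assumed failure probability during one stage
        j_ok = "W%d" % min(q + 1, NW)       # the component ages one stage; WNW stays WNW
        P[(w, 0)] = [(j_ok, 1.0 - p_fail), (cm1, p_fail)]
        P[(w, 1)] = [(pm1, 1.0)]            # preventive replacement is started
    for q in range(1, NPM):                 # ongoing preventive maintenance (no decision)
        P[("PM%d" % q, None)] = [("PM%d" % (q + 1) if q < NPM - 1 else "W0", 1.0)]
    for q in range(1, NCM):                 # ongoing corrective maintenance (no decision)
        P[("CM%d" % q, None)] = [("CM%d" % (q + 1) if q < NCM - 1 else "W0", 1.0)]
    return P

# Example corresponding to Figure 9.1 (NW = 4, NPM = 2, NCM = 3), with a hypothetical λ(t):
# P = component_transitions(lambda t: 0.02 * (1.0 + t), Ts=1.0, NW=4, NPM=2, NCM=3)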

Table 9.2: Example of transition matrices for electricity scenarios

P1E = [ 1    0    0  ]     P2E = [ 1/3  1/3  1/3 ]     P3E = [ 0.6  0.2  0.2 ]
      [ 0    1    0  ]           [ 1/3  1/3  1/3 ]           [ 0.2  0.6  0.2 ]
      [ 0    0    1  ]           [ 1/3  1/3  1/3 ]           [ 0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
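As an illustration, a short sketch (hypothetical Python names, values taken from Tables 9.2 and 9.3) of how the stage-dependent electricity scenario transitions could be stored: one NE × NE matrix per stage.

P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]        # stable scenarios
P2_E = [[1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]]        # fully mixing (e.g. summer transition)
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]        # partly mixing

# Pk[k][i2][j2] = P(x2_{k+1} = j2 | x2_k = i2) for the 12-stage horizon of Table 9.3
Pk = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]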

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    CI + CPM
PM(NPM−1)                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    CI + CCM
CM(NCM−1)                   ∅   W0       CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportune times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the infrastructure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c


Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.


bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space: The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space: Same as in Section 9.1.

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system


uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ωuc(ic) = {0, 1}   if ic ∈ {W0, ..., WNWc}
                               ∅        otherwise

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are the same as in the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc} and Uk = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If at least one component is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} Pc

with Pc = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
     Pc = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
     Pc = 0              otherwise
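The following sketch (hypothetical helper names, not part of the thesis) shows one reading of Cases 1 and 2: while the system is stopped, working components that are not maintained keep their state (they do not age). prob_c(c, jc, uc, ic) is assumed to return the one-component transition probability of Table 9.1 for component c, and working(ic) whether ic is a working state.

def joint_transition(prob_c, working, j, u, i):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the component part of the state."""
    system_up = all(working(ic) for ic in i) and not any(u)
    p = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if system_up:
            p *= prob_c(c, jc, 0, ic)          # Case 1: every component ages normally
        elif uc == 1 or not working(ic):
            p *= prob_c(c, jc, uc, ic)         # Case 2: maintenance (or start of PM) dynamics
        else:
            p *= 1.0 if jc == ic else 0.0      # Case 2: idle working component keeps its state
    return p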

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc} and Uk = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc = CCMc   if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
     Cc = CPMc   if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
     Cc = 0      otherwise

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable


bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = J*(K) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4+2, 2+5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4+7, 2+3, 7+2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2+1, 7+2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6+4, 5+6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6+2, 5+1, 3+3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5+5, 3+2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10+2, 6+4, 5+3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andréasson Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems Technical report Chalmers Göteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE King Intelligent monitoring and maintenance of power plants In Workshop on «Machine learning applications in the electric power industry» Chania Greece 1999


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005



42 Deterministic Dynamic Programming

This section introduces the basics of deterministic Dynamic Programming Theoptimality equation is presented with the value iteration algorithm to solve it Thesection is illustrated with a classical example of a simple shortest path problem

421 Problem Formulation

The three main parts of a DP model are its state and decision spaces dynamic andcost functions and objective function The finite horizon model considers a systemthat evolves for N stages

State and Decision Spaces: At each stage k, the system is in a state Xk = i that belongs to a state space ΩXk. Depending on the state of the system, the decision maker decides on an action u = Uk ∈ ΩUk(i).

Dynamic and Cost Functions: As a result of this action, the system state at the next stage will be Xk+1 = fk(i, u). Moreover, the action has a cost Ck(i, u) that the decision maker has to pay. A possible terminal cost CN(XN) is associated with the terminal state (the state at stage N).

Objective Function: The objective is to determine the sequence of decisions that will minimize the cumulative cost (also called the cost-to-go function), subject to the dynamics of the system:

J*0(X0) = min_{Uk} [ Σ_{k=0}^{N−1} Ck(Xk, Uk) + CN(XN) ]

subject to Xk+1 = fk(Xk, Uk), k = 0, ..., N−1

N          Number of stages
k          Stage
i          State at the current stage
j          State at the next stage
Xk         State at stage k
Uk         Decision action at stage k
Ck(i, u)   Cost function
CN(i)      Terminal cost for state i
fk(i, u)   Dynamic function
J*0(i)     Optimal cost-to-go starting from state i


422 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }    (4.1)

J*k(i)   Optimal cost-to-go from stage k to N starting from state i

The value iteration algorithm is a direct consequence of the optimality equation

J*N(i) = CN(i)   ∀i ∈ ΩXN

J*k(i) = min_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} { Ck(i, u) + J*k+1(fk(i, u)) }   ∀i ∈ ΩXk

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The algorithm goes backwards starting from the last stage It stops when k=0


423 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path prob-lems with small state space

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network over five stages. Stage 0: A; stage 1: B, C, D; stage 2: E, F, G; stage 3: H, I, J; stage 4: K. Arc costs: A-B 2, A-C 4, A-D 3; B-E 4, B-F 6; C-E 2, C-F 1, C-G 3; D-F 5, D-G 2; E-H 2, E-I 5; F-H 7, F-I 3, F-J 2; G-I 1, G-J 2; H-K 4, I-K 2, J-K 7.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem Insteadof calculating all the path cost the problem will be divided in subproblems thatwill be solved recursively to determine the shortest path from each possible node tothe terminal node K

4231 Problem Formulation

The problem is divided into five stagesn=5 k=01234

State Space: The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}


Each node of the problem is defined by a stateXk For example X2 = 1 correspondsto the node F In this problem the state space is defined by one variable It is alsopossible to have multi-variable space for which Xk would be a vector

Decision Space: The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to the next stage. The following notations are used:

For k = 1, 2, 3:
ΩUk(i) = {0, 1}      for i = 0
         {0, 1, 2}   for i = 1
         {1, 2}      for i = 2

For k = 0:
ΩU0(0) = {0, 1, 2}

For example ΩU1 (0) = ΩU (B) = 0 1 with U1(0) = 0 for the transition B rArr E orU1(0) = 1 for the transition B rArr F

Another example ΩU1 (2) = ΩU (D) = 1 2 with u1(2) = 2 for the transitionD rArr For u1(2) = 2 for the transition D rArr G

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.

Dynamic and Cost FunctionsThe dynamic function of the example is simple thanks to the notations usedfk(i u) = u

The transition costs are defined equal to the distance from one state to the resultingstate of the decision For example C1(0 0) = C(B rArr E) = 4 The cost function isdefined in the same way for the others stages and states

Objective Function

J*0(0) = min_{Uk∈ΩUk(Xk)} [ Σ_{k=0}^{4} Ck(Xk, Uk) + CN(XN) ]

subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., N−1

4232 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm are given in Appendix A

The optimal cost-to-go is J*0(0) = 8. It corresponds to the path A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*k(i) (for example μ1(1) = 2, μ1(2) = 2).
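As an illustration, a small Python sketch (not part of the thesis) of the backward value iteration recursion applied to this shortest path example; the arc costs are taken from the figure above, and running it reproduces the optimal cost 8 and the policy reaching K through D, G and I.

cost = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 4, "F": 6},
    "C": {"E": 2, "F": 1, "G": 3},
    "D": {"F": 5, "G": 2},
    "E": {"H": 2, "I": 5},
    "F": {"H": 7, "I": 3, "J": 2},
    "G": {"I": 1, "J": 2},
    "H": {"K": 4}, "I": {"K": 2}, "J": {"K": 7},
}
stages = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"], ["K"]]

J = {"K": 0.0}                                   # terminal cost
policy = {}
for nodes in reversed(stages[:-1]):              # backward recursion, stages 3 ... 0
    for i in nodes:
        j_best = min(cost[i], key=lambda j: cost[i][j] + J[j])
        policy[i] = j_best
        J[i] = cost[i][j_best] + J[j_best]

print(J["A"], policy["A"])                       # 8.0, reached through A -> D -> G -> I -> K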


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory used for the proposed model in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k isin 0 N represents the different stages of the problem In generalit corresponds to a time variable

The state of the system is characterized by a variable i = Xk The possible statesare represented by a set of admissible states that can depends on k Xk isin ΩXk

Decision Space

At each decision epoch, the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

On the contrary with the deterministic case the state transition does not dependonly on the control used but also on a disturbance ω = ωk(i u)

Xk+1 = fk(Xk Uk ω) k = 0 1 N minus 1

The effect of the disturbance can be expressed with transition probabilities Thetransition probabilities define the probability that the state of the system at stagek+1 is j if the state and control are i and u at the stage k These probabilities candepend also on the stage

Pk(j u i) = P (Xk+1 = j | Xk = i Uk = u)

If the system is stationary (time-invariant) the dynamic function f does not dependson time and the notation for the probability function can be simplified

P (j u i) = P (Xk+1 = j | Xk = i Uk = u)

In this case one refers to a Markov decision process If a control u is fixed for eachpossible state of the model then the probability transition can be represented by aMarkov model (See Chapter 9 for an example)

Cost Function

A cost is associated to each possible transition (ij) and action u The costs can alsodepend on the stage

Ck(j u i) = Ck(xk+1 = j uk = u xk = i)

If the transition (ij) occurs at stage k when the decision is u then a cost Ck(j u i) isgiven If the cost function is stationary then the notation is simplified by C(i u j)

A terminal cost CN (i) can be used to penalize deviation from a desired terminalstate

Objective Function

The objective is to determine the sequence of decision that optimize the expectedcumulative cost (cost-to-go function) Jlowast(X0) where X0 is the initial state of thesystem

J*(X0) = min_{Uk∈ΩUk(Xk)} E[ CN(XN) + Σ_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N−1


N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
Xk            State at stage k
Uk            Decision action at stage k
ωk(i, u)      Probabilistic function of the disturbance
Ck(i, u, j)   Cost function
CN(i)         Terminal cost for state i
fk(i, u, ω)   Dynamic function
J*0(i)        Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*k(i) = min_{u∈ΩUk(i)} E{ Ck(i, u) + J*k+1(fk(i, u, ω)) }    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]    (5.2)

ΩXk          State space at stage k
ΩUk(i)       Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i)   ∀i ∈ ΩXN    (initialisation)

While k ≥ 0 do
   J*k(i) = min_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]   ∀i ∈ ΩXk
   U*k(i) = argmin_{u∈ΩUk(i)} Σ_{j∈ΩXk+1} Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]   ∀i ∈ ΩXk
   k ← k − 1

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
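A generic sketch of this backward recursion in Python follows (the data layout is hypothetical, not from the thesis): states[k] lists the admissible states at stage k, actions(k, i) the admissible decisions, P[k][i][u] is a list of (j, probability) pairs and C[k][i][u][j] the transition cost, with CN[i] the terminal cost.

def finite_horizon_vi(N, states, actions, P, C, CN):
    J = [dict() for _ in range(N + 1)]
    U = [dict() for _ in range(N)]
    for i in states[N]:
        J[N][i] = CN[i]                                   # initialisation: J*_N = C_N
    for k in range(N - 1, -1, -1):                        # backward recursion
        for i in states[k]:
            best_u, best_cost = None, float("inf")
            for u in actions(k, i):
                exp_cost = sum(p * (C[k][i][u][j] + J[k + 1][j])
                               for j, p in P[k][i][u])    # expected cost of (i, u)
                if exp_cost < best_cost:
                    best_u, best_cost = u, exp_cost
            U[k][i], J[k][i] = best_u, best_cost
    return J, U                                           # cost-to-go and optimal policy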

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (number of state and decision variables). This characteristic of SDP is called the curse of dimensionality.

55 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time Apossible state variable for the component is its age To be precise the age of thecomponent should be discretized according to the stage duration If the lifetimeof a component is very long it can lead to a very large state space The timehorizon can be considered to reduce the number of states If a state variable cannot reach certain states during the planned horizon these states can be neglectedIf a component subcomponent or part of a system can be inspected or monitoreddifferent levels of deterioration can be used as a state variable In practice bothage and deterioration state variables could be used complementary

Of course maintenance states should be considered in both cases It could be possibleto have different types of failure states as major failure and minor failures Minorfailures could be cleared by repair while for a major failure a component should bereplace


552 Forecasts

Measurements or forecasts can sometime estimate the disturbance a system is orcan be subject to The reliability of the forecasts should be carefully consideredDeterministic information could be used to adapt the finite horizon model on theirhorizon of validity It would also be possible to generate different scenarios fromforcasts solve the problem for the different scenarios and get some conclusions fromthe different solutions Another way of using forecasting models is to include them inthe maintenance problem formulation by adding a specific variable It will reducethe uncertainties but in return increase the complexity The proposed model inChapter 9 gives an example of how to integrate a forecasting model in an electricityscenario

Another factor that could be interesting to forecast is the load Indeed the produc-tion must always be in balance with the generation Also if there is no consumptionsome generation units are stopped This time can be used for the maintenance ofthe power plant

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

553 Time Lags

An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)

This condition of loss of memory is very strong and unrealistic in some cases Itis sometimes possible (if the system dynamic depends on few precedent states) toovercome this assumption Variables are added in the DP model to keep in memorythe precedent states that can be visited The computational price is once again veryhigh

For example in the context of maintenance it would be interesting to know thedeterioration level of an asset at the precedent stage It would give informationsabout the dynamic of the deterioration process


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary overtime The dynamic of the system as well as the cost function and the disturbancesare stationary Infinite horizon stochastic dynamic programming (IHSDP) modelscan be represented by a Markov Decision Process For more details and prooffor the convergence of the algorithm [36] or the introduction chpater of [13] arerecommended

In practice one scarcely faces problems with infinite number of stages It canhowever be a reasonable approximation of problems with very large number ofstates for which the value algorithm would lead to untractable computation

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space decision space probability function and cost function of IHSDPare defined in a similar way that FHSDP for the stationary case The aim of IHSDPis to minimize the cumulative costs of a system over an infinite number of stagesThis sum is called cost-to-go function

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).

The objective is to find the optimal microlowast It should minimize the cost-to-go function

To be able to compare different policies it is necessary that the infinite sum ofcosts converge Different type of models can be considered stochastic shortest pathproblems discounted problems and average cost per stages problems

Stochastic shortest path modelsStochastic shortest path dynamic programming models have a terminal state (orcost-free terminaison state) that is not evitable When this state is reached thesystem remains in this state and no costs are paid

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N−1

micro Decision policyJlowast(i) Optimal cost-to-go function for state i

Discounted problems: Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost incurred at stage k has the form α^k · Cij(u).

As Cij(u) is bounded the infinite sum will converge (decreasing geometric progres-sion)

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N−1

α Discount factor

Average cost per stage problemsInfinite horizon problems can sometimes not be represented with a no free-costterminaison state or discounted

To make the cost-to-go finite the problem can modelled as an average cost per stageproblem where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N−1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N−1


62 Optimality Equations

The optimality equations are formulated using the probability function P (i u j)

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)]   ∀i ∈ ΩX

Jμ(i)   Cost-to-go function of policy μ starting from state i
J*(i)   Optimal cost-to-go function for state i

For a IHSDP discounted problem the optimality equation is

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)]   ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

63 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined.
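A sketch of the value iteration backup for a discounted model is given below (hypothetical data layout, not from the thesis): P[i][u] is a list of (j, probability) pairs, C[i][u][j] the transition cost and alpha the discount factor; iteration stops when successive value functions are closer than a tolerance.

def discounted_vi(states, actions, P, C, alpha, tol=1e-8):
    J = {i: 0.0 for i in states}
    while True:
        J_new = {}
        for i in states:
            J_new[i] = min(sum(p * (C[i][u][j] + alpha * J[j]) for j, p in P[i][u])
                           for u in actions(i))           # Bellman backup at state i
        if max(abs(J_new[i] - J[i]) for i in states) < tol:
            return J_new
        J = J_new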

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1 Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i) is calculated as the solution of the following linear system:

Jμq(i) = Σ_{j∈ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]   ∀i ∈ ΩX

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

μq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]   ∀i ∈ ΩX

Go back to policy evaluation step

The process stops when microq+1 = microq

At each iteration the algorithm improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
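The following sketch illustrates the two steps for a discounted problem (a minimal implementation under assumed dense arrays, not the thesis' own code): P[i, u, j] and C[i, u, j] are numpy arrays over n states and the actions are assumed admissible in every state.

import numpy as np

def policy_iteration(P, C, alpha):
    n = P.shape[0]
    mu = np.zeros(n, dtype=int)                       # initial policy mu0 (action 0 everywhere)
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu exactly
        P_mu = P[np.arange(n), mu]                    # n x n matrix under the current policy
        c_mu = np.einsum("ij,ij->i", P_mu, C[np.arange(n), mu])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum("iuj,iuj->iu", P, C) + alpha * P.dot(J)
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):                # the policy is a solution of its own improvement
            return mu, J
        mu = mu_new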

65 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use, at each iteration, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).


While m ≥ 0 do
   J^m_μk(i) = Σ_{j∈ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]   ∀i ∈ ΩX
   m ← m − 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
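A short sketch of this approximate evaluation step, written for the discounted case and with the same assumed array layout as the policy iteration sketch above (hypothetical, not from the thesis): the exact linear solve is replaced by M backups under the fixed policy.

import numpy as np

def approx_policy_evaluation(P, C, alpha, mu, J_init, M):
    n = P.shape[0]
    P_mu = P[np.arange(n), mu]                              # transitions under the fixed policy
    c_mu = np.einsum("ij,ij->i", P_mu, C[np.arange(n), mu])
    J = J_init.copy()                                       # should over-estimate the true value
    for _ in range(M):                                      # M value-iteration backups
        J = c_mu + alpha * P_mu.dot(J)
    return J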

66 Average Cost-to-go Problems

The methods presented in the preceding sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there is a unique λμ and vector hμ such that

hμ(X) = 0
λμ + hμ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]   ∀i ∈ ΩX

This λμ is the average cost-to-go of the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk   ∀i ∈ ΩX

μk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]   ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.
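A compact sketch of the iteration, with state 0 playing the role of the reference state X and the same assumed dense arrays as before (hypothetical, not from the thesis); the offset Hk approximates the optimal average cost:

import numpy as np

def relative_value_iteration(P, C, n_iter=1000):
    n = P.shape[0]
    g = np.einsum("iuj,iuj->iu", P, C)        # expected one-stage cost per (i, u)
    h = np.zeros(n)
    for _ in range(n_iter):
        Q = g + P.dot(h)                      # (i, u) values with the current h
        H = Q[0].min()                        # offset taken at the reference state 0
        h = Q.min(axis=1) - H
    lam = H                                   # estimate of the optimal average cost per stage
    mu = Q.argmin(axis=1)                     # greedy policy with respect to the last h
    return lam, h, mu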

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy

If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm.

Else, solve the following system of equations:

h^q(X) = 0

λ^q + h^q(i) = Σ_{j∈Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)], ∀i ∈ Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)], ∀i ∈ Ω_X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP, the optimal cost-to-go satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ Ω_X

and J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈Ω_X} J(i)

Subject to J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀i ∈ Ω_X, ∀u ∈ Ω_U(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
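As an illustration, a minimal sketch of this LP formulation using scipy.optimize.linprog is given below, again assuming arrays P[u, i, j] and C[u, i, j]; since linprog minimizes, the objective is negated.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha=0.95):
    """LP solution of a discounted cost MDP: maximize sum_i J(i) subject to
    J(i) <= sum_j P[u, i, j] * (C[u, i, j] + alpha * J(j)) for all i, u."""
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            row = -alpha * P[u, i, :]
            row[i] += 1.0                                  # J(i) - alpha * sum_j P * J(j)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i, :], C[u, i, :]))    # expected one-stage cost
    res = linprog(c=-np.ones(n_states),                    # linprog minimizes, so negate
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                           # optimal cost-to-go J*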

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; it is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, it will converge quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions belong to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k), where X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ): A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
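A minimal TD(0) sketch is given below. The simulator function step(i, u), the starting state and the terminal state index are illustrative assumptions; the step size γ is taken as 1/m as above.

import numpy as np

def td0_policy_evaluation(step, mu, n_states, terminal, n_trajectories=1000, start=0):
    """TD(0) evaluation of a fixed policy mu from simulated trajectories.

    step(i, u) is assumed to return (next_state, cost); terminal is the index of
    the absorbing terminal state of the stochastic shortest path problem.
    """
    J = np.zeros(n_states)
    visits = np.zeros(n_states)
    for _ in range(n_trajectories):
        i = start
        while i != terminal:
            j, cost = step(i, mu[i])
            visits[i] += 1
            gamma = 1.0 / visits[i]                  # step size 1/m as above
            J[i] += gamma * (cost + J[j] - J[i])     # TD(0) update
            i = j
    return J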

Q-factors: Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known for this step.

The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations, we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.
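A minimal tabular Q-learning sketch based on (7.3) is given below. The simulator step(i, u) is an assumed interface, and the ε-greedy rule is one simple way of handling the exploration/exploitation trade-off discussed next.

import numpy as np

def q_learning(step, n_states, n_actions, terminal, n_episodes=5000, epsilon=0.1, start=0):
    """Tabular Q-learning sketch; step(i, u) is assumed to return (next_state, cost)."""
    Q = np.zeros((n_states, n_actions))              # arbitrary initialization
    visits = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        i = start
        while i != terminal:
            if np.random.rand() < epsilon:           # exploration phase
                u = np.random.randint(n_actions)
            else:                                    # exploitation: greedy action
                u = int(np.argmin(Q[i]))
            j, cost = step(i, u)
            visits[i, u] += 1
            gamma = 1.0 / visits[i, u]               # step size as for TD
            target = cost + (0.0 if j == terminal else Q[j].min())
            Q[i, u] = (1 - gamma) * Q[i, u] + gamma * target
            i = j
    return Q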

The exploration/exploitation trade-off: Convergence of the algorithm to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line a model of the transition probabilities and the cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
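As a minimal illustration of such an approximation structure, the sketch below fits a linear approximation J̃(i, r) = φ(i)·r to sampled cost-to-go values by least squares; the feature map and the data names are illustrative assumptions.

import numpy as np

def features(state):
    """Illustrative feature vector phi(i) for a scalar state (e.g. component age)."""
    return np.array([1.0, state, state ** 2])

def fit_cost_to_go(states, sampled_costs):
    """Least-squares fit of the vector r in J~(i, r) = phi(i) . r."""
    Phi = np.array([features(s) for s in states])
    r, *_ = np.linalg.lstsq(Phi, np.array(sampled_costs), rcond=None)
    return r

# Only r is stored; the cost-to-go of any state, visited or not, is then
# estimated as features(state) @ r.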


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are both possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposes this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Model | Characteristics | Possible application in maintenance optimization | Method | Advantages / Disadvantages
Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance optimization, scheduling | Value Iteration | Limited state space (number of components)
Markov Decision Processes | Stationary model | Possible approaches for MDP: | Classical methods |
- Average cost-to-go | | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for high discount factor
- Discounted | | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general
- Shortest path | | | Linear Programming | Possible additional constraints; state space more limited than for VI & PI
Approximate Dynamic Programming for MDP | Can handle large state spaces | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model
Semi-Markov Decision Processes | Can optimize inspection interval | Optimization for inspection based maintenance | Same as MDP (average cost-to-go approach) | Complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model, and the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity price at stage k in electricity scenario s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i^1: Component state at the current stage
i^2: Electricity state at the current stage
j^1: Possible component state for the next stage
j^2: Possible electricity state for the next stage

State and Control Space

x^1_k: Component state at stage k
x^2_k: Electricity state at stage k

Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_{x^1}: Component state space
Ω_{x^2}: Electricity state space
Ω_U(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, an interruption cost C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the scenario price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k)^T,   x^1_k ∈ Ω_{x^1}, x^2_k ∈ Ω_{x^2}   (9.1)

Ω_{x^1} is the set of possible states for the component and Ω_{x^2} is the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age T_max is reached, i.e. λ(t) = λ(T_max) if t > T_max; in this case, T_max can for example correspond to the time when λ(t) > 50%. The latter approach was implemented. The corresponding number of W states is N_W = T_max/T_s (or the closest integer) in both cases.


[Figure 9.1: Example of Markov Decision Process for one component with N_CM = 3, N_PM = 2, N_W = 4. The diagram shows the states W0, W1, W2, W3, W4, PM1, CM1, CM2, with transitions W_q → W_{q+1} (probability 1 − T_s·λ(q)), W_q → CM1 (probability T_s·λ(q)), deterministic transitions through the PM and CM states back to W0, and the preventive maintenance decision leading to PM1. Solid lines: u = 0; dashed lines: u = 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x^1_k ∈ Ω_{x^1} = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_{x^1} = {W_0, ..., W_{N_W}, PM_1, ..., PM_{N_PM−1}, CM_1, ..., CM_{N_CM−1}}


Electricity scenario state

The electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S_1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, N_E = 3. The plot shows the electricity price (SEK/MWh, roughly in the range 200-500) over stages k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W_1, ..., W_{N_W}};  ∅ else

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
= P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
= P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM_1 respectively CM_1 corresponds to W_0.

Electricity state

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E or P^3_E. i^2 is represented by the rows of the matrices and j^2 by the columns.


Table 9.1: Transition probabilities

i^1                              | u | j^1       | P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W − 1}       | 0 | W_{q+1}   | 1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}       | 0 | CM_1      | λ(W_q)
W_{N_W}                          | 0 | W_{N_W}   | 1 − λ(W_{N_W})
W_{N_W}                          | 0 | CM_1      | λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}           | 1 | PM_1      | 1
PM_q, q ∈ {1, ..., N_PM − 2}     | ∅ | PM_{q+1}  | 1
PM_{N_PM − 1}                    | ∅ | W_0       | 1
CM_q, q ∈ {1, ..., N_CM − 2}     | ∅ | CM_{q+1}  | 1
CM_{N_CM − 1}                    | ∅ | W_0       | 1
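A minimal sketch of how the component transition matrices of Table 9.1 could be built is given below. The vector lam of per-stage failure probabilities λ(W_q) and the state ordering are illustrative assumptions.

import numpy as np

def component_transition_matrices(lam, n_pm, n_cm):
    """Build the component transition matrices P0 (u = 0) and P1 (u = 1) of Table 9.1.

    lam[q] is the per-stage failure probability in working state W_q, q = 0..N_W;
    states are ordered W_0..W_NW, PM_1..PM_{n_pm-1}, CM_1..CM_{n_cm-1}.
    """
    n_w = len(lam) - 1
    states = ([f"W{q}" for q in range(n_w + 1)]
              + [f"PM{q}" for q in range(1, n_pm)]
              + [f"CM{q}" for q in range(1, n_cm)])
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    P0, P1 = np.zeros((n, n)), np.zeros((n, n))
    for q in range(n_w + 1):
        nxt = f"W{min(q + 1, n_w)}"                        # W_NW stays in W_NW
        P0[idx[f"W{q}"], idx[nxt]] += 1 - lam[q]
        P0[idx[f"W{q}"], idx["CM1" if n_cm > 1 else "W0"]] += lam[q]
        P1[idx[f"W{q}"], idx["PM1" if n_pm > 1 else "W0"]] = 1
    for q in range(1, n_pm):                               # PM chain, decision "empty"
        nxt = f"PM{q + 1}" if q < n_pm - 1 else "W0"
        P0[idx[f"PM{q}"], idx[nxt]] = P1[idx[f"PM{q}"], idx[nxt]] = 1
    for q in range(1, n_cm):                               # CM chain, decision "empty"
        nxt = f"CM{q + 1}" if q < n_cm - 1 else "W0"
        P0[idx[f"CM{q}"], idx[nxt]] = P1[idx[f"CM{q}"], idx[nxt]] = 1
    return states, P0, P1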

Table 9.2: Example of transition matrices for electricity scenarios

P^1_E = [1 0 0; 0 1 0; 0 0 1]

P^2_E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P^3_E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j^2, i^2):  P^1_E P^1_E P^1_E P^3_E P^3_E P^2_E P^2_E P^2_E P^3_E P^1_E P^1_E P^1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i^1                              | u | j^1       | C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}       | 0 | W_{q+1}   | G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}       | 0 | CM_1      | C_I + C_CM
W_{N_W}                          | 0 | W_{N_W}   | G · T_s · C_E(i^2, k)
W_{N_W}                          | 0 | CM_1      | C_I + C_CM
W_q                              | 1 | PM_1      | C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}     | ∅ | PM_{q+1}  | C_I + C_PM
PM_{N_PM − 1}                    | ∅ | W_0       | C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}     | ∅ | CM_{q+1}  | C_I + C_CM
CM_{N_CM − 1}                    | ∅ | W_0       | C_I + C_CM
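To illustrate how the one-component model could be solved, a minimal backward induction (value iteration) sketch over the joint state (component state, electricity scenario) is given below. It reuses the matrices from the previous sketch; the scenario matrices, price array and cost constants are illustrative inputs, and the production reward is treated as a negative cost in the minimization, which is an assumption of this sketch.

import numpy as np

def solve_one_component(states, P0, P1, lam, scenario_mats, price, G, Ts,
                        C_I, C_PM, C_CM, N):
    """Backward induction for the one-component model (sketch).

    states, P0, P1: output of component_transition_matrices(); lam[q] is the
    per-stage failure probability in W_q; scenario_mats[k] is the electricity
    transition matrix P_k(j2, i2); price[s, k] is C_E(s, k).
    """
    n1, n2 = len(states), price.shape[0]
    J = np.zeros((n1, n2))                       # terminal cost C_N taken as 0 here
    policy = np.zeros((N, n1, n2), dtype=int)
    for k in range(N - 1, -1, -1):
        J_new = np.zeros_like(J)
        for i1, s in enumerate(states):
            for i2 in range(n2):
                EJ0 = P0[i1] @ J @ scenario_mats[k][i2]   # expected cost-to-go, u = 0
                if s.startswith("W"):
                    q = int(s[1:])
                    # u = 0: produce with prob 1 - lam (revenue as negative cost),
                    # fail with prob lam (interruption + corrective maintenance)
                    val0 = (lam[q] * (C_I + C_CM)
                            - (1 - lam[q]) * G * Ts * price[i2, k]) + EJ0
                    # u = 1: preventive replacement starts
                    val1 = C_I + C_PM + P1[i1] @ J @ scenario_mats[k][i2]
                    J_new[i1, i2] = min(val0, val1)
                    policy[k, i1, i2] = int(val1 < val0)
                else:                            # PM or CM state: no decision
                    stage_cost = C_I + (C_PM if s.startswith("PM") else C_CM)
                    J_new[i1, i2] = stage_cost + EJ0
        J = J_new
    return J, policy                             # J[i1, i2] = J*_0, policy[k] = mu*_k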

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C: Number of components
N_W^c: Number of working states for component c
N_PM^c: Number of preventive maintenance states for component c
N_CM^c: Number of corrective maintenance states for component c

Costs

C_PM^c: Cost per stage of preventive maintenance for component c
C_CM^c: Cost per stage of corrective maintenance for component c
C_N^c(i): Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}: State of component c at the current stage
i^{N_C+1}: Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}: State of component c at the next stage
j^{N_C+1}: Electricity state at the next stage
u^c, c ∈ {1, ..., N_C}: Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}: State of component c at stage k
x^c: A component state
x^{N_C+1}_k: Electricity state at stage k
u^c_k: Maintenance decision for component c at stage k

Probability functions

λ^c(i): Failure probability function for component c

Sets

Ω_{x^c}: State space for component c
Ω_{x^{N_C+1}}: Electricity state space
Ω_{u^c}(i^c): Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λ^c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CM^c stages with a cost of C_CM^c per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PM^c stages, with a cost of C_PM^c per stage.

• An interruption cost C_I is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_N^c can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T   (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to N_CM^c and N_PM^c. The number of W states for each component c, N_W^c, is decided in the same way as for one component.

The state space related to component c is denoted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W_0, ..., W_{N_W^c}, PM_1, ..., PM_{N_PM^c − 1}, CM_1, ..., CM_{N_CM^c − 1}}

Electricity space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}:  Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W_0, ..., W_{N_W^c}};  ∅ else

9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)   (9.4)
= P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})   (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The component state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W_1, ..., W_{N_W^c}}, then

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2


If one of the components is in maintenance, or a preventive maintenance decision is taken, then

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = Π_{c=1}^{N_C} P^c

with P^c = P(j^c, 1, i^c) if u^c = 1 or i^c ∉ {W_1, ..., W_{N_W^c}};
     P^c = 1 if i^c ∈ {W_1, ..., W_{N_W^c}}, u^c = 0 and j^c = i^c;
     P^c = 0 else.

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W_1, ..., W_{N_W^c}}, then

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with C^c = C_CM^c if i^c ∈ {CM_1, ..., CM_{N_CM^c}} or j^c = CM_1;
     C^c = C_PM^c if i^c ∈ {PM_1, ..., PM_{N_PM^c}} or j^c = PM_1;
     C^c = 0 else.

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality, there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest. However, for a high discount factor, the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



4.2.2 The Optimality Equation and Value Iteration Algorithm

The optimality equation (also known as Bellman's equation) derives directly from the principle of optimality. It states that the optimal cost-to-go function starting from stage k can be derived with the following formula:

J*_k(i) = min_{u∈Ω^U_k(i)} {C_k(i, u) + J*_{k+1}(f_k(i, u))}   (4.1)

J*_k(i): Optimal cost-to-go from stage k to N, starting from state i

The value iteration algorithm is a direct consequence of the optimality equation:

J*_N(i) = C_N(i), ∀i ∈ Ω^X_N

J*_k(i) = min_{u∈Ω^U_k(i)} {C_k(i, u) + J*_{k+1}(f_k(i, u))}, ∀i ∈ Ω^X_k

U*_k(i) = argmin_{u∈Ω^U_k(i)} {C_k(i, u) + J*_{k+1}(f_k(i, u))}, ∀i ∈ Ω^X_k

u: Decision variable
U*_k(i): Optimal decision action at stage k for state i

The algorithm goes backwards, starting from the last stage. It stops when k = 0.


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path network. The nodes are arranged in five stages: A (Stage 0); B, C, D (Stage 1); E, F, G (Stage 2); H, I, J (Stage 3); K (Stage 4). Each arc between consecutive stages is labelled with its transition cost.]

The aim of the problem is to determine the shortest way to reach the node K starting from the node A. A cost (corresponding to a distance) is associated to each arc. A first way to solve the problem would be to calculate the cost of all the possible paths. For example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: n = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

$$\Omega_{X_0} = \{A\} = \{0\}, \quad \Omega_{X_1} = \{B, C, D\} = \{0, 1, 2\}, \quad \Omega_{X_2} = \{E, F, G\} = \{0, 1, 2\}$$
$$\Omega_{X_3} = \{H, I, J\} = \{0, 1, 2\}, \quad \Omega_{X_4} = \{K\} = \{0\}$$

Each node of the problem is defined by a state $X_k$. For example, $X_2 = 1$ corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which $X_k$ would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notations are used:

$$\Omega_{U_k}(i) = \begin{cases} \{0, 1\} & \text{for } i = 0 \\ \{0, 1, 2\} & \text{for } i = 1 \\ \{1, 2\} & \text{for } i = 2 \end{cases} \quad \text{for } k = 1, 2, 3$$

$$\Omega_{U_0}(0) = \{0, 1, 2\} \quad \text{for } k = 0$$

For example, $\Omega_{U_1}(0) = \Omega_U(B) = \{0, 1\}$, with $U_1(0) = 0$ for the transition $B \Rightarrow E$ or $U_1(0) = 1$ for the transition $B \Rightarrow F$.

Another example: $\Omega_{U_1}(2) = \Omega_U(D) = \{1, 2\}$, with $u_1(2) = 1$ for the transition $D \Rightarrow F$ or $u_1(2) = 2$ for the transition $D \Rightarrow G$.

A sequence $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$, where $\mu_k(i)$ is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, $\pi^* = \{\mu^*_0, \mu^*_1, \ldots, \mu^*_N\}$.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: $f_k(i, u) = u$.

The transition costs are defined as equal to the distance from one state to the state resulting from the decision. For example, $C_1(0, 0) = C(B \Rightarrow E) = 4$. The cost function is defined in the same way for the other stages and states.

Objective Function

$$J^*_0(0) = \min_{U_k \in \Omega_{U_k}(X_k)} \left[ \sum_{k=0}^{4} C_k(X_k, U_k) + C_N(X_N) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k), \quad k = 0, 1, \ldots, N-1$

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is $J^*_0(0) = 8$. It corresponds to the following path: $A \Rightarrow D \Rightarrow G \Rightarrow I \Rightarrow K$. The optimal policy of the problem is $\pi^* = \{\mu_0, \mu_1, \mu_2, \mu_3, \mu_4\}$ with $\mu_k(i) = u^*_k(i)$ (for example $\mu_1(1) = 2$, $\mu_1(2) = 2$).
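To make the backward recursion concrete, the following is a minimal Python sketch of the same type of staged shortest-path computation, using the convention of the example that a decision is the index of the next node ($f_k(i, u) = u$). The arc costs in the dictionary are illustrative placeholders only (the figure above is not fully reproduced here), so the numerical result differs from the example; only the structure of the recursion matters.

```python
# Backward value iteration for a staged shortest-path problem.
# cost[k][(i, j)] = cost of going from state i at stage k to state j at stage k+1.
# The arc costs below are made-up placeholders, not the exact values of the figure.
cost = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                       # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 1): 7, (1, 2): 2, (2, 1): 3, (2, 2): 2},
    3: {(0, 0): 2, (1, 0): 4, (2, 0): 7},                                       # H, I, J -> K
}
N = 4                          # number of transitions (stages 0 ... 4)
J = {N: {0: 0.0}}              # terminal cost C_N(K) = 0
policy = {}

for k in range(N - 1, -1, -1):                 # backward recursion, stage by stage
    J[k], policy[k] = {}, {}
    for i in {s for (s, _) in cost[k]}:
        # candidate decisions: all states j reachable from i at stage k
        candidates = {j: c + J[k + 1][j] for (s, j), c in cost[k].items() if s == i}
        best = min(candidates, key=candidates.get)
        J[k][i], policy[k][i] = candidates[best], best

print("optimal cost-to-go J*_0(0):", J[0][0])
state, path = 0, [0]
for k in range(N):                             # follow the optimal policy forward
    state = policy[k][state]
    path.append(state)
print("optimal state sequence:", path)
```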

Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of the previous chapter is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

51 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamic is probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in the previous chapter: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable $k \in \{0, \ldots, N\}$ represents the different stages of the problem. In general it corresponds to a time variable.

The state of the system is characterized by a variable $i = X_k$. The possible states are represented by a set of admissible states that can depend on k: $X_k \in \Omega_{X_k}$.

Decision Space

At each decision epoch the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \ldots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated to each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(X_{k+1} = j, U_k = u, X_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then a cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(i, u, j)$.

A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system:

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \ldots, N-1$

N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
$X_k$: State at stage k
$U_k$: Decision action at stage k
$\omega_k(i, u)$: Probabilistic function of the disturbance
$C_k(i, u, j)$: Cost function
$C_N(i)$: Terminal cost for state i
$f_k(i, u, \omega)$: Dynamic function
$J^*_0(i)$: Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (5.1)$$

This equation defines a condition for a cost-to-go function of a state i in stage k to be optimal. The equation can be re-written using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \qquad (5.2)$$

$\Omega_{X_k}$: State space at stage k
$\Omega_{U_k}(i)$: Decision space at stage k for state i
$P_k(j, u, i)$: Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion it determines at each stage the optimal decision for each state of the system.

$$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N} \quad \text{(Initialisation)}$$

While $k \ge 0$ do

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot \left[ C_k(j, u, i) + J^*_{k+1}(j) \right] \quad \forall i \in \Omega_{X_k}$$

$$k \leftarrow k - 1$$

u: Decision variable
$U^*_k(i)$: Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
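To make the backward recursion concrete, the following is a minimal Python sketch of the algorithm above for a stationary problem, assuming the transition probabilities and costs are given as arrays P[u][i][j] and C[u][i][j] (so that they do not depend on k). It is an illustration under these assumptions, not the thesis's implementation.

```python
import numpy as np

def finite_horizon_value_iteration(P, C, terminal_cost, N):
    """Backward value iteration; returns the cost-to-go J[k][i] and the policy U[k][i]."""
    J = [None] * (N + 1)
    U = [None] * N
    J[N] = np.asarray(terminal_cost, dtype=float)          # J*_N(i) = C_N(i)
    for k in range(N - 1, -1, -1):                         # k = N-1, ..., 0
        # Q[u, i] = sum_j P(j, u, i) * (C(j, u, i) + J*_{k+1}(j))
        Q = np.array([(P[u] * (C[u] + J[k + 1])).sum(axis=1) for u in range(len(P))])
        J[k] = Q.min(axis=0)                               # optimal cost-to-go at stage k
        U[k] = Q.argmin(axis=0)                            # optimal decision at stage k
    return J, U

# Tiny 2-state, 2-decision example with made-up numbers (state 1 = failed).
P = [np.array([[0.9, 0.1], [0.0, 1.0]]),    # u = 0: keep operating
     np.array([[1.0, 0.0], [1.0, 0.0]])]    # u = 1: replace
C = [np.array([[0.0, 10.0], [0.0, 5.0]]),   # failure / repair costs
     np.array([[3.0, 3.0], [3.0, 3.0]])]    # replacement cost
J, U = finite_horizon_value_iteration(P, C, terminal_cost=[0.0, 0.0], N=10)
print(J[0], U[0])
```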

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with:

• N stages;

• $N_X$ state variables, the size of the set for each state variable being S;

• $N_U$ control variables, the size of the set for each control variable being A.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable can not reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.

552 Forecasts

Measurements or forecasts can sometime estimate the disturbance a system is orcan be subject to The reliability of the forecasts should be carefully consideredDeterministic information could be used to adapt the finite horizon model on theirhorizon of validity It would also be possible to generate different scenarios fromforcasts solve the problem for the different scenarios and get some conclusions fromthe different solutions Another way of using forecasting models is to include them inthe maintenance problem formulation by adding a specific variable It will reducethe uncertainties but in return increase the complexity The proposed model inChapter 9 gives an example of how to integrate a forecasting model in an electricityscenario

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

553 Time Lags

An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)

This condition of loss of memory is very strong and unrealistic in some cases Itis sometimes possible (if the system dynamic depends on few precedent states) toovercome this assumption Variables are added in the DP model to keep in memorythe precedent states that can be visited The computational price is once again veryhigh

For example in the context of maintenance it would be interesting to know thedeterioration level of an asset at the precedent stage It would give informationsabout the dynamic of the deterioration process


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamic of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details, and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state i, $\mu(i) \in \Omega_U(i)$.

The objective is to find the optimal $\mu^*$, i.e. the one that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided, i.e. it is eventually reached. When this state is reached, the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

$\mu$: Decision policy
$J^*(i)$: Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, where $\alpha$ is a discount factor ($0 < \alpha < 1$). The cost incurred at stage k has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

$\alpha$: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes neither be represented with a cost-free termination state nor be discounted.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, U_k, \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$

6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy $\mu^*$ solution of an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + J^*(j) \right] \quad \forall i \in \Omega_X$$

$J_\mu(i)$: Cost-to-go function of policy $\mu$ starting from state i
$J^*(i)$: Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is:

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot \left[ C_{ij}(u) + \alpha \cdot J^*(j) \right] \quad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.
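For the discounted case, a minimal sketch of the iteration is given below, using the same P[u][i][j] and C[u][i][j] arrays as in the finite horizon sketch of Chapter 5. It is an illustration under those assumptions, not the thesis's implementation.

```python
import numpy as np

def discounted_value_iteration(P, C, alpha, tol=1e-8, max_iter=100_000):
    n = P[0].shape[0]
    J = np.zeros(n)
    for _ in range(max_iter):
        # Q[u, i] = sum_j P(j, u, i) * (C(j, u, i) + alpha * J(j))
        Q = np.array([(P[u] * (C[u] + alpha * J)).sum(axis=1) for u in range(len(P))])
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:    # simple stopping criterion
            J = J_new
            break
        J = J_new
    policy = Q.argmin(axis=0)                  # greedy policy with respect to the final J
    return J, policy
```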

For non-discounted models the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to the method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This 2-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu_0$. Then it can be described by the following steps:

Step 1: Policy Evaluation

If $\mu_{q+1} = \mu_q$, stop the algorithm. Else, $J_{\mu_q}(i)$, the solution of the following linear system, is calculated:

$$J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + J_{\mu_q}(j) \right]$$

q: Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu_q$.

Step 2: Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_q}(j) \right]$$

Go back to the policy evaluation step.

The process stops when $\mu_{q+1} = \mu_q$.

At each iteration the algorithm always improves the policy. If the initial policy $\mu_0$ is already good, then the algorithm will converge fast to the optimal solution.
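A minimal sketch of the two-step algorithm is given below, written for the discounted case so that the evaluation step is a standard linear system; P[u][i][j], C[u][i][j] and the discount factor alpha are assumed given. It is an illustration rather than the thesis's implementation.

```python
import numpy as np

def policy_iteration(P, C, alpha):
    n_u, n = len(P), P[0].shape[0]
    mu = np.zeros(n, dtype=int)                  # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        c_mu = np.array([(P[mu[i]][i] * C[mu[i]][i]).sum() for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([(P[u] * (C[u] + alpha * J)).sum(axis=1) for u in range(n_u)])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):           # mu is a solution of its own improvement
            return J, mu
        mu = mu_new
```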

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu_k}(i)$ that must be chosen higher than the real value $J_{\mu_k}(i)$.

While $m \ge 0$ do

$$J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot \left[ C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j) \right] \quad \forall i \in \Omega_X$$

$$m \leftarrow m - 1$$

m: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and $J_{\mu_k}$ is approximated by $J^0_{\mu_k}$.

6.6 Average Cost-to-go Problems

The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega_X$, there is a unique $\lambda_\mu$ and vector $h_\mu$ such that

$$h_\mu(\bar{X}) = 0$$

$$\lambda_\mu + h_\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot \left[ C(j, \mu(i), i) + h_\mu(j) \right] \quad \forall i \in \Omega_X$$

This $\lambda_\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h^*(j) \right] \quad \forall i \in \Omega_X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. $\bar{X}$ is an arbitrary state and $h_0(i)$ is chosen arbitrarily.

$$H_k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot \left[ C(j, u, \bar{X}) + h_k(j) \right]$$

$$h_{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h_k(j) \right] - H_k \quad \forall i \in \Omega_X$$

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h_k(j) \right] \quad \forall i \in \Omega_X$$

The sequence $h_k$ will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: $\bar{X}$ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If $\lambda_{q+1} = \lambda_q$ and $h_{q+1}(i) = h_q(i)$ $\forall i \in \Omega_X$, stop the algorithm.

Else, solve the system of equations

$$h_q(\bar{X}) = 0$$

$$\lambda_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot \left[ C(j, \mu_q(i), i) + h_q(j) \right] \quad \forall i \in \Omega_X$$

Step 2: Policy improvement

$$\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + h_q(j) \right] \quad \forall i \in \Omega_X$$

$$q = q + 1$$

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case the optimal cost-to-go satisfies

$$J(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \quad \forall i \in \Omega_X$$

and it can be obtained as the solution of the following linear programming model:

$$\text{Maximize} \sum_{i \in \Omega_X} J(i)$$

$$\text{Subject to } J(i) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \alpha \cdot J(j) \right] \quad \forall i, \forall u$$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
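Under the reconstruction above (maximize the sum of the cost-to-go values subject to the Bellman inequalities), the LP can be written down directly with a generic solver. The sketch below uses scipy.optimize.linprog and the same P[u][i][j], C[u][i][j] arrays as before; it reflects one reading of the formulation, not necessarily the author's exact LP.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    n_u, n = len(P), P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(n_u):
        for i in range(n):
            # constraint: J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
            row = -alpha * P[u][i]
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(float(np.dot(P[u][i], C[u][i])))
    # maximize sum_i J(i)  <=>  minimize -sum_i J(i)
    res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n, method="highs")
    return res.x
```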

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

Let n and m denote the number of states and actions. A DP method takes a number of computational operations that is less than some polynomial function of n and m; a DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu_0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40], Neuro-Dynamic Programming [13] and the article [23].

7.1 Introduction

The problem of the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method can be used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory $(X_0, \ldots, X_N)$ has been generated according to the policy $\mu$, and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory, starting from the state $X_k$, is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$: Cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i, m)$$

$V(i, m)$: Cost-to-go of a trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

$$J(i) = J(i) + \gamma \cdot \left[ V(i, m) - J(i) \right], \quad \text{with } \gamma = 1/m, \text{ where m is the number of the trajectory}$$

From a trajectory point of view:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ V(X_k) - J(X_k) \right]$$

$\gamma_{X_k}$ corresponds to 1/m, where m is the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1})$.

At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \ldots, l$$

TD($\lambda$)
A generalization of the preceding algorithm is TD($\lambda$), where a constant $\lambda \le 1$ is introduced:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right] \quad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is $\lambda = 0$; the TD(0) algorithm only updates the state of the current transition:

$$J(X_l) = J(X_l) + \gamma_{X_l} \cdot \left[ C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l) \right]$$

Q-factors
Once $J_{\mu_k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

$$Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J_{\mu_k}(j) \right]$$

Note that the transition probabilities and costs must be known for this step. The improved policy is

$$\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_\mu$ and $Q_{\mu_k}$ have been estimated using the samples.
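As a small illustration of the TD(0) policy evaluation described above, the sketch below assumes a helper `simulate()` that returns one trajectory generated under the fixed policy as a list of (X_k, X_{k+1}, cost) transitions; this helper and the tabular representation are assumptions made for the example, not part of the thesis.

```python
from collections import defaultdict

def td0_policy_evaluation(simulate, n_trajectories=1000):
    J = defaultdict(float)         # tabular cost-to-go estimate, initialised to 0
    visits = defaultdict(int)      # visit counters, used for the step size gamma = 1/m
    for _ in range(n_trajectories):
        for (x, x_next, cost) in simulate():
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0): move J(x) toward the one-step estimate cost + J(x_next)
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J
```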

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates directly the Q-factors, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + J^*(j) \right] \qquad (7.1)$$

The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (7.2)$$

By combining the two equations we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot \left[ C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v) \right] \qquad (7.3)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$ do:

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot \left[ C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u) \right]$$

with $\gamma$ defined as for TD.

The trade-off exploration/exploitation: The convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
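The following sketch combines the Q-learning update above with an epsilon-greedy rule as one simple way to handle the exploration/exploitation trade-off. The helpers `step(x, u)` (returning the next state and the transition cost) and `actions(x)` (admissible decisions) are assumptions made for the example, not part of the thesis.

```python
import random
from collections import defaultdict

def q_learning(step, actions, x0, terminal, n_episodes=1000, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(state, decision)], initialised to 0
    visits = defaultdict(int)
    for _ in range(n_episodes):
        x = x0
        while x not in terminal:
            if random.random() < epsilon:        # exploration: try a random control
                u = random.choice(list(actions(x)))
            else:                                # exploitation: follow the greedy policy
                u = min(actions(x), key=lambda a: Q[(x, a)])
            x_next, cost = step(x, u)
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]
            best_next = 0.0 if x_next in terminal else min(Q[(x_next, a)] for a in actions(x_next))
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
            x = x_next
    return Q
```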

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation using direct learning.

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al [24] proposes the use of RL for preventive maintenance of power plantsThe article aims at giving reason of using RL for monitoring and maintenance ofpower plants The main advantages given are the automatic learning capabilitiesof RL The problem of time-lag (time between an action and its effect) is revealedPenalties are defined by deviations from normal operation of the system Theapproach proposed should first be used in parallel of the actual expert systems sothat the RL algorithm learns the environment then it could be applied in practiceOne important condition for a good learning of the environment is that the algorithmhas been trained in all situation and all the more in critical situation

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and most important methods.

Table 8.1 Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization, scheduling
- Method: value iteration
- Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model, classical methods); possible approaches for MDP:
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI); can converge fast for high discount factor
- Discounted: short-term maintenance optimization; policy iteration (PI); faster in general
- Shortest path: linear programming; possible additional constraints, but state space more limited than with VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages / disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; complex (average cost-to-go approach)
- Possible application in maintenance optimization: optimization for inspection based maintenance
- Method: same as MDP

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

9.1.2 Notations for the Proposed Model

Numbers

$N_E$: Number of electricity scenarios
$N_W$: Number of working states for the component
$N_{PM}$: Number of preventive maintenance states for one component
$N_{CM}$: Number of corrective maintenance states for one component

Costs

$C_E(s, k)$: Electricity cost at stage k for the electricity state s
$C_I$: Cost per stage for interruption
$C_{PM}$: Cost per stage of preventive maintenance
$C_{CM}$: Cost per stage of corrective maintenance
$C_N(i)$: Terminal cost if the component is in state i

Variables

$i^1$: Component state at the current stage
$i^2$: Electricity state at the current stage
$j^1$: Possible component state for the next stage
$j^2$: Possible electricity state for the next stage

State and Control Space

$x^1_k$: Component state at stage k
$x^2_k$: Electricity state at stage k

Probability function

$\lambda(t)$: Failure rate of the component at age t
$\lambda(i)$: Failure rate of the component in state $W_i$

Sets

$\Omega_{x^1}$: Component state space
$\Omega_{x^2}$: Electricity state space
$\Omega_U(i)$: Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length $T_s$ such that $T = N \cdot T_s$. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted $\lambda(t)$.

• If the component fails during stage k, corrective maintenance is undertaken for $N_{CM}$ stages with a cost of $C_{CM}$ per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is $N_{PM}$ stages, with a cost of $C_{PM}$ per stage.

• If the system is not working, a cost for interruption $C_I$ per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• $N_E$ possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted $C_E(s, k)$, k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector $X_k$ is composed of two state variables: $x^1_k$ for the state of the component (its age) and $x^2_k$ for the electricity scenario ($N_X = 2$).

The state of the system is thus represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1}, \; x^2_k \in \Omega_{x^2} \qquad (9.1)$$

$\Omega_{x^1}$ is the set of possible states for the component and $\Omega_{x^2}$ the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable $x^1_k$. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to $N_{CM}$ and $N_{PM}$.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when $\lambda(t)$ reaches a fixed limit $\lambda_{max} = \lambda(T_{max})$, preventive maintenance is always made. Another possibility is to assume that $\lambda(t)$ stays constant when age $T_{max}$ is reached; in this case $T_{max}$ can correspond, for example, to the time when $\lambda(t) > 50\%$ for $t > T_{max}$. This second approach was implemented. The corresponding number of W states is $N_W = T_{max}/T_s$, or the closest integer, in both cases.


Figure 9.1 Example of Markov decision process for one component with $N_{CM} = 3$, $N_{PM} = 2$, $N_W = 4$. Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example $x^1_k \in \Omega_{x^1} = \{W_0, \ldots, W_4, PM_1, CM_1, CM_2\}$. The state $W_0$ is used to represent a new component; $PM_2$ and $CM_3$ are both represented by this state.

More generally:

$$\Omega_{x^1} = \{W_0, \ldots, W_{N_W}, PM_1, \ldots, PM_{N_{PM}-1}, CM_1, \ldots, CM_{N_{CM}-1}\}$$

Electricity scenario state

Electricity scenarios are associated with one state variable $x^2_k$. There are $N_E$ possible states for this variable, each state corresponding to one possible electricity scenario: $x^2_k \in \Omega_{x^2} = \{S_1, \ldots, S_{N_E}\}$. The electricity price of scenario S at stage k is given by the electricity price function $C_E(S, k)$. Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, roughly 200-500) as a function of the stage, shown around stages k-1, k, k+1, with one curve for each of the three scenarios.]

Figure 9.2 Example of electricity scenarios, $N_E = 3$.

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

$U_k = 0$: no preventive maintenance
$U_k = 1$: preventive maintenance

The decision space depends only on the component state $i^1$:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{else} \end{cases}$$

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)$$
$$= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2)$$
$$= P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$

Component state transition probability

At each stage k, if the state of the component is $W_q$, the failure rate is assumed constant during the time of the stage and equal to $\lambda(W_q) = \lambda(q \cdot T_s)$.

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if $N_{PM} = 1$ or $N_{CM} = 1$, then $PM_1$, respectively $CM_1$, corresponds to $W_0$.

Electricity state

The transition probabilities of the electricity state, $P_k(j^2, i^2)$, are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example $P_k(j^2, i^2)$ can take three different values, defined by the transition matrices $P^1_E$, $P^2_E$ or $P^3_E$; $i^2$ is represented by the rows of the matrices and $j^2$ by the columns.

Table 9.1 Transition probabilities

$i^1$ | u | $j^1$ | $P(j^1, u, i^1)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $W_{q+1}$ | $1 - \lambda(W_q)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $CM_1$ | $\lambda(W_q)$
$W_{N_W}$ | 0 | $W_{N_W}$ | $1 - \lambda(W_{N_W})$
$W_{N_W}$ | 0 | $CM_1$ | $\lambda(W_{N_W})$
$W_q$, $q \in \{0, \ldots, N_W\}$ | 1 | $PM_1$ | 1
$PM_q$, $q \in \{1, \ldots, N_{PM} - 2\}$ | $\emptyset$ | $PM_{q+1}$ | 1
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | 1
$CM_q$, $q \in \{1, \ldots, N_{CM} - 2\}$ | $\emptyset$ | $CM_{q+1}$ | 1
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | 1

Table 9.2 Example of transition matrices for the electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} \quad P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 9.3 Example of transition probabilities on a 12-stage horizon

Stage (k):        0      1      2      3      4      5      6      7      8      9      10     11
$P_k(j^2, i^2)$:  $P^1_E$ $P^1_E$ $P^1_E$ $P^3_E$ $P^3_E$ $P^2_E$ $P^2_E$ $P^2_E$ $P^3_E$ $P^1_E$ $P^1_E$ $P^1_E$

9.1.4.4 Cost Function

The costs associated to the possible transitions can be of different kinds:

• reward for electricity generation, $G \cdot T_s \cdot C_E(i^2, k)$ (depends on the electricity scenario state $i^2$ and the stage k);

• cost for maintenance, $C_{CM}$ or $C_{PM}$;

• cost for interruption, $C_I$.

Moreover, a terminal cost noted $C_N$ could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that $i^2$ is a state variable.

A possible terminal cost $C_N(i)$ is defined for each possible terminal state i of the component.

Table 9.4 Transition costs

$i^1$ | u | $j^1$ | $C_k(j, u, i)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $W_{q+1}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_{N_W}$ | 0 | $W_{N_W}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_{N_W}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_q$ | 1 | $PM_1$ | $C_I + C_{PM}$
$PM_q$, $q \in \{1, \ldots, N_{PM} - 2\}$ | $\emptyset$ | $PM_{q+1}$ | $C_I + C_{PM}$
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{PM}$
$CM_q$, $q \in \{1, \ldots, N_{CM} - 2\}$ | $\emptyset$ | $CM_{q+1}$ | $C_I + C_{CM}$
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{CM}$
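As an illustration of how Table 9.1 could be turned into matrices for the value iteration algorithm of Chapter 5, the sketch below builds the component-state transition matrices for u = 0 and u = 1, assuming $N_W$, $N_{PM}$, $N_{CM}$ and the per-stage failure probabilities lam(q) = $T_s \cdot \lambda(W_q)$ are given. The electricity dimension and the costs of Table 9.4 would be added in the same way; this is an assumed illustration, not the thesis's code.

```python
import numpy as np

def component_transition_matrices(N_W, N_PM, N_CM, lam):
    # state ordering: W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1} (compare Figure 9.1)
    W = list(range(N_W + 1))
    PM = list(range(N_W + 1, N_W + N_PM))
    CM = list(range(N_W + N_PM, N_W + N_PM + N_CM - 1))
    n = N_W + N_PM + N_CM - 1
    P0, P1 = np.zeros((n, n)), np.zeros((n, n))          # u = 0 and u = 1
    first_CM = CM[0] if N_CM > 1 else W[0]               # if N_CM = 1, CM1 is W0
    first_PM = PM[0] if N_PM > 1 else W[0]               # if N_PM = 1, PM1 is W0
    for q in W:
        ageing = W[q + 1] if q < N_W else W[N_W]         # the last W state is absorbing
        P0[q, ageing] += 1.0 - lam(q)
        P0[q, first_CM] += lam(q)                        # failure -> corrective maintenance
        P1[q, first_PM] += 1.0                           # preventive replacement
    for s in PM + CM:                                    # maintenance progresses one stage
        seq = PM if s in PM else CM
        i = seq.index(s)
        P0[s, seq[i + 1] if i + 1 < len(seq) else W[0]] = 1.0
    P1[PM + CM] = P0[PM + CM]                            # no decision applies in maintenance
    return P0, P1

# example matching Figure 9.1 (N_W = 4, N_PM = 2, N_CM = 3) with made-up failure rates
P0, P1 = component_transition_matrices(4, 2, 3, lam=lambda q: 0.05 * (q + 1))
```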

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

$N_C$: Number of components
$N_{W_c}$: Number of working states for component c
$N_{PM_c}$: Number of preventive maintenance states for component c
$N_{CM_c}$: Number of corrective maintenance states for component c

Costs

$C_{PM_c}$: Cost per stage of preventive maintenance for component c
$C_{CM_c}$: Cost per stage of corrective maintenance for component c
$C_{N_c}(i)$: Terminal cost if component c is in state i

Variables

$i^c$, $c \in \{1, \ldots, N_C\}$: State of component c at the current stage
$i^{N_C+1}$: State of the electricity at the current stage
$j^c$, $c \in \{1, \ldots, N_C\}$: State of component c for the next stage
$j^{N_C+1}$: State of the electricity for the next stage
$u^c$, $c \in \{1, \ldots, N_C\}$: Decision variable for component c

State and Control Space

$x^c_k$, $c \in \{1, \ldots, N_C\}$: State of component c at stage k
$x^c$: A component state
$x^{N_C+1}_k$: Electricity state at stage k
$u^c_k$: Maintenance decision for component c at stage k

Probability functions

$\lambda_c(i)$: Failure probability function for component c

Sets

$\Omega_{x^c}$: State space for component c
$\Omega_{x^{N_C+1}}$: Electricity state space
$\Omega_{u^c}(i^c)$: Decision space for component c in state $i^c$

9.2.3 Assumptions

• The system is composed of $N_C$ components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted $\lambda_c(t)$ for component $c \in \{1, \ldots, N_C\}$.

• If component c fails during stage k, corrective maintenance is undertaken for $N_{CM_c}$ stages with a cost of $C_{CM_c}$ per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is $N_{PM_c}$ stages, with a cost of $C_{PM_c}$ per stage.

• An interruption cost $C_I$ is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• A terminal cost $C_{N_c}$ can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 8.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector

Uk = [u1k, u2k, ..., uNCk]T    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity states, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for some component, the transition probability is

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} Pc

with Pc =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
  1              if ic ∉ {W0, ..., WNWc−1} and ic = jc
  0              else
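To illustrate how these two cases could be assembled in practice, a minimal Python sketch is given below. The names P_age, P_maint and working_states are hypothetical placeholders introduced only for this sketch, and Case 2 is interpreted here as: a component that is neither maintained nor failed keeps its state while the system is stopped.

import numpy as np

def joint_transition_prob(i, j, u, P_age, P_maint, working_states):
    """Joint transition probability of the multi-component system (sketch).

    i, j           : tuples with the current and the next state of each component
    u              : tuple of decisions (1 = preventive maintenance, 0 = nothing)
    P_age[c]       : matrix with the ageing probabilities P(jc, 0, ic) of component c
    P_maint[c]     : matrix with the maintenance/repair dynamics P(jc, 1, ic)
    working_states : working_states[c] = set of indices of the W-states of component c
    """
    all_working = all(ic in working_states[c] for c, ic in enumerate(i))
    if all_working and all(uc == 0 for uc in u):
        # Case 1: the system runs, every component ages independently
        return float(np.prod([P_age[c][ic, jc]
                              for c, (ic, jc) in enumerate(zip(i, j))]))
    # Case 2: the system is stopped
    prob = 1.0
    for c, (ic, jc) in enumerate(zip(i, j)):
        if u[c] == 1 or ic not in working_states[c]:
            prob *= P_maint[c][ic, jc]           # maintained or failed component
        else:
            prob *= 1.0 if jc == ic else 0.0     # working component does not age
    return prob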

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs.

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with Cc =
  CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0      else

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states: it could be interesting to add other forecast state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 2
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


4.2.3 A Simple Shortest Path Problem Example

Deterministic dynamic programming can be used to solve simple shortest path problems with small state spaces.

An example is used to illustrate the formulation and the value iteration algorithm. The following shortest path problem is considered:

[Figure: shortest path graph with node A at stage 0; nodes B, C, D at stage 1; E, F, G at stage 2; H, I, J at stage 3; and node K at stage 4. Each arc is labelled with its cost.]

The aim of the problem is to determine the shortest way to reach node K starting from node A. A cost (corresponding to a distance) is associated with each arc. A first way to solve the problem would be to calculate the cost of all the possible paths; for example, the path A-B-F-J-K has a cost of 2+6+2+7=17. The shortest path would then be the one with the lowest cost.

Dynamic programming provides a more efficient way to solve the problem. Instead of calculating all the path costs, the problem is divided into subproblems that are solved recursively to determine the shortest path from each possible node to the terminal node K.

4.2.3.1 Problem Formulation

The problem is divided into five stages: N = 5, k = 0, 1, 2, 3, 4.

State Space
The state space is defined for each stage:

ΩX0 = {A} = {0}
ΩX1 = {B, C, D} = {0, 1, 2}
ΩX2 = {E, F, G} = {0, 1, 2}
ΩX3 = {H, I, J} = {0, 1, 2}
ΩX4 = {K} = {0}

Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from the current node to reach the next stage. The following notation is used:

ΩUk(i) = {0, 1} for i = 0, {0, 1, 2} for i = 1, {1, 2} for i = 2, for k = 1, 2, 3

ΩU0(0) = {0, 1, 2} for k = 0

For example, ΩU1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E or U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F or u1(2) = 2 for the transition D ⇒ G.

A sequence π = {µ0, µ1, ..., µN}, where µk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {µ*0, µ*1, ..., µ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notation used: fk(i, u) = u.

The transition costs are defined as the distance from one state to the state resulting from the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*0(0) = min_{Uk∈ΩUk(Xk)} ∑_{k=0}^{4} Ck(Xk, Uk) + CN(XN)

Subject to Xk+1 = fk(Xk, Uk), k = 0, 1, ..., N − 1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solutions of the algorithm are given in Appendix A.

The optimal cost-to-go is J*0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {µ0, µ1, µ2, µ3, µ4} with µk(i) = u*k(i) (for example, µ1(1) = 2 and µ1(2) = 2).
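To make the backward recursion concrete, a minimal Python sketch of the value iteration for this example is given below. The arc costs are the ones used in the calculations of Appendix A; the dictionary encoding and variable names are introduced only for this sketch.

# Arc costs C[k][(i, u)] taken from the worked example in Appendix A;
# the decision u is the index of the node reached at stage k+1 (f_k(i, u) = u).
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                      # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                                   # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                        # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                                  # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                                   # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                        # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                                  # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                       # H, I, J -> K
}
N = 4
J = {N: {0: 0}}                           # terminal cost phi(K) = 0
policy = {}

for k in range(N - 1, -1, -1):            # backward value iteration
    J[k], policy[k] = {}, {}
    for (i, u), cost in C[k].items():
        total = cost + J[k + 1][u]
        if i not in J[k] or total < J[k][i]:
            J[k][i], policy[k][i] = total, u

# forward pass: reconstruct the optimal path from the stored policy
i, path, names = 0, ["A"], {1: "BCD", 2: "EFG", 3: "HIJ", 4: "K"}
for k in range(N):
    i = policy[k][i]
    path.append(names[k + 1][i])
print(J[0][0], "-".join(path))            # 8 A-D-G-I-K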

Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. It introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4: it depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k isin 0 N represents the different stages of the problem In generalit corresponds to a time variable

The state of the system is characterized by a variable i = Xk The possible statesare represented by a set of admissible states that can depends on k Xk isin ΩXk

Decision Space

At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities Thetransition probabilities define the probability that the state of the system at stagek+1 is j if the state and control are i and u at the stage k These probabilities candepend also on the stage

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant) the dynamic function f does not dependson time and the notation for the probability function can be simplified

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).
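As a small illustration, the sketch below builds the Markov chain induced by a fixed policy from per-action transition matrices; the three-state example and its numbers are invented for this sketch only.

import numpy as np

# Hypothetical 3-state example: P[u][i, j] = P(X_{k+1} = j | X_k = i, U_k = u)
P = {
    0: np.array([[0.7, 0.3, 0.0],     # "do nothing": gradual deterioration
                 [0.0, 0.6, 0.4],
                 [0.0, 0.0, 1.0]]),
    1: np.array([[1.0, 0.0, 0.0],     # "maintain": back to as-good-as-new
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]]),
}
mu = [0, 0, 1]                          # fixed decision for each state

# Markov chain induced by the stationary policy mu
P_mu = np.array([P[mu[i]][i, :] for i in range(3)])
print(P_mu)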

Cost Function

A cost is associated to each possible transition (ij) and action u The costs can alsodepend on the stage

Ck(j, u, i) = Ck(xk+1 = j, uk = u, xk = i)

If the transition (ij) occurs at stage k when the decision is u then a cost Ck(j u i) isgiven If the cost function is stationary then the notation is simplified by C(i u j)

A terminal cost CN (i) can be used to penalize deviation from a desired terminalstate

Objective Function

The objective is to determine the sequence of decision that optimize the expectedcumulative cost (cost-to-go function) Jlowast(X0) where X0 is the initial state of thesystem

J*(X0) = min_{Uk∈ΩUk(Xk)} E[ CN(XN) + ∑_{k=0}^{N−1} Ck(Xk+1, Uk, Xk) ]

Subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N − 1

N             Number of stages
k             Stage
i             State at the current stage
j             State at the next stage
Xk            State at stage k
Uk            Decision action at stage k
ωk(i, u)      Probabilistic function of the disturbance
Ck(j, u, i)   Cost function
CN(i)         Terminal cost for state i
fk(i, u, ω)   Dynamic function
J*0(i)        Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*k(i) = min_{u∈ΩUk(i)} E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*k(i) = min_{u∈ΩUk(i)} ∑_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]    (5.2)

ΩXk           State space at stage k
ΩUk(i)        Decision space at stage k for state i
Pk(j, u, i)   Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i)  ∀i ∈ ΩXN  (initialisation)

While k ≥ 0 do

J*k(i) = min_{u∈ΩUk(i)} ∑_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

U*k(i) = argmin_{u∈ΩUk(i)} ∑_{j∈ΩXk+1} Pk(j, u, i) · [Ck(j, u, i) + J*k+1(j)]  ∀i ∈ ΩXk

k ← k − 1
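A compact Python sketch of this backward recursion is given below. It assumes, for simplicity, that all decisions are admissible in every state and that the transition and cost data are stored as matrices; these conventions are choices made for the sketch, not part of the formulation above.

import numpy as np

def finite_horizon_vi(P, C, C_N):
    """Backward value iteration for a finite horizon SDP (a sketch).

    P[k][u] : S x S matrix, P_k(j, u, i) with rows indexed by i and columns by j
    C[k][u] : S x S matrix, C_k(j, u, i) in the same layout
    C_N     : length-S vector of terminal costs
    Returns the cost-to-go J[k][i] and the optimal decisions U[k][i].
    """
    N, S = len(P), len(C_N)
    J = np.zeros((N + 1, S))
    U = np.zeros((N, S), dtype=int)
    J[N] = C_N
    for k in range(N - 1, -1, -1):
        # expected cost of each decision u in each state i
        Q = np.array([(P[k][u] * (C[k][u] + J[k + 1])).sum(axis=1)
                      for u in range(len(P[k]))])          # shape (A, S)
        U[k] = Q.argmin(axis=0)
        J[k] = Q.min(axis=0)
    return J, U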

u        Decision variable
U*k(i)   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^(NU)). The complexity of the problem increases exponentially with the size of the problem (the number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
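As a rough numerical illustration (the numbers are chosen only for this example), a problem with N = 52 weekly stages, NX = 5 state variables of size S = 10 and NU = 5 binary decision variables already requires on the order of N · S^(2·NX) · A^(NU) = 52 · 10^10 · 2^5 ≈ 1.7 · 10^13 elementary operations.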

5.5 Ideas for a Maintenance Optimization Model

In this section possible state variables for a maintenance models based on SDP arediscussed

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time Apossible state variable for the component is its age To be precise the age of thecomponent should be discretized according to the stage duration If the lifetimeof a component is very long it can lead to a very large state space The timehorizon can be considered to reduce the number of states If a state variable cannot reach certain states during the planned horizon these states can be neglectedIf a component subcomponent or part of a system can be inspected or monitoreddifferent levels of deterioration can be used as a state variable In practice bothage and deterioration state variables could be used complementary

Of course maintenance states should be considered in both cases It could be possibleto have different types of failure states as major failure and minor failures Minorfailures could be cleared by repair while for a major failure a component should bereplace


5.5.2 Forecasts

Measurements or forecasts can sometime estimate the disturbance a system is orcan be subject to The reliability of the forecasts should be carefully consideredDeterministic information could be used to adapt the finite horizon model on theirhorizon of validity It would also be possible to generate different scenarios fromforcasts solve the problem for the different scenarios and get some conclusions fromthe different solutions Another way of using forecasting models is to include them inthe maintenance problem formulation by adding a specific variable It will reducethe uncertainties but in return increase the complexity The proposed model inChapter 9 gives an example of how to integrate a forecasting model in an electricityscenario

Another factor that could be interesting to forecast is the load Indeed the produc-tion must always be in balance with the generation Also if there is no consumptionsome generation units are stopped This time can be used for the maintenance ofthe power plant

Weather forecasting could also be interesting in some cases For example the powergenerated by wind farms depends on the wind strength and maintenance actionon offshore wind farms are possible only in case of good weather For these tworeasons wind forecasting could be interesting for optimizing maintenance actionsof offshore wind farms

5.5.3 Time Lags

An important assumption of a DP model is that the dynamic of the system onlydepends on the actual state of the system (and possibly on the time if the systemdynamic is not stationary)

This condition of loss of memory is very strong and unrealistic in some cases Itis sometimes possible (if the system dynamic depends on few precedent states) toovercome this assumption Variables are added in the DP model to keep in memorythe precedent states that can be visited The computational price is once again veryhigh

For example in the context of maintenance it would be interesting to know thedeterioration level of an asset at the precedent stage It would give informationsabout the dynamic of the deterioration process


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary overtime The dynamic of the system as well as the cost function and the disturbancesare stationary Infinite horizon stochastic dynamic programming (IHSDP) modelscan be represented by a Markov Decision Process For more details and prooffor the convergence of the algorithm [36] or the introduction chpater of [13] arerecommended

In practice one scarcely faces problems with infinite number of stages It canhowever be a reasonable approximation of problems with very large number ofstates for which the value algorithm would lead to untractable computation

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

6.1 Problem Formulation

The state space decision space probability function and cost function of IHSDPare defined in a similar way that FHSDP for the stationary case The aim of IHSDPis to minimize the cumulative costs of a system over an infinite number of stagesThis sum is called cost-to-go function

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, ...}, where µ is a function mapping the state space to the control space. For i ∈ ΩX, µ(i) is an admissible control for the state i: µ(i) ∈ ΩU(i).

The objective is to find the optimal microlowast It should minimize the cost-to-go function

To be able to compare different policies it is necessary that the infinite sum ofcosts converge Different type of models can be considered stochastic shortest pathproblems discounted problems and average cost per stages problems

Stochastic shortest path modelsStochastic shortest path dynamic programming models have a terminal state (orcost-free terminaison state) that is not evitable When this state is reached thesystem remains in this state and no costs are paid

J*(X0) = min_µ E[ lim_{N→∞} ∑_{k=0}^{N−1} C(Xk+1, µ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, µ(Xk), ω(Xk, µ(Xk))), k = 0, 1, ..., N − 1

µ       Decision policy
J*(i)   Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, the discount factor (0 < α < 1). The cost function for a discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded the infinite sum will converge (decreasing geometric progres-sion)

J*(X0) = min_µ E[ lim_{N→∞} ∑_{k=0}^{N−1} α^k · C(Xk+1, µ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, µ(Xk), ω(Xk, µ(Xk))), k = 0, 1, ..., N − 1

α Discount factor

Average cost per stage problemsInfinite horizon problems can sometimes not be represented with a no free-costterminaison state or discounted

To make the cost-to-go finite the problem can modelled as an average cost per stageproblem where the aim is to minimize

J* = min_µ E[ lim_{N→∞} (1/N) ∑_{k=0}^{N−1} C(Xk+1, µ(Xk), Xk) ]

Subject to Xk+1 = f(Xk, µ(Xk), ω(Xk, µ(Xk))), k = 0, 1, ..., N − 1

6.2 Optimality Equations

The optimality equations are formulated using the probability function P (i u j)

The stationary policy microlowast solution of a IHSDP shortest path problem is solution ofthe Bellmanacutes equation (other name for the optimality equation - Bellman is themathematician at the origin of the DP theory)

Jµ(i) = min_{µ(i)∈ΩU(i)} ∑_{j∈ΩX} Pij(u) · [Cij(u) + Jµ(j)]  ∀i ∈ ΩX

Jµ(i)   Cost-to-go function of policy µ starting from state i
J*(i)   Optimal cost-to-go function for state i

For a IHSDP discounted problem the optimality equation is

Jµ(i) = min_{µ(i)∈ΩU(i)} ∑_{j∈ΩX} Pij(u) · [Cij(u) + α · Jµ(j)]  ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models the theoretical number of iteration needed is infiniteand a relative criteria must be determine to stop the algorithm

An alternative to the method is the Policy Iteration (PI) algorithm This laterterminates after a finite number of iteration

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1: Policy Evaluation

If µq+1 = µq, stop the algorithm. Else, Jµq(i), the solution of the following linear system, is calculated:

Jµq(i) = ∑_{j∈ΩX} P(j, µq(i), i) · [C(j, µq(i), i) + Jµq(j)]  ∀i ∈ ΩX

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

µq+1(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jµq(j)]  ∀i ∈ ΩX

Go back to policy evaluation step

The process stops when microq+1 = microq

At each iteration the algorithm always improves the policy. If the initial policy µ0 is already good, then the algorithm will converge quickly to the optimal solution.
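A minimal Python sketch of the two steps is given below for a stochastic shortest path problem. It assumes that the last state index is a cost-free absorbing terminal state and that every action is admissible in every state; both are simplifying assumptions made for the sketch.

import numpy as np

def policy_iteration(P, C, max_iter=100):
    """Policy iteration for an expected total cost (shortest path) MDP (sketch).

    P[u] : S x S matrix with P(j, u, i), rows indexed by i and columns by j
    C[u] : S x S matrix with C(j, u, i), same layout
    State S-1 is assumed to be a cost-free absorbing terminal state.
    """
    A, S = len(P), P[0].shape[0]
    mu = np.zeros(S, dtype=int)                    # initial policy mu0
    for _ in range(max_iter):
        # --- policy evaluation: solve J = c_mu + P_mu J, with J(terminal) = 0
        P_mu = np.array([P[mu[i]][i] for i in range(S)])
        c_mu = np.array([(P[mu[i]][i] * C[mu[i]][i]).sum() for i in range(S)])
        J = np.zeros(S)
        J[:-1] = np.linalg.solve(np.eye(S - 1) - P_mu[:-1, :-1], c_mu[:-1])
        # --- policy improvement
        Q = np.array([(P[u] * (C[u] + J)).sum(axis=1) for u in range(A)])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):
            return J, mu                           # policy is its own improvement
        mu = mu_new
    return J, mu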

6.5 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use, at each evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_µk(i) that must be chosen higher than the real value Jµk(i).

While m ≥ 0 do

J^m_µk(i) = ∑_{j∈ΩX} P(j, µk(i), i) · [C(j, µk(i), i) + J^{m+1}_µk(j)]  ∀i ∈ ΩX

m ← m − 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m=0 and Jmicrok is approximated by J0microk

6.6 Average Cost-to-go Problems

The methods presented in Sections 51-54 can not be applied directly to average costproblems Average cost-to-go problems are more complicated and implies conditionson the Markov decision process for the convergence of the algorithms An averagecost-to-go problem can be reformulated as equivalent to a shortest path problemif the model of the Markov decision process is proved to be unichain (That is allstationary policies generate Markov chains that consist of a single ergodic class andpossibly some transient states See for details [36])

Given a stationary policy micro a state X isin ΩX there is an unique λmicro and vector hmicrosuch that

hµ(X) = 0

λµ + hµ(i) = ∑_{j∈ΩX} P(j, µ(i), i) · [C(j, µ(i), i) + hµ(j)]  ∀i ∈ ΩX

This λmicro is the average cost-to-go for the stationary policy micro The average cost-to-gois the same for all the starting state

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

µ*(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X)} ∑_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk  ∀i ∈ ΩX

µk+1(i) = argmin_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]  ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
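The sketch below illustrates these updates under the unichain assumption; the matrix storage conventions, the stopping tolerance and the reference-state index are choices made for this sketch.

import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration for an average cost-per-stage MDP (sketch).

    P[u], C[u] : S x S matrices with P(j, u, i) and C(j, u, i), rows i, columns j
    ref        : index of the arbitrary reference state
    Returns the estimated average cost, the relative values h and a greedy policy.
    """
    A, S = len(P), P[0].shape[0]
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = np.array([(P[u] * (C[u] + h)).sum(axis=1) for u in range(A)])  # (A, S)
        H = Q[:, ref].min()                 # offset taken at the reference state
        h_new = Q.min(axis=0) - H
        if np.abs(h_new - h).max() < tol:
            return H, h_new, Q.argmin(axis=0)
        h = h_new
    return H, h, Q.argmin(axis=0)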

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation X can be chosen arbitrarly

Step 1 Evaluation of the policyIf λq+1 = λq and and hq+1(i) = hq(i) foralli isin ΩX stop the algorithm

Else, solve the system of equations

hq(X) = 0
λq + hq(i) = ∑_{j∈ΩX} P(j, µq(i), i) · [C(j, µq(i), i) + hq(j)]  ∀i ∈ ΩX

Step 2 Policy improvement

microq+1 = argminuisinΩU (i)

sumjisinΩXP (j u i) middot [C(j u i) + hq] foralli isin ΩX

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example in the discounted IHSDP

J*(i) = min_{u∈ΩU(i)} ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize ∑_{i∈ΩX} J(i)

Subject to J(i) ≤ ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J(j)]  ∀i ∈ ΩX, ∀u ∈ ΩU(i)

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
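A sketch of how this LP formulation could be set up with a generic solver is shown below (here scipy.optimize.linprog); the matrix layout and the helper name are assumptions introduced for the sketch.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Solve a discounted MDP by linear programming (sketch).

    P[u], C[u] : S x S matrices with P(j, u, i) and C(j, u, i), rows i, columns j
    Maximizes sum_i J(i) subject to
    J(i) <= sum_j P(j, u, i) * (C(j, u, i) + alpha * J(j)) for all i, u.
    """
    A, S = len(P), P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(A):
        expected_cost = (P[u] * C[u]).sum(axis=1)          # sum_j P * C, per state i
        A_ub.append(np.eye(S) - alpha * P[u])              # J(i) - alpha * sum_j P * J(j)
        b_ub.append(expected_cost)
    res = linprog(c=-np.ones(S),                           # maximize sum_i J(i)
                  A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * S, method="highs")
    return res.x                                           # optimal cost-to-go J*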

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

6.9 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable


Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

7.1 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = ∑_{n=k}^{N} C(Xn, Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) = (1/K) ∑_{m=1}^{K} V(im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories


With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(Xk) = J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
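A minimal sketch of the TD(0) update over one observed trajectory could look as follows; the sample format and the constant step size (instead of the decreasing 1/m step above) are assumptions made for the sketch.

def td0_update(J, trajectory, gamma=0.1):
    """One pass of TD(0) over an observed trajectory (sketch).

    trajectory : list of samples (x_k, x_k_next, cost) generated under a fixed policy
    J          : dict mapping states to their current cost-to-go estimates
    gamma      : step size (could also be 1/m, with m the number of visits to x_k)
    """
    for x, x_next, cost in trajectory:
        J[x] = J.get(x, 0.0) + gamma * (cost + J.get(x_next, 0.0) - J.get(x, 0.0))
    return J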

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy is

µk+1(i) = argmin_{u∈ΩU(i)} Qµk(i, u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

7.2.2 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Q*(i, u) = ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations, we obtain

Q*(i, u) = ∑_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck) do

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.
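The sketch below shows one Q-learning update with an ε-greedy rule for the exploration/exploitation trade-off discussed below; the helper functions (actions, step, cost_fn) are hypothetical placeholders and terminal states are not handled in this sketch.

import random

def q_learning_step(Q, x, actions, step, cost_fn, epsilon=0.1, gamma=0.1):
    """One Q-learning update with an epsilon-greedy exploration rule (sketch).

    Q       : dict mapping (state, action) to Q-factor estimates
    actions : function returning the admissible actions of a state
    step    : function simulating (or observing) the next state given (x, u)
    cost_fn : function returning the transition cost C(x_next, u, x)
    """
    # epsilon-greedy trade-off between exploration and exploitation
    if random.random() < epsilon:
        u = random.choice(actions(x))
    else:
        u = min(actions(x), key=lambda a: Q.get((x, a), 0.0))
    x_next = step(x, u)
    cost = cost_fn(x_next, u, x)
    best_next = min(Q.get((x_next, a), 0.0) for a in actions(x_next))
    Q[(x, u)] = (1 - gamma) * Q.get((x, u), 0.0) + gamma * (cost + best_next)
    return x_next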

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

7.3 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning


7.4 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be the following (a small sketch along these lines is given after the list):

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set
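As one possible instance of the steps above, the sketch below fits a simple linear architecture J(i, r) = r·φ(i) by least squares on simulated cost-to-go samples; the feature map and the sample format are assumptions of the sketch, and many other supervised learning structures could be used instead.

import numpy as np

def fit_linear_value_function(features, samples):
    """Least-squares fit of an approximate cost-to-go J(i, r) = r . phi(i) (sketch).

    features : function phi mapping a state to a feature vector (chosen by hand)
    samples  : list of pairs (state, observed cost-to-go V(state)) from simulation
    Returns the parameter vector r.
    """
    Phi = np.array([features(x) for x, _ in samples])
    V = np.array([v for _, v in samples])
    r, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return r

# usage sketch: J_hat = lambda x: features(x) @ r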

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-go approach is used to evaluate the policy.

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality. The complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state. SMDP are also more complex. The models found in the literature were considering only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Limitation: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model
- Methods: classical methods; possible approaches for MDP:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general
  - Shortest path: linear programming allows possible additional constraints; the state space is more limited than with VI & PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces
- Possible application: same as MDP, for systems larger than classical MDP methods can handle
- Methods: TD-learning, Q-learning
- Advantage: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization for inspection based maintenance
- Methods: same as MDP
- Disadvantage: complex (average cost-to-go approach)

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both these models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance in a profitable period. This idea was considered for the model. The electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity price at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario. NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ωx1, x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
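As an illustration of this discretization, a minimal Python sketch is given below; the Weibull-like hazard function and the numerical values of Tmax and Ts are hypothetical and only serve as an example, not as part of the proposed model.

def discretize_failure_rate(failure_rate, T_max, T_s):
    # Number of working states W0..W_NW (closest integer to Tmax/Ts)
    N_W = int(round(T_max / T_s))
    # Per-stage failure probability in state Wq: Ts * lambda(q * Ts)
    return N_W, [T_s * failure_rate(q * T_s) for q in range(N_W + 1)]

# Hypothetical increasing hazard rate (per hour) and discretization
lam = lambda t: 1e-5 * (1.0 + (t / 8760.0) ** 2)
N_W, p_fail = discretize_failure_rate(lam, T_max=5 * 8760, T_s=168)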

[Figure 9.1 (diagram): working states W0-W4, preventive maintenance state PM1 and corrective maintenance states CM1, CM2; from each Wq the solid arcs (u = 0) lead to Wq+1 with probability 1 - Ts·λ(q) and to CM1 with probability Ts·λ(q), the dashed arcs (u = 1) lead to PM1, and the maintenance states return towards W0 with probability 1.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., W_NW, PM1, ..., PM_{NPM-1}, CM1, ..., CM_{NCM-1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden. Moreover, this is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 (plot): electricity prices in SEK/MWh (roughly 200-500) as a function of the stage (k-1, k, k+1) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., W_NW}
ΩU(i) = ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · P_k(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E. i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                            u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}        0    Wq+1      1 - λ(Wq)
Wq, q ∈ {0, ..., NW-1}        0    CM1       λ(Wq)
W_NW                          0    W_NW      1 - λ(W_NW)
W_NW                          0    CM1       λ(W_NW)
Wq, q ∈ {0, ..., NW}          1    PM1       1
PMq, q ∈ {1, ..., NPM-2}      ∅    PMq+1     1
PM_{NPM-1}                    ∅    W0        1
CMq, q ∈ {1, ..., NCM-2}      ∅    CMq+1     1
CM_{NCM-1}                    ∅    W0        1
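A minimal Python sketch of the component state transitions of Table 9.1 is given below, assuming the per-stage failure probabilities lam[q] have already been computed (for example with the earlier sketch); the state labels and the function name are illustrative only.

def component_transition(i, u, N_W, N_PM, N_CM, lam):
    # Returns a dict {next state: probability} following Table 9.1.
    # States are labelled 'W0'..'W<N_W>', 'PM1'.., 'CM1'..; lam[q] is the
    # per-stage failure probability in working state Wq.
    if i.startswith('W'):
        q = int(i[1:])
        if u == 1:                                   # preventive replacement decided
            return {'PM1' if N_PM > 1 else 'W0': 1.0}
        j = 'W%d' % min(q + 1, N_W)                  # ageing, capped at W_NW
        return {j: 1.0 - lam[q], 'CM1' if N_CM > 1 else 'W0': lam[q]}
    if i.startswith('PM'):
        q = int(i[2:])
        return {'PM%d' % (q + 1) if q < N_PM - 1 else 'W0': 1.0}
    q = int(i[2:])                                   # corrective maintenance states
    return {'CM%d' % (q + 1) if q < N_CM - 1 else 'W0': 1.0}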

Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = | 1    0    0   |     P2_E = | 1/3  1/3  1/3 |     P3_E = | 0.6  0.2  0.2 |
       | 0    1    0   |            | 1/3  1/3  1/3 |            | 0.2  0.6  0.2 |
       | 0    0    1   |            | 1/3  1/3  1/3 |            | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                            u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}        0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW-1}        0    CM1       CI + CCM
W_NW                          0    W_NW      G · Ts · CE(i2, k)
W_NW                          0    CM1       CI + CCM
Wq                            1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM-2}      ∅    PMq+1     CI + CPM
PM_{NPM-1}                    ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM-2}      ∅    CMq+1     CI + CCM
CM_{NCM-1}                    ∅    W0        CI + CCM
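The transition costs of Table 9.4 can be sketched in the same style; the sign convention below (the generation reward entered as a negative cost in a minimization) is an assumption made for illustration and is not specified in the table.

def stage_cost(i1, u, j1, i2, k, G, T_s, C_E, C_I, C_PM, C_CM):
    # Transition cost C_k(j, u, i) following Table 9.4
    if i1.startswith('W') and u == 0 and j1.startswith('W'):
        return -G * T_s * C_E(i2, k)    # reward for electricity generation
    if i1.startswith('CM') or j1 == 'CM1':
        return C_I + C_CM               # interruption + corrective maintenance
    return C_I + C_PM                   # interruption + preventive maintenance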

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1_k                  Electricity state at stage k
uc_k                     Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc       State space for component c
ΩxNC+1    Electricity state space
Ωuc(ic)   Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c. xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc-1}, CM1, ..., CM_{NCMc-1}}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., W_NWc}
                     Ωuc(ic) = ∅       otherwise

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)                                                (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)          (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or a decision of preventive maintenance is made, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with
P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1              if uc = 0, ic ∈ {W1, ..., W_NWc} and jc = ic
P^c = 0              otherwise
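A minimal sketch of how the joint transition probability factorizes is given below; it assumes a per-component probability function (for example a modified version of the component_transition sketch above that implements the rule that components do not age while the system is down) and an electricity transition matrix indexed by integers, all of which are illustrative assumptions.

def system_transition_prob(i, u, j, elec_P_k, component_P):
    # i, j: tuples (i1, ..., iNC, iNC+1); u: tuple (u1, ..., uNC)
    # elec_P_k[i2][j2]: electricity scenario transition probability at stage k
    # component_P(ic, uc, jc): per-component transition probability
    prob = elec_P_k[i[-1]][j[-1]]              # electricity state factor
    for ic, uc, jc in zip(i[:-1], u, j[:-1]):
        prob *= component_P(ic, uc, jc)        # product over the components
    return prob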

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., W_NWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with
Cc = CCMc  if ic ∈ {CM1, ..., CM_{NCMc-1}} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PM_{NPMc-1}} or jc = PM1
Cc = 0     otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest. However, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields, such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Each node of the problem is defined by a state Xk. For example, X2 = 1 corresponds to the node F. In this problem the state space is defined by one variable. It is also possible to have a multi-variable state space, for which Xk would be a vector.

Decision Space
The set of possible decisions must be defined for each state at each stage. In the example, the choice is which way to take from a node to go to the next stage. The following notations are used:

For k = 1, 2, 3:
ΩU_k(i) = {0, 1}     for i = 0
ΩU_k(i) = {0, 1, 2}  for i = 1
ΩU_k(i) = {1, 2}     for i = 2

For k = 0:
ΩU_0(0) = {0, 1, 2}

For example, ΩU_1(0) = ΩU(B) = {0, 1}, with U1(0) = 0 for the transition B ⇒ E, or U1(0) = 1 for the transition B ⇒ F.

Another example: ΩU_1(2) = ΩU(D) = {1, 2}, with u1(2) = 1 for the transition D ⇒ F, or u1(2) = 2 for the transition D ⇒ G.

A sequence π = {μ0, μ1, ..., μN}, where μk(i) is a function mapping the state i at stage k to an admissible control for this state, is called a policy. The value iteration algorithm determines the optimal policy of the problem, π* = {μ*0, μ*1, ..., μ*N}.

Dynamic and Cost Functions
The dynamic function of the example is simple thanks to the notations used: fk(i, u) = u.

The transition costs are defined as equal to the distance from one state to the resulting state of the decision. For example, C1(0, 0) = C(B ⇒ E) = 4. The cost function is defined in the same way for the other stages and states.

Objective Function

J*_0(0) = min_{Uk ∈ ΩU_k(Xk)} [ Σ_{k=0}^{4} Ck(Xk, Uk) + CN(XN) ]

subject to Xk+1 = fk(Xk, Uk),  k = 0, 1, ..., N-1

4.2.3.2 Solution

The value iteration algorithm is used to solve the problem.

The algorithm is initiated from the last stage and then iterated backwards until the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ0, μ1, μ2, μ3, μ4} with μk(i) = u*_k(i) (for example μ1(1) = 2, μ1(2) = 2).
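A minimal Python sketch of this backward recursion is given below, with the arc costs C_k(i, u) taken from the computations in Appendix A; it is only meant to reproduce the example, and the variable names are illustrative.

# Arc costs C[k][(i, u)]; the decision u is the index of the next node (f_k(i, u) = u)
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}
N = 4
J = {N: {0: 0}}                                  # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):                   # backward value iteration
    J[k], policy[k] = {}, {}
    for (i, u), cost in C[k].items():
        total = cost + J[k + 1][u]               # C_k(i, u) + J*_{k+1}(f_k(i, u))
        if i not in J[k] or total < J[k][i]:
            J[k][i], policy[k][i] = total, u
print(J[0][0])                                   # prints 8, as in Appendix A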


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. The chapter introduces the theory for the proposed model in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as follows.

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = Xk. The possible states are represented by a set of admissible states that can depend on k: Xk ∈ ΩX_k.

Decision Space

At each decision epoch the decision maker must choose an action u = Uk among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ ΩU_k(i).

Dynamics of the System and Transition Probabilities

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω),  k = 0, 1, ..., N-1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(xk+1 = j, uk = u, xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(i, u, j).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min_{Uk ∈ ΩU_k(Xk)} E[ CN(XN) + Σ_{k=0}^{N-1} Ck(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)),  k = 0, 1, ..., N-1

N            Number of stages
k            Stage
i            State at the current stage
j            State at the next stage
Xk           State at stage k
Uk           Decision action at stage k
ωk(i, u)     Probabilistic function of the disturbance
Ck(i, u, j)  Cost function
CN(i)        Terminal cost for state i
fk(i, u, ω)  Dynamic function
J*_0(i)      Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u ∈ ΩU_k(i)} E[ Ck(i, u) + J*_{k+1}(fk(i, u, ω)) ]    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u ∈ ΩU_k(i)} Σ_{j ∈ ΩX_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)]    (5.2)

ΩX_k         State space at stage k
ΩU_k(i)      Decision space at stage k for state i
Pk(j, u, i)  Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on Equation 5.2. The algorithm starts from the last stage. By backward recursion it determines at each stage the optimal decision for each state of the system.

J*_N(i) = CN(i)   ∀i ∈ ΩX_N    (initialisation)

While k ≥ 0 do
  J*_k(i) = min_{u ∈ ΩU_k(i)} Σ_{j ∈ ΩX_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)]   ∀i ∈ ΩX_k
  U*_k(i) = argmin_{u ∈ ΩU_k(i)} Σ_{j ∈ ΩX_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)]   ∀i ∈ ΩX_k
  k ← k - 1

u        Decision variable
U*_k(i)  Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
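A minimal Python sketch of this backward value iteration is given below; the way the problem data are passed (lists of states, probability and cost functions) is an assumption made for illustration only.

def value_iteration(N, states, actions, P, C, C_N):
    # states[k]: admissible states at stage k (k = 0..N)
    # actions(k, i): admissible decisions at stage k in state i
    # P(k, j, u, i): transition probability P_k(j, u, i)
    # C(k, j, u, i): transition cost C_k(j, u, i); C_N(i): terminal cost
    J = {N: {i: C_N(i) for i in states[N]}}
    U = {}
    for k in range(N - 1, -1, -1):
        J[k], U[k] = {}, {}
        for i in states[k]:
            best_u, best = None, float('inf')
            for u in actions(k, i):
                # expected cost of decision u: sum over the next states j
                cost = sum(P(k, j, u, i) * (C(k, j, u, i) + J[k + 1][j])
                           for j in states[k + 1])
                if cost < best:
                    best_u, best = u, cost
            J[k][i], U[k][i] = best, best_u
    return J, U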

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• NX state variables; the size of the set for each state variable is S

• NU control variables; the size of the set for each control variable is A

The time complexity of the algorithm is O(N · S^(2·NX) · A^NU). The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
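As an illustration with hypothetical numbers, a model with N = 52 stages, NX = 3 state variables of S = 10 values each and NU = 2 binary control variables requires in the order of N · S^(2·NX) · A^NU = 52 · 10^6 · 4 ≈ 2 · 10^8 elementary operations; adding one more state variable with 10 values multiplies this figure by 10^2.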

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.

5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model over its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties, but in return increase the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model as an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption. Variables are added in the DP model to keep in memory the preceding states that have been visited. The computational price is once again very high.

For example, in the context of maintenance it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.

Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, ..., μ}. μ is a function mapping the state space to the control space: for each i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).

The objective is to find the optimal policy μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))),  k = 0, 1, ..., N-1

μ      Decision policy
J*(i)  Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N-1} α^k · C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))),  k = 0, 1, ..., N-1

α  Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) · Σ_{k=0}^{N-1} C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))),  k = 0, 1, ..., N-1

6.2 Optimality Equations

The optimality equations are formulated using the probability function P(i, u, j).

The stationary policy μ* solution of an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

Jμ(i) = min_{μ(i) ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(μ(i)) · [Cij(μ(i)) + Jμ(j)]   ∀i ∈ ΩX

Jμ(i)  Cost-to-go function of policy μ starting from state i
J*(i)  Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is

Jμ(i) = min_{μ(i) ∈ ΩU(i)} Σ_{j ∈ ΩX} Pij(μ(i)) · [Cij(μ(i)) + α · Jμ(j)]   ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy. It can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast. The time complexity is polynomial in the size of the state space, the control space and 1/(1-α).

For non-discounted models, the theoretical number of iterations needed is infinite and a relative criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the actual policy. This 2-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. Then it can be described by the following steps.

Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm. Else, Jμq(i), the solution of the following linear system, is calculated:

Jμq(i) = Σ_{j ∈ ΩX} P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]

q  Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2: Policy Improvement

A new policy is obtained using a value iteration step:

μq+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + Jμq(j)]

Go back to the policy evaluation step.

The process stops when μq+1 = μq.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge fast to the optimal solution.
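A minimal Python sketch of policy iteration for the discounted case is given below; it uses numpy to solve the linear system of the evaluation step, and the data layout (one transition matrix and one cost matrix per action) is an assumption made for illustration, not the thesis notation.

import numpy as np

def policy_iteration(P, C, alpha):
    # P[u][i][j] = P(j | i, u); C[u][i][j] = C(j, u, i); 0 < alpha < 1
    n, actions = P[0].shape[0], range(len(P))
    mu = np.zeros(n, dtype=int)                      # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = np.array([P[mu[i]][i] for i in range(n)])
        c_mu = np.array([P[mu[i]][i] @ C[mu[i]][i] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([[P[u][i] @ (C[u][i] + alpha * J) for u in actions]
                      for i in range(n)])
        new_mu = Q.argmin(axis=1)
        if np.array_equal(new_mu, mu):               # policy is its own improvement
            return mu, J
        mu = new_mu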

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each iteration, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).

While m ≥ 0 do
  J^m_μk(i) = Σ_{j ∈ ΩX} P(j, μk(i), i) · [C(j, μk(i), i) + J^{m+1}_μk(j)]   ∀i ∈ ΩX
  m ← m - 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.
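In the policy iteration sketch above, the exact evaluation step could be replaced by the following approximate evaluation, in the spirit of modified policy iteration; this is a sketch under the same assumptions on the data layout.

def approximate_evaluation(P, C, alpha, mu, J_init, M):
    # M sweeps of the fixed-policy Bellman operator instead of an exact solve;
    # J_init should over-estimate the true cost-to-go of policy mu
    import numpy as np
    J = np.array(J_init, dtype=float)
    for _ in range(M):
        J = np.array([P[mu[i]][i] @ (C[mu[i]][i] + alpha * J)
                      for i in range(len(mu))])
    return J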

6.6 Average Cost-to-go Problems

The methods presented in Sections 6.3-6.5 cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there is a unique λμ and vector hμ such that

hμ(X) = 0
λμ + hμ(i) = Σ_{j ∈ ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]   ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

μ*(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + h*(j)]   ∀i ∈ ΩX

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u ∈ ΩU(X)} Σ_{j ∈ ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] - Hk   ∀i ∈ ΩX

μk+1(i) = argmin_{u ∈ ΩU(i)} Σ_{j ∈ ΩX} P(j, u, i) · [C(j, u, i) + hk(j)]   ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ^{q+1} = λ^q and h^{q+1}(i) = h^q(i) ∀i ∈ Ω_X, stop the algorithm.
Else, solve the system of equations

h^q(X) = 0
λ^q + h^q(i) = Σ_{j∈Ω_X} P(j, μ^q(i), i) · [C(j, μ^q(i), i) + h^q(j)],  ∀i ∈ Ω_X

Step 2: Policy improvement

μ^{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h^q(j)],  ∀i ∈ Ω_X

q = q + 1
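The evaluation step above amounts to solving a small linear system with the normalization h(X) = 0. Here is a sketch under the same array assumptions as before, for a fixed stationary policy mu on a unichain MDP:

```python
import numpy as np

def evaluate_average_cost_policy(P, C, mu, ref_state=0):
    """Solve h(X) = 0 and lambda + h(i) = sum_j P(j, mu(i), i) [C(j, mu(i), i) + h(j)].

    Returns (lambda_mu, h_mu) for the stationary policy mu (mu[i] = action in state i).
    """
    n_states = P.shape[1]
    P_mu = P[mu, np.arange(n_states), :]
    c_mu = np.sum(P_mu * C[mu, np.arange(n_states), :], axis=1)
    # Unknowns: [h(0), ..., h(n-1), lambda].
    A = np.zeros((n_states + 1, n_states + 1))
    b = np.zeros(n_states + 1)
    A[:n_states, :n_states] = np.eye(n_states) - P_mu   # h(i) - sum_j P_mu(i, j) h(j)
    A[:n_states, n_states] = 1.0                         # + lambda
    b[:n_states] = c_mu
    A[n_states, ref_state] = 1.0                         # normalization h(X) = 0
    sol = np.linalg.solve(A, b)
    return sol[n_states], sol[:n_states]
```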

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP the optimal cost-to-go satisfies

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)],  ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize   Σ_{i∈Ω_X} J(i)
Subject to J(i) ≤ Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J(j)],  ∀u ∈ Ω_U(i), ∀i ∈ Ω_X

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
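As an illustration, a sketch of this LP using scipy.optimize.linprog, under the same array assumptions as before (maximizing the sum of J(i) is passed to the solver as minimizing its negative); this is one possible encoding, not the only one:

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha=0.9):
    """Solve the discounted cost MDP by linear programming; returns the optimal J."""
    n_actions, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_actions):
        for i in range(n_states):
            # constraint: J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)
            row = -alpha * P[u, i, :]
            row[i] += 1.0
            A_ub.append(row)
            b_ub.append(float(np.dot(P[u, i, :], C[u, i, :])))
    res = linprog(c=-np.ones(n_states),              # maximize sum_i J(i)
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x
```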

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

The DP methods (value iteration and policy iteration) run in polynomial time: if n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ^0 is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization, since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The algorithms proposed in RL are rooted in the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. They make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): cost-to-go of the trajectory starting from state X_k.

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

with γ_{X_k} corresponding to 1/m, where m is the number of times X_k has already been visited by trajectories.

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition is being generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)],  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) = J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
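A small sketch of TD(0) policy evaluation from sampled trajectories, assuming each trajectory is available as a list of (X_k, X_{k+1}, cost) transitions generated under the fixed policy, with step sizes γ = 1/(number of visits) as in the text (all names are illustrative):

```python
import numpy as np

def td0_policy_evaluation(trajectories, n_states):
    """TD(0) estimation of the cost-to-go J of a fixed policy."""
    J = np.zeros(n_states)
    visits = np.zeros(n_states)
    for trajectory in trajectories:
        for (x, x_next, cost) in trajectory:
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # move J(x) toward the one-step bootstrapped target C + J(x_next)
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J
```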

Q-factors
Once J_{μ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{μ_k}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μ_k}(j)]

Note that P(j, u, i) and C(j, u, i) must be known for this step.

The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μ_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_{μ_k} and Q_{μ_k} have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k) do

U_k = argmin_{u∈Ω_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈Ω_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The trade-off exploration/exploitation. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
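A tabular Q-learning sketch with ε-greedy exploration, assuming a helper sample_step(x, u) that returns the observed next state and transition cost; this helper, the episode structure and the ε schedule are assumptions made for the sketch:

```python
import numpy as np

def q_learning(sample_step, n_states, n_actions, terminal_states=(),
               n_episodes=500, max_steps=200, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration/exploitation."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        x = np.random.randint(n_states)
        for _ in range(max_steps):
            if x in terminal_states:
                break
            if np.random.rand() < epsilon:
                u = np.random.randint(n_actions)      # exploration
            else:
                u = int(np.argmin(Q[x]))              # exploitation of the greedy policy
            x_next, cost = sample_step(x, u)
            visits[x, u] += 1
            gamma = 1.0 / visits[x, u]                # step size as for TD
            Q[x, u] += gamma * (cost + Q[x_next].min() - Q[x, u])
            x = x_next
    return Q
```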

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

• using the direct learning approach presented in the preceding section for each sample of experience;

• building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It is replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the tabular representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, the approximation should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
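As a minimal illustration of such an approximation structure, the following sketch fits a linear architecture J(i, r) = φ(i)ᵀr to sampled cost-to-go values by least squares; the feature map φ and the data are assumptions made for the example:

```python
import numpy as np

def fit_linear_cost_to_go(phi, targets):
    """Least-squares fit of J(i, r) = phi(i)^T r.

    phi:     array (n_samples, n_features), one feature vector per sampled state.
    targets: sampled cost-to-go values V for those states.
    Returns the parameter vector r.
    """
    r, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return r

def approx_cost_to_go(phi_i, r):
    """Evaluate the approximation J(i, r) for one feature vector phi(i)."""
    return float(phi_i @ r)
```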


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered for the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar and Asgarpoor [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air-blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov decision process and semi-Markov decision process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

• Finite horizon dynamic programming. Characteristics: the model can be non-stationary. Possible application in maintenance optimization: short-term maintenance scheduling. Method: value iteration. Advantages/disadvantages: limited state space (number of components).

• Markov decision processes. Characteristics: stationary model. Possible approaches and classical methods for MDP:
  - Average cost-to-go. Application: continuous-time condition monitoring maintenance optimization. Method: value iteration (VI); can converge fast for a high discount factor.
  - Discounted. Application: short-term maintenance optimization. Method: policy iteration (PI); faster in general.
  - Shortest path. Method: linear programming; possible additional constraints, but the state space is more limited than with VI and PI.

• Approximate dynamic programming. Characteristics: can handle large state spaces compared with classical MDP methods. Application: same as MDP, for larger systems. Methods: TD-learning, Q-learning; can work without an explicit model.

• Semi-Markov decision processes. Characteristics: can optimize the inspection interval. Application: optimization for inspection-based maintenance. Method: same as MDP (average cost-to-go approach); more complex.

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model. The electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E     Number of electricity scenarios
N_W     Number of working states for the component
N_PM    Number of preventive maintenance states for the component
N_CM    Number of corrective maintenance states for the component

Costs

C_E(s, k)   Electricity cost at stage k for the electricity state s
C_I         Cost per stage for interruption
C_PM        Cost per stage of preventive maintenance
C_CM        Cost per stage of corrective maintenance
C_N(i)      Terminal cost if the component is in state i

Variables

i¹   Component state at the current stage
i²   Electricity state at the current stage
j¹   Possible component state for the next stage
j²   Possible electricity state for the next stage

State and Control Space

x¹_k   Component state at stage k
x²_k   Electricity state at stage k

Probability functions

λ(t)   Failure rate of the component at age t
λ(i)   Failure rate of the component in state W_i

Sets

Ω_x¹     Component state space
Ω_x²     Electricity state space
Ω_U(i)   Decision space for state i

States notations

W    Working state
PM   Preventive maintenance state
CM   Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length T_s such that T = N · T_s. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x¹_k for the state of the component (its age) and x²_k for the electricity scenario; N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x¹_k, x²_k),   x¹_k ∈ Ω_x¹, x²_k ∈ Ω_x²    (9.1)

Ω_x¹ is the set of possible states for the component and Ω_x² the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

[Figure 9.1 shows the state-transition diagram: working states W0 to W4, preventive maintenance state PM1 and corrective maintenance states CM1, CM2, with ageing probabilities 1 − T_s·λ(q) and failure probabilities T_s·λ(q).]

Figure 9.1: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4. Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x¹_k ∈ Ω_x¹ = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x¹ = {W0, ..., W_{N_W}, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x²_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x²_k ∈ Ω_x² = {S1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2 shows three example electricity price curves (roughly 200 to 500 SEK/MWh) over the stages k−1, k, k+1.]

Figure 9.2: Example of electricity scenarios, N_E = 3.

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i¹:

Ω_U(i) = {0, 1} if i¹ ∈ {W1, ..., W_{N_W}},  ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x¹_{k+1} = j¹, x²_{k+1} = j² | u_k = u, x¹_k = i¹, x²_k = i²)
  = P(x¹_{k+1} = j¹ | u_k = u, x¹_k = i¹) · P(x²_{k+1} = j² | x²_k = i²)
  = P(j¹, u, i¹) · P_k(j², i²)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j², i²), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j², i²) can take three different values, defined by the transition matrices P¹_E, P²_E and P³_E; i² is represented by the rows of the matrices and j² by the columns.

Table 9.1: Transition probabilities

i¹                              u    j¹          P(j¹, u, i¹)
W_q, q ∈ {0, ..., N_W−1}        0    W_{q+1}     1 − λ(W_q)
W_q, q ∈ {0, ..., N_W−1}        0    CM1         λ(W_q)
W_{N_W}                         0    W_{N_W}     1 − λ(W_{N_W})
W_{N_W}                         0    CM1         λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}          1    PM1         1
PM_q, q ∈ {1, ..., N_PM−2}      ∅    PM_{q+1}    1
PM_{N_PM−1}                     ∅    W0          1
CM_q, q ∈ {1, ..., N_CM−2}      ∅    CM_{q+1}    1
CM_{N_CM−1}                     ∅    W0          1

Table 9.2: Example of transition matrices for the electricity scenarios

P¹_E = | 1    0    0   |    P²_E = | 1/3  1/3  1/3 |    P³_E = | 0.6  0.2  0.2 |
       | 0    1    0   |           | 1/3  1/3  1/3 |           | 0.2  0.6  0.2 |
       | 0    0    1   |           | 1/3  1/3  1/3 |           | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0     1     2     3     4     5     6     7     8     9     10    11
P_k(j², i²)    P¹_E  P¹_E  P¹_E  P³_E  P³_E  P²_E  P²_E  P²_E  P³_E  P¹_E  P¹_E  P¹_E
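To make the one-component dynamics concrete, here is a sketch that builds the stationary component transition matrix of Table 9.1 for a given decision u, assuming the per-stage failure probability is T_s · λ(q · T_s) as in Figure 9.1; the function and variable names are illustrative assumptions:

```python
import numpy as np

def component_transition_matrix(failure_rate, Ts, N_W, N_PM, N_CM, u):
    """Transition matrix over {W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}}.

    failure_rate(q) is lambda(q * Ts); u = 0 (no PM) or u = 1 (preventive replacement).
    """
    states = ([f"W{q}" for q in range(N_W + 1)]
              + [f"PM{q}" for q in range(1, N_PM)]
              + [f"CM{q}" for q in range(1, N_CM)])
    idx = {s: n for n, s in enumerate(states)}
    pm1 = "PM1" if N_PM > 1 else "W0"          # PM1 corresponds to W0 when N_PM = 1
    cm1 = "CM1" if N_CM > 1 else "W0"
    P = np.zeros((len(states), len(states)))
    for q in range(N_W + 1):                   # working states
        if u == 1:
            P[idx[f"W{q}"], idx[pm1]] = 1.0    # preventive replacement starts
        else:
            p_fail = Ts * failure_rate(q)
            P[idx[f"W{q}"], idx[f"W{min(q + 1, N_W)}"]] = 1.0 - p_fail   # ageing
            P[idx[f"W{q}"], idx[cm1]] = p_fail                            # failure
    for q in range(1, N_PM):                   # preventive maintenance progresses
        P[idx[f"PM{q}"], idx[f"PM{q + 1}" if q < N_PM - 1 else "W0"]] = 1.0
    for q in range(1, N_CM):                   # corrective maintenance progresses
        P[idx[f"CM{q}"], idx[f"CM{q + 1}" if q < N_CM - 1 else "W0"]] = 1.0
    return states, P
```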

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i², k) (depends on the electricity scenario state i² and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i² is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i¹                              u    j¹          C_k(j, u, i)
W_q, q ∈ {0, ..., N_W−1}        0    W_{q+1}     G · T_s · C_E(i², k)
W_q, q ∈ {0, ..., N_W−1}        0    CM1         C_I + C_CM
W_{N_W}                         0    W_{N_W}     G · T_s · C_E(i², k)
W_{N_W}                         0    CM1         C_I + C_CM
W_q                             1    PM1         C_I + C_PM
PM_q, q ∈ {1, ..., N_PM−2}      ∅    PM_{q+1}    C_I + C_PM
PM_{N_PM−1}                     ∅    W0          C_I + C_PM
CM_q, q ∈ {1, ..., N_CM−2}      ∅    CM_{q+1}    C_I + C_CM
CM_{N_CM−1}                     ∅    W0          C_I + C_CM

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C      Number of components
N_Wc     Number of working states for component c
N_PMc    Number of preventive maintenance states for component c
N_CMc    Number of corrective maintenance states for component c

Costs

C_PMc     Cost per stage of preventive maintenance for component c
C_CMc     Cost per stage of corrective maintenance for component c
C_Nc(i)   Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}   State of component c at the current stage
i^{N_C+1}                State of the electricity at the current stage
j^c, c ∈ {1, ..., N_C}   State of component c at the next stage
j^{N_C+1}                State of the electricity at the next stage
u^c, c ∈ {1, ..., N_C}   Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}   State of component c at stage k
x^c                        A component state
x^{N_C+1}_k                Electricity state at stage k
u^c_k                      Maintenance decision for component c at stage k

Probability functions

λ_c(i)   Failure probability function for component c

Sets

Ω_x^c          State space for component c
Ω_x^{N_C+1}    Electricity state space
Ω_u^c(i^c)     Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · T_s kWh are produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x¹_k, ..., x^{N_C}_k, x^{N_C+1}_k)    (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_x^c:

x^c_k ∈ Ω_x^c = {W0, ..., W_{N_Wc}, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u¹_k, u²_k, ..., u^{N_C}_k)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}:  Ω_u^c(i^c) = {0, 1} if i^c ∈ {W0, ..., W_{N_Wc}},  ∅ otherwise

9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
  = P((j¹, ..., j^{N_C}), (u¹, ..., u^{N_C}), (i¹, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices as in the example of Section 9.1.4.3.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W1, ..., W_{N_Wc}} and u^c = 0, then

P((j¹, ..., j^{N_C}), 0, (i¹, ..., i^{N_C})) = Π_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j¹, ..., j^{N_C}), (u¹, ..., u^{N_C}), (i¹, ..., i^{N_C})) = Π_{c=1}^{N_C} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W0, ..., W_{N_Wc}}
P^c = 1                if u^c = 0, i^c ∈ {W0, ..., W_{N_Wc}} and j^c = i^c
P^c = 0                otherwise
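A small sketch of how the two cases combine into the joint component-state transition probability, assuming each component c comes with per-decision transition matrices built as in the one-component model; the data layout, helper names and working-state sets are assumptions made for illustration:

```python
import numpy as np

def joint_component_transition(P_comp, working_sets, i, j, u):
    """Joint probability P((j1..jNC), (u1..uNC), (i1..iNC)).

    P_comp[c][u]   : one-component transition matrix of component c under decision u.
    working_sets[c]: set of working-state indices of component c.
    i, j, u        : tuples of current states, next states and decisions (one per component).
    """
    system_up = (all(ic in working_sets[c] for c, ic in enumerate(i))
                 and all(uc == 0 for uc in u))
    if system_up:
        # Case 1: every component ages independently
        return float(np.prod([P_comp[c][0][i[c], j[c]] for c in range(len(i))]))
    # Case 2: the system is down during the stage
    prob = 1.0
    for c, (ic, jc, uc) in enumerate(zip(i, j, u)):
        if uc == 1 or ic not in working_sets[c]:
            prob *= P_comp[c][uc][ic, jc]      # maintained component follows its own dynamics
        else:
            prob *= 1.0 if ic == jc else 0.0   # working, unmaintained component does not age
    return prob
```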

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W1, ..., W_{N_Wc}}, then

C((j¹, ..., j^{N_C}), 0, (i¹, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of all the maintenance costs:

C((j¹, ..., j^{N_C}), (u¹, ..., u^{N_C}), (i¹, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with

C^c = C_CMc   if i^c ∈ {CM1, ..., CM_{N_CMc−1}} or j^c = CM1
C^c = C_PMc   if i^c ∈ {PM1, ..., PM_{N_PMc−1}} or j^c = PM1
C^c = 0       otherwise

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair: the time to repair is not deterministic. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of dynamic programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (reinforcement learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The policy iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the value iteration algorithm can be better. Linear programming can also be used if additional constraints need to be included in the model. Approximate dynamic programming methods are necessary for large state spaces.

A maintenance model based on finite horizon stochastic dynamic programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of dynamic programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov decision processes (MDP) and semi-Markov decision processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of dynamic programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using dynamic programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov decision processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 2
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 29: Models

the initial state is reached. The optimal decision sequence is then obtained forward by using the optimal solution determined by the DP algorithm for the sequence of states that will be visited.

The solution of the algorithm is given in Appendix A.

The optimal cost-to-go is J*_0(0) = 8. It corresponds to the following path: A ⇒ D ⇒ G ⇒ I ⇒ K. The optimal policy of the problem is π* = {μ_0, μ_1, μ_2, μ_3, μ_4} with μ_k(i) = u*_k(i) (for example μ_1(1) = 2, μ_1(2) = 2).


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 4 is presented. This chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 4. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, i.e. the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k isin 0 N represents the different stages of the problem In generalit corresponds to a time variable

The state of the system is characterized by a variable i = Xk The possible statesare represented by a set of admissible states that can depends on k Xk isin ΩXk

Decision Space

At each decision epoch the decision maker must choose an action $u = U_k$ among a set of admissible actions. This set can depend on the state of the system and on the stage: $u \in \Omega_{U_k}(i)$.

Dynamic of the System and Transition Probability

In contrast to the deterministic case, the state transition does not depend only on the control used but also on a disturbance $\omega = \omega_k(i, u)$:

$$X_{k+1} = f_k(X_k, U_k, \omega), \quad k = 0, 1, \dots, N-1$$

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j if the state and control are i and u at stage k. These probabilities can also depend on the stage:

$$P_k(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

$$P(j, u, i) = P(X_{k+1} = j \mid X_k = i, U_k = u)$$

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

$$C_k(j, u, i) = C_k(x_{k+1} = j, u_k = u, x_k = i)$$

If the transition (i, j) occurs at stage k when the decision is u, then the cost $C_k(j, u, i)$ is incurred. If the cost function is stationary, the notation is simplified to $C(j, u, i)$.

A terminal cost $C_N(i)$ can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) $J^*(X_0)$, where $X_0$ is the initial state of the system.

$$J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left[ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right]$$

Subject to $X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), \quad k = 0, 1, \dots, N-1$

$N$   Number of stages
$k$   Stage
$i$   State at the current stage
$j$   State at the next stage
$X_k$   State at stage k
$U_k$   Decision action at stage k
$\omega_k(i, u)$   Probabilistic function of the disturbance
$C_k(i, u, j)$   Cost function
$C_N(i)$   Terminal cost for state i
$f_k(i, u, \omega)$   Dynamic function
$J^*_0(i)$   Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\left[ C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega)) \right] \qquad (51)$$

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

$$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot \left[ C_k(i, u, j) + J^*_{k+1}(j) \right] \qquad (52)$$

$\Omega_{X_k}$   State space at stage k
$\Omega_{U_k}(i)$   Decision space at stage k for state i
$P_k(j, u, i)$   Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation 52. The algorithm starts from the last stage. By backward recursion it determines at each stage the optimal decision for each state of the system.

$J^*_N(i) = C_N(i) \quad \forall i \in \Omega_{X_N}$ (Initialisation)

While $k \ge 0$ do

$J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot [C_k(i, u, j) + J^*_{k+1}(j)] \quad \forall i \in \Omega_{X_k}$

$U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(i, u, j) \cdot [C_k(i, u, j) + J^*_{k+1}(j)] \quad \forall i \in \Omega_{X_k}$

$k \leftarrow k - 1$

$u$   Decision variable
$U^*_k(i)$   Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
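To make the backward recursion concrete, a minimal Python sketch is given below. It assumes, for simplicity, that the state and control spaces are finite and identical at every stage and that the transition probabilities and costs are given as arrays; these are assumptions of the sketch, not part of the formulation above.

import numpy as np

def finite_horizon_value_iteration(P, C, C_N, N):
    # P[u][i, j]: probability of going from state i to state j under control u
    # C[u][i, j]: cost of the transition (i -> j) under control u
    # C_N[i]:     terminal cost of ending in state i
    # Returns the cost-to-go J[k, i] and the optimal decisions U[k, i].
    n_states, n_controls = len(C_N), len(P)
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = C_N                                    # initialisation with the terminal cost
    for k in range(N - 1, -1, -1):                # backward recursion over the stages
        for i in range(n_states):
            # expected cost of each control: sum_j P(i,u,j) * [C(i,u,j) + J_{k+1}(j)]
            q = [P[u][i] @ (C[u][i] + J[k + 1]) for u in range(n_controls)]
            U[k, i] = int(np.argmin(q))
            J[k, i] = q[U[k, i]]
    return J, U

# Small hypothetical example: 2 states, 2 controls (0 = do nothing, 1 = replace)
P = [np.array([[0.9, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.0], [1.0, 0.0]])]
C = [np.array([[0.0, 10.0], [0.0, 10.0]]), np.array([[4.0, 4.0], [4.0, 4.0]])]
J, U = finite_horizon_value_iteration(P, C, C_N=np.zeros(2), N=5)
print(J[0], U[0])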

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages

• $N_X$ state variables; the size of the set for each state variable is S

• $N_U$ control variables; the size of the set for each control variable is A

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
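As a hypothetical illustration, with N = 52 stages, $N_X = 2$ state variables of S = 20 values each and $N_U = 1$ control variable with A = 2 actions, the recursion requires on the order of $52 \cdot 20^4 \cdot 2 \approx 1.7 \cdot 10^7$ elementary operations; with two more state variables of the same size ($N_X = 4$) this grows to roughly $52 \cdot 20^8 \cdot 2 \approx 2.7 \cdot 10^{12}$.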

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for maintenance models based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be used to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


552 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties but in return increases the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if consumption is low, some generating units are stopped; this time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This memoryless condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation for problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for finite horizon SDP, in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution has the form $\pi = \{\mu, \mu, \mu, \dots\}$, where $\mu$ is a function mapping the state space to the control space. For $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state i: $\mu(i) \in \Omega_U(i)$.

The objective is to find the optimal policy $\mu^*$. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that is unavoidable. When this state is reached, the system remains in this state and no costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$

$\mu$   Decision policy
$J^*(i)$   Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor $\alpha$ ($0 < \alpha < 1$): the cost incurred at stage k is weighted by $\alpha^k$, i.e. it has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$

$\alpha$   Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right]$$

Subject to $X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \dots$


62 Optimality Equations

The optimality equations are formulated using the transition probability function $P_{ij}(u)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + J^*(j)] \quad \forall i \in \Omega_X$$

$J^\mu(i)$   Cost-to-go function of policy $\mu$ starting from state i
$J^*(i)$   Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + \alpha \cdot J^*(j)] \quad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and $\frac{1}{1-\alpha}$.

For non-discounted models the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.

An alternative is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. Then it can be described by the following steps:

Step 1 Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else $J^{\mu^q}(i)$, the solution of the following linear system, is calculated:

$$J^{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot [C(j, \mu^q(i), i) + J^{\mu^q}(j)]$$

$q$   Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2 Policy Improvement

A new policy is obtained using one step of the value iteration algorithm:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^{\mu^q}(j)]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm always improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge quickly to the optimal solution.
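The two steps above can be illustrated with a small Python sketch for a discounted, stationary MDP; the discount factor and the arrays P and C are assumptions of the example, and the evaluation step solves the linear system directly.

import numpy as np

def policy_iteration(P, C, alpha=0.9, max_iter=100):
    # P[u][i, j]: transition probability from state i to j under control u
    # C[u][i, j]: transition cost from i to j under control u
    # alpha:      discount factor (0 < alpha < 1)
    n_controls, n_states = len(P), P[0].shape[0]
    mu = np.zeros(n_states, dtype=int)             # initial policy mu^0
    for _ in range(max_iter):
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = expected stage cost
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        c_mu = np.array([P[mu[i]][i] @ C[mu[i]][i] for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([[P[u][i] @ (C[u][i] + alpha * J) for u in range(n_controls)]
                      for i in range(n_states)])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):             # policy is a solution of its own improvement
            return mu, J
        mu = mu_new
    return mu, J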

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.

While $m \ge 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot [C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j)] \quad \forall i \in \Omega_X$$

$m \leftarrow m - 1$

$m$   Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the Markov decision process model is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a reference state $X \in \Omega_X$, there is a unique $\lambda^\mu$ and vector $h^\mu$ such that

$h^\mu(X) = 0$

$$\lambda^\mu + h^\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot [C(j, \mu(i), i) + h^\mu(j)] \quad \forall i \in \Omega_X$$

This $\lambda^\mu$ is the average cost-to-go for the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)] \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)] \quad \forall i \in \Omega_X$$

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. $X$ is an arbitrary reference state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot [C(j, u, X) + h^k(j)]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)] - H^k \quad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)] \quad \forall i \in \Omega_X$$

The sequence $h^k$ will converge if the Markov decision process is unichain. Moreover the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: the reference state $X$ can be chosen arbitrarily.

Step 1 Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega_X$, stop the algorithm.

Else solve the system of equations

$h^q(X) = 0$
$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot [C(j, \mu^q(i), i) + h^q(j)] \quad \forall i \in \Omega_X$$

Step 2 Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^q(j)] \quad \forall i \in \Omega_X$$

$q = q + 1$

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP case the optimality equation is

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J^*(j)] \quad \forall i \in \Omega_X$$

$J^*(i)$ is the solution of the following linear programming model:

Maximize $\sum_{i \in \Omega_X} J(i)$

Subject to $J(i) - \alpha \sum_{j \in \Omega_X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i) \quad \forall i \in \Omega_X, \forall u \in \Omega_U(i)$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
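For illustration, the LP above can be passed to a generic solver; the sketch below uses scipy.optimize.linprog with hypothetical arrays P and C. Since linprog minimizes, the objective is written as minimizing $-\sum_i J(i)$.

import numpy as np
from scipy.optimize import linprog

def lp_solve_discounted_mdp(P, C, alpha=0.9):
    # Solve: maximize sum_i J(i)
    # subject to J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i)  for all i, u
    n_controls, n_states = len(P), P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(n_controls):
        for i in range(n_states):
            row = -alpha * P[u][i]
            row[i] += 1.0                          # coefficient of J(i)
            A_ub.append(row)
            b_ub.append(P[u][i] @ C[u][i])         # expected one-stage cost of (i, u)
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                   # the optimal cost-to-go J*(i)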

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP for solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision times can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not taken continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural networks as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form $(X_k, X_{k+1}, U_k, C_k)$.

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form $(X_k, X_{k+1}, U_k, C_k)$: $X_{k+1}$ is the observed state after choosing the control $U_k$ in state $X_k$, and $C_k = C(X_k, X_{k+1}, U_k)$ is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities $P(j, u, i)$ and costs $C(j, u, i)$ if a model of the system exists.


721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory $(X_0, \dots, X_N)$ has been generated according to the policy $\mu$ and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_k, X_{k+1}, \mu(X_k))$ has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$   Cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, $J(i)$ can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$   Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

$J(i) := J(i) + \gamma \cdot [V(i_m) - J(i)]$, with $\gamma = 1/m$, where m is the number of the trajectory.

From a trajectory point of view:

$J(X_k) := J(X_k) + \gamma_{X_k} \cdot [V(X_k) - J(X_k)]$

$\gamma_{X_k}$ corresponds to $1/m$, where m is the number of times $X_k$ has already been visited by trajectories.


With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory, so the update can only be made once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1})$.

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition is being generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)] \quad \forall k = 0, \dots, l$$

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant $\lambda < 1$ is introduced:

$$J(X_k) := J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)] \quad \forall k = 0, \dots, l$$

Note that TD(1) is the same as the policy evaluation by simulation described above. Another special case is λ = 0. The TD(0) algorithm is

$$J(X_l) := J(X_l) + \gamma_{X_l} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)]$$

Q-factors
Once $J^{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

$$Q^{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^{\mu^k}(j)]$$

Note that $P(j, u, i)$ and $C(j, u, i)$ must be known for this improvement step.

The improved policy is

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q^{\mu^k}(i, u)$$

It is in fact an approximate version of the policy iteration algorithm, since $J^{\mu^k}$ and $Q^{\mu^k}$ have been estimated using the samples.
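A minimal Python sketch of simulation-based policy evaluation with TD(0) is given below; the environment function simulate_transition, which returns the observed next state and transition cost, is a hypothetical stand-in for a simulator or for real-time measurements.

import numpy as np

def td0_policy_evaluation(mu, simulate_transition, n_states, terminal_state,
                          n_trajectories=1000, start_state=0):
    # mu[i]: control chosen by the evaluated policy in state i
    J = np.zeros(n_states)
    visits = np.zeros(n_states)                    # to form the step size gamma = 1/m
    for _ in range(n_trajectories):
        x = start_state
        while x != terminal_state:
            x_next, cost = simulate_transition(x, mu[x])
            visits[x] += 1
            gamma = 1.0 / visits[x]
            # TD(0) update: J(x) <- J(x) + gamma * [c + J(x') - J(x)]
            J[x] += gamma * (cost + J[x_next] - J[x])
            x = x_next
    return J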

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)] \qquad (71)$$


The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \qquad (72)$$

By combining the two equations, we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)] \qquad (73)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (73).

$Q(i, u)$ can be initialized arbitrarily.

For each sample $(X_k, X_{k+1}, U_k, C_k)$ do

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) := (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot [C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]$$

with $\gamma$ defined as for TD.

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution requires that all pairs (x, u) are tried infinitely often, which is not realistic in practice.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called the greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
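One common way to implement this trade-off is an epsilon-greedy rule, sketched below in Python for a stochastic shortest path problem; as before, simulate_transition is a hypothetical environment function and the step size follows the same decreasing rule as for TD.

import numpy as np

def q_learning(simulate_transition, n_states, n_controls, terminal_state,
               n_trajectories=1000, start_state=0, epsilon=0.1):
    Q = np.zeros((n_states, n_controls))           # arbitrary initialisation of the Q-factors
    visits = np.zeros((n_states, n_controls))
    for _ in range(n_trajectories):
        x = start_state
        while x != terminal_state:
            if np.random.rand() < epsilon:         # exploration: try a random control
                u = np.random.randint(n_controls)
            else:                                  # exploitation: follow the greedy policy
                u = int(Q[x].argmin())
            x_next, cost = simulate_transition(x, u)
            visits[x, u] += 1
            gamma = 1.0 / visits[x, u]
            target = cost if x_next == terminal_state else cost + Q[x_next].min()
            Q[x, u] = (1 - gamma) * Q[x, u] + gamma * target
            x = x_next
    return Q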

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


74 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function $J^\mu(i)$. It will be replaced by a suitable approximation $\tilde{J}(i, r)$, where r is a parameter vector that has to be optimized based on the available samples of $J^\mu$. In the tabular representation investigated previously, $J^\mu(i)$ was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J^\mu(i) - \tilde{J}(i, r)$.

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.
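As a minimal illustration of the parametric idea, the Python sketch below fits a linear architecture $\tilde{J}(i, r) = \phi(i)^T r$ by least squares to sampled cost-to-go values; the feature map and the sample values are hypothetical.

import numpy as np

def features(state):
    # Hypothetical feature map phi(i): constant, age and squared age of a component
    age = float(state)
    return np.array([1.0, age, age ** 2])

def fit_cost_to_go(sample_states, sample_costs):
    # Fit J~(i, r) = phi(i)^T r by least squares on sampled cost-to-go values
    Phi = np.array([features(s) for s in sample_states])
    r, *_ = np.linalg.lstsq(Phi, np.array(sample_costs), rcond=None)
    return r

def approx_cost_to_go(state, r):
    return features(state) @ r

# Hypothetical samples (state, observed cost-to-go) gathered from simulated trajectories
r = fit_cost_to_go([0, 1, 2, 3, 4], [2.0, 2.5, 3.4, 4.9, 7.1])
print(approx_cost_to_go(2.5, r))                   # generalizes between the sampled states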

A general approach to a supervised learning problem can be

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original maintenance time of each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon maintenance scheduling problem for generating units. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure during the stage of a unit that is not in maintenance. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

84 Conclusions

An important assumption of all the models is the memoryless property (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if it is to remain tractable.

Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state; SMDPs are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and most important methods

Table 81: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: value iteration
  Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical MDP methods, with the possible approaches below
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
  - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
  - Shortest path: Linear Programming, which allows possible additional constraints but a more limited state space than VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages / disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; more complex (average cost-to-go approach)
  Possible application in maintenance optimization: optimization of inspection based maintenance
  Methods: same as MDP


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and to avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

912 Notations for the Proposed Model

Numbers

$N_E$   Number of electricity scenarios
$N_W$   Number of working states for the component
$N_{PM}$   Number of preventive maintenance states for one component
$N_{CM}$   Number of corrective maintenance states for one component

Costs

$C_E(s, k)$   Electricity cost at stage k for the electricity state s
$C_I$   Cost per stage for interruption
$C_{PM}$   Cost per stage of preventive maintenance
$C_{CM}$   Cost per stage of corrective maintenance
$C_N(i)$   Terminal cost if the component is in state i

Variables

$i^1$   Component state at the current stage
$i^2$   Electricity state at the current stage
$j^1$   Possible component state for the next stage
$j^2$   Possible electricity state for the next stage

State and Control Space

$x^1_k$   Component state at stage k
$x^2_k$   Electricity state at stage k

Probability function

$\lambda(t)$   Failure rate of the component at age t
$\lambda(i)$   Failure rate of the component in state $W_i$

Sets

$\Omega_{x^1}$   Component state space
$\Omega_{x^2}$   Electricity state space
$\Omega_U(i)$   Decision space for state i

States notations

$W$   Working state
$PM$   Preventive maintenance state
$CM$   Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length $T_s$ such that $T = N \cdot T_s$. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted $\lambda(t)$.

• If the component fails during stage k, corrective maintenance is undertaken for $N_{CM}$ stages with a cost of $C_{CM}$ per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is $N_{PM}$ stages, with a cost of $C_{PM}$ per stage.

• If the system is not working, an interruption cost $C_I$ per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• $N_E$ possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted $C_E(s, k)$, k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector $X_k$ is composed of two state variables: $x^1_k$ for the state of the component (its age) and $x^2_k$ for the electricity scenario ($N_X = 2$).

The state of the system is thus represented by a vector as in (91):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1},\; x^2_k \in \Omega_{x^2} \qquad (91)$$

$\Omega_{x^1}$ is the set of possible states for the component and $\Omega_{x^2}$ the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable $x^1_k$. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to $N_{CM}$ and $N_{PM}$.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when $\lambda(t)$ reaches a fixed limit $\lambda_{max} = \lambda(T_{max})$, preventive maintenance is always made. Another possibility is to assume that $\lambda(t)$ stays constant once the age $T_{max}$ is reached; $T_{max}$ can then correspond, for example, to the time after which $\lambda(t) > 50\%$. The latter approach was implemented. In both cases the corresponding number of W states is $N_W = T_{max}/T_s$, or the closest integer.



Figure 91: Example of the Markov decision process for one component with $N_{CM} = 3$, $N_{PM} = 2$, $N_W = 4$. Solid lines: u = 0; dashed lines: u = 1.

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example $x^1_k \in \Omega_{x^1} = \{W_0, \dots, W_4, PM_1, CM_1, CM_2\}$. The state $W_0$ is used to represent a new component; $PM_2$ and $CM_3$ are both represented by this state.

More generally,

$$\Omega_{x^1} = \{W_0, \dots, W_{N_W}, PM_1, \dots, PM_{N_{PM}-1}, CM_1, \dots, CM_{N_{CM}-1}\}$$


Electricity scenario state

Electricity scenarios are associated with one state variable $x^2_k$. There are $N_E$ possible states for this variable, each state corresponding to one possible electricity scenario: $x^2_k \in \Omega_{x^2} = \{S_1, \dots, S_{N_E}\}$. The electricity price of scenario S at stage k is given by the electricity price function $C_E(S, k)$. Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, roughly 200 to 500) as a function of the stage for Scenario 1, Scenario 2 and Scenario 3]

Figure 92: Example of electricity scenarios, $N_E = 3$.


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

$U_k = 0$: no preventive maintenance
$U_k = 1$: preventive maintenance

The decision space depends only on the component state $i^1$:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \dots, W_{N_W}\} \\ \emptyset & \text{else} \end{cases}$$

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)$$
$$= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2)$$
$$= P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$

Component state transition probability

At each stage k, if the state of the component is $W_q$, the failure rate is assumed constant during the stage and equal to $\lambda(W_q) = \lambda(q \cdot T_s)$.

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if $N_{PM} = 1$ or $N_{CM} = 1$, then $PM_1$ respectively $CM_1$ corresponds to $W_0$.

Electricity State

The transition probabilities of the electricity state, $P_k(j^2, i^2)$, are not stationary; they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example $P_k(j^2, i^2)$ can take three different values, defined by the transition matrices $P^1_E$, $P^2_E$ or $P^3_E$; $i^2$ is represented by the rows of the matrices and $j^2$ by the columns.


Table 91: Transition probabilities

$i^1$ | $u$ | $j^1$ | $P(j^1, u, i^1)$
$W_q$, $q \in \{0, \dots, N_W-1\}$ | 0 | $W_{q+1}$ | $1 - \lambda(W_q)$
$W_q$, $q \in \{0, \dots, N_W-1\}$ | 0 | $CM_1$ | $\lambda(W_q)$
$W_{N_W}$ | 0 | $W_{N_W}$ | $1 - \lambda(W_{N_W})$
$W_{N_W}$ | 0 | $CM_1$ | $\lambda(W_{N_W})$
$W_q$, $q \in \{0, \dots, N_W\}$ | 1 | $PM_1$ | 1
$PM_q$, $q \in \{1, \dots, N_{PM}-2\}$ | $\emptyset$ | $PM_{q+1}$ | 1
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | 1
$CM_q$, $q \in \{1, \dots, N_{CM}-2\}$ | $\emptyset$ | $CM_{q+1}$ | 1
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | 1

Table 92: Example of transition matrices for the electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} \quad P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 93: Example of transition probabilities on a 12-stage horizon

Stage (k)       | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
$P_k(j^2, i^2)$ | $P^1_E$ | $P^1_E$ | $P^1_E$ | $P^3_E$ | $P^3_E$ | $P^2_E$ | $P^2_E$ | $P^2_E$ | $P^3_E$ | $P^1_E$ | $P^1_E$ | $P^1_E$

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: $G \cdot T_s \cdot C_E(i^2, k)$ (depends on the electricity scenario state $i^2$ and the stage k)

• Cost for maintenance: $C_{CM}$ or $C_{PM}$

• Cost for interruption: $C_I$

Moreover, a terminal cost noted $C_N$ could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that $i^2$ is a state variable.

A possible terminal cost $C_N(i)$ is defined for each possible terminal state i of the component.


Table 94: Transition costs

$i^1$ | $u$ | $j^1$ | $C_k(j, u, i)$
$W_q$, $q \in \{0, \dots, N_W-1\}$ | 0 | $W_{q+1}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_q$, $q \in \{0, \dots, N_W-1\}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_{N_W}$ | 0 | $W_{N_W}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_{N_W}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_q$ | 1 | $PM_1$ | $C_I + C_{PM}$
$PM_q$, $q \in \{1, \dots, N_{PM}-2\}$ | $\emptyset$ | $PM_{q+1}$ | $C_I + C_{PM}$
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{PM}$
$CM_q$, $q \in \{1, \dots, N_{CM}-2\}$ | $\emptyset$ | $CM_{q+1}$ | $C_I + C_{CM}$
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{CM}$

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

$N_C$   Number of components
$N_{W_c}$   Number of working states for component c
$N_{PM_c}$   Number of preventive maintenance states for component c
$N_{CM_c}$   Number of corrective maintenance states for component c

Costs

$C_{PM_c}$   Cost per stage of preventive maintenance for component c
$C_{CM_c}$   Cost per stage of corrective maintenance for component c
$C_{N_c}(i)$   Terminal cost if component c is in state i

Variables

$i^c$, $c \in \{1, \dots, N_C\}$   State of component c at the current stage
$i^{N_C+1}$   State of the electricity at the current stage
$j^c$, $c \in \{1, \dots, N_C\}$   State of component c at the next stage
$j^{N_C+1}$   State of the electricity at the next stage
$u^c$, $c \in \{1, \dots, N_C\}$   Decision variable for component c

State and Control Space

$x^c_k$, $c \in \{1, \dots, N_C\}$   State of component c at stage k
$x^c$   A component state
$x^{N_C+1}_k$   Electricity state at stage k
$u^c_k$   Maintenance decision for component c at stage k

Probability functions

$\lambda_c(i)$   Failure probability function for component c

Sets

$\Omega_{x^c}$   State space for component c
$\Omega_{x^{N_C+1}}$   Electricity state space
$\Omega_{u^c}(i^c)$   Decision space for component c in state $i^c$

923 Assumptions

• The system is composed of $N_C$ components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted $\lambda_c(t)$ for component $c \in \{1, \dots, N_C\}$.

• If component c fails during stage k, corrective maintenance is undertaken for $N_{CM_c}$ stages with a cost of $C_{CM_c}$ per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is $N_{PM_c}$ stages, with a cost of $C_{PM_c}$ per stage.

• An interruption cost $C_I$ is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• A terminal cost $C_{N_c}$ can be used to penalize the terminal stage condition of component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92):

$$X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \qquad (92)$$

$x^c_k$, $c \in \{1, \dots, N_C\}$, represents the state of component c; $x^{N_C+1}_k$ represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to $N_{CM_c}$ and $N_{PM_c}$. The number of W states for each component c, $N_{W_c}$, is decided in the same way as for one component.

The state space related to component c is noted $\Omega_{x^c}$:

$$x^c_k \in \Omega_{x^c} = \{W_0, \dots, W_{N_{W_c}}, PM_1, \dots, PM_{N_{PM_c}-1}, CM_1, \dots, CM_{N_{CM_c}-1}\}$$

Electricity space
Same as in Section 91.

9242 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system.

$u^c_k = 0$: no preventive maintenance on component c
$u^c_k = 1$: preventive maintenance on component c

The decision variables constitute a decision vector

$$U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \qquad (93)$$

The decision space for each decision variable can be defined by

$$\forall c \in \{1, \dots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \dots, W_{N_{W_c}}\} \\ \emptyset & \text{else} \end{cases}$$

9243 Transition Probability

The state variables $x^c$ are independent of the electricity state $x^{N_C+1}$. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (94)$$
$$= P((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) \cdot P_k(j^{N_C+1}, i^{N_C+1}) \qquad (95)$$

The transition probabilities of the electricity state, $P_k(j^{N_C+1}, i^{N_C+1})$, are the same as in the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 91.

Component states transitions

The state variables $x^c$ are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If $\forall c \in \{1, \dots, N_C\}: i^c \in \{W_1, \dots, W_{N_{W_c}}\}$ and no maintenance is decided,

$$P((j^1, \dots, j^{N_C}), 0, (i^1, \dots, i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$

Case 2


If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

$$P((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \dots, W_{N_{W_c}}\} \\ 1 & \text{if } u^c = 0,\; i^c \in \{W_1, \dots, W_{N_{W_c}}\} \text{ and } j^c = i^c \\ 0 & \text{else} \end{cases}$$

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If $\forall c \in \{1, \dots, N_C\}: i^c \in \{W_1, \dots, W_{N_{W_c}}\}$,

$$C((j^1, \dots, j^{N_C}), 0, (i^1, \dots, i^{N_C})) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2
When the system is in maintenance or fails during the stage, an interruption cost $C_I$ is considered, as well as the sum of the costs of all the maintenance actions:

$$C((j^1, \dots, j^{N_C}), (u^1, \dots, u^{N_C}), (i^1, \dots, i^{N_C})) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM_c} & \text{if } i^c \in \{CM_1, \dots, CM_{N_{CM_c}}\} \text{ or } j^c = CM_1 \\ C_{PM_c} & \text{if } i^c \in \{PM_1, \dots, PM_{N_{PM_c}}\} \text{ or } j^c = PM_1 \\ 0 & \text{else} \end{cases}$$

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Stochastic time to repair: the time to repair is not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically proved to converge the fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximation that requires the problem to be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J^*_4(0) = φ(0) = 0

Stage 3:
J^*_3(0) = J^*(H) = C(3, 0, 0) = 4,  u^*_3(0) = u^*(H) = 0
J^*_3(1) = J^*(I) = C(3, 1, 0) = 2,  u^*_3(1) = u^*(I) = 0
J^*_3(2) = J^*(J) = C(3, 2, 0) = 7,  u^*_3(2) = u^*(J) = 0

Stage 2:
J^*_2(0) = J^*(E) = min{J^*_3(0) + C(2, 0, 0), J^*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u^*_2(0) = u^*(E) = argmin_{u∈{0,1}} {J^*_3(0) + C(2, 0, 0), J^*_3(1) + C(2, 0, 1)} = 0

J^*_2(1) = J^*(F) = min{J^*_3(0) + C(2, 1, 0), J^*_3(1) + C(2, 1, 1), J^*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u^*_2(1) = u^*(F) = argmin_{u∈{0,1,2}} {J^*_3(0) + C(2, 1, 0), J^*_3(1) + C(2, 1, 1), J^*_3(2) + C(2, 1, 2)} = 1

J^*_2(2) = J^*(G) = min{J^*_3(1) + C(2, 2, 1), J^*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u^*_2(2) = u^*(G) = argmin_{u∈{1,2}} {J^*_3(1) + C(2, 2, 1), J^*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J^*_1(0) = J^*(B) = min{J^*_2(0) + C(1, 0, 0), J^*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u^*_1(0) = u^*(B) = argmin_{u∈{0,1}} {J^*_2(0) + C(1, 0, 0), J^*_2(1) + C(1, 0, 1)} = 0

J^*_1(1) = J^*(C) = min{J^*_2(0) + C(1, 1, 0), J^*_2(1) + C(1, 1, 1), J^*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u^*_1(1) = u^*(C) = argmin_{u∈{0,1,2}} {J^*_2(0) + C(1, 1, 0), J^*_2(1) + C(1, 1, 1), J^*_2(2) + C(1, 1, 2)} = 1 or 2

J^*_1(2) = J^*(D) = min{J^*_2(1) + C(1, 2, 1), J^*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u^*_1(2) = u^*(D) = argmin_{u∈{1,2}} {J^*_2(1) + C(1, 2, 1), J^*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J^*_0(0) = J^*(A) = min{J^*_1(0) + C(0, 0, 0), J^*_1(1) + C(0, 0, 1), J^*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u^*_0(0) = u^*(A) = argmin_{u∈{0,1,2}} {J^*_1(0) + C(0, 0, 0), J^*_1(1) + C(0, 0, 1), J^*_1(2) + C(0, 0, 2)} = 2
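The backward recursion above can be checked with a short script. The sketch below reads the arc costs C(k, i, u) off the computation above (the decision u is identified with the index of the successor state); it is only a verification aid and not part of the thesis.

C = {0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
     1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
     2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
     3: {(0, 0): 4, (1, 0): 2, (2, 0): 7}}
states = {0: [0], 1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2], 4: [0]}

J = {4: {0: 0.0}}                      # terminal cost phi(0) = 0
policy = {}
for k in (3, 2, 1, 0):                 # backward value iteration
    J[k], policy[k] = {}, {}
    for i in states[k]:
        best = min(((C[k][(i, j)] + J[k + 1][j], j)
                    for j in states[k + 1] if (i, j) in C[k]), key=lambda t: t[0])
        J[k][i], policy[k][i] = best
print(J[0][0])                         # prints 8.0, the optimal cost from A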


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Chapter 5

Finite Horizon Models

In this chapter a stochastic version of the dynamic programming model of Chapter 3 is presented. The chapter introduces the theory for the model proposed in Chapter 9. For more details and examples, the book Markov Decision Processes: Discrete Stochastic Dynamic Programming [36] is recommended.

5.1 Problem Formulation

Stochastic dynamic programming can be used to model systems whose dynamics are probabilistic (or subject to disturbances). The state of the system at the next stage is not deterministic as in Chapter 3. It depends on the current state and decision, but also on a stochastic variable that describes the disturbance, that is, the stochastic behavior of the system.

A stochastic dynamic programming model can be formulated as below

State Space

A variable k ∈ {0, ..., N} represents the different stages of the problem. In general, it corresponds to a time variable.

The state of the system is characterized by a variable i = X_k. The possible states are represented by a set of admissible states that can depend on k: X_k ∈ Ω_{X_k}.

Decision Space

At each decision epoch, the decision maker must choose an action u = U_k among a set of admissible actions. This set can depend on the state of the system and on the stage: u ∈ Ω_{U_k}(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ω_k(i, u):

X_{k+1} = f_k(X_k, U_k, ω), k = 0, 1, ..., N − 1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control at stage k are i and u. These probabilities can also depend on the stage:

P_k(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(X_{k+1} = j | X_k = i, U_k = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

C_k(j, u, i) = C_k(x_{k+1} = j, u_k = u, x_k = i)

If the transition (i, j) occurs at stage k when the decision is u, then the cost C_k(j, u, i) is incurred. If the cost function is stationary, the notation is simplified to C(j, u, i).

A terminal cost C_N(i) can be used to penalize deviations from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J^*(X_0), where X_0 is the initial state of the system:

J^*(X_0) = \min_{U_k \in \Omega_{U_k}(X_k)} E\left\{ C_N(X_N) + \sum_{k=0}^{N-1} C_k(X_{k+1}, U_k, X_k) \right\}

subject to X_{k+1} = f_k(X_k, U_k, \omega_k(X_k, U_k)), k = 0, 1, ..., N − 1.

N Number of stages
k Stage
i State at the current stage
j State at the next stage
X_k State at stage k
U_k Decision action at stage k
ω_k(i, u) Probabilistic function of the disturbance
C_k(i, u, j) Cost function
C_N(i) Terminal cost for state i
f_k(i, u, ω) Dynamic function
J^*_0(i) Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} E\{C_k(i, u) + J^*_{k+1}(f_k(i, u, \omega))\}   (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)]   (5.2)

Ω_{X_k} State space at stage k
Ω_{U_k}(i) Decision space at stage k for state i
P_k(j, u, i) Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion, it determines at each stage the optimal decision for each state of the system.

J^*_N(i) = C_N(i), ∀ i ∈ Ω_{X_N}   (Initialisation)

While k ≥ 0 do

J^*_k(i) = \min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)], ∀ i ∈ Ω_{X_k}

U^*_k(i) = \arg\min_{u \in \Omega_{U_k}(i)} \sum_{j \in \Omega_{X_{k+1}}} P_k(j, u, i) \cdot [C_k(j, u, i) + J^*_{k+1}(j)], ∀ i ∈ Ω_{X_k}

k ← k − 1

u Decision variable
U^*_k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.
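A minimal Python sketch of this backward recursion is given below. The nested dictionaries P[k][i][u][j], C[k][i][u][j] and CN[i] are assumed inputs and their layout is an assumption made for illustration, not part of the thesis.

def finite_horizon_vi(P, C, CN, N):
    """Backward value iteration: returns cost-to-go J[k][i] and decisions U[k][i]."""
    J = {N: dict(CN)}                 # initialisation with the terminal cost
    U = {}
    for k in range(N - 1, -1, -1):    # backward recursion k = N-1, ..., 0
        J[k], U[k] = {}, {}
        for i in P[k]:
            best_u, best_cost = None, float("inf")
            for u in P[k][i]:
                cost = sum(p * (C[k][i][u][j] + J[k + 1][j])
                           for j, p in P[k][i][u].items())
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][i], U[k][i] = best_cost, best_u
    return J, U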

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^{2·N_X} · A^{N_U}). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for a maintenance model based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered in order to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It would also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while after a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This reduces the uncertainties, but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the previously visited states in memory. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximate methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative cost of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. This means that the solution has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space: for i ∈ Ω_X, µ(i) is an admissible control for the state i, µ(i) ∈ Ω_U(i).

The objective is to find the optimal policy µ*, which minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal (cost-free termination) state that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J^*(X_0) = \min_{\mu} E\left\{ \lim_{N \to \infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right\}

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N − 1

µ Decision policy
J^*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The stage-k cost for discounted IHSDP has the form α^k · C_{ij}(u).

As C_{ij}(u) is bounded, the infinite sum converges (it is dominated by a decreasing geometric progression).

J^*(X_0) = \min_{\mu} E\left\{ \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k) \right\}

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N − 1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J^* = \min_{\mu} E\left\{ \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k) \right\}

subject to X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), k = 0, 1, ..., N − 1


6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy µ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J_\mu(i) = \min_{\mu(i) \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + J_\mu(j)], ∀ i ∈ Ω_X

J_µ(i) Cost-to-go function of policy µ starting from state i
J^*(i) Optimal cost-to-go function for state i

For a discounted IHSDP problem, the optimality equation is

J_\mu(i) = \min_{\mu(i) \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + \alpha \cdot J_\mu(j)], ∀ i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed does. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1 − α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm, which terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ_0. It can then be described by the following steps.

Step 1: Policy Evaluation

If µ_{q+1} = µ_q, stop the algorithm. Otherwise, J_{µ_q}(i), the solution of the following linear system, is calculated:

J_{\mu_q}(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + J_{\mu_q}(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µ_q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_q}(j)]

Go back to the policy evaluation step.

The process stops when µ_{q+1} = µ_q.

At each iteration the algorithm always improves the policy. If the initial policy µ_0 is already good, then the algorithm will converge quickly to the optimal solution.
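A compact sketch of the two steps is given below, written for the discounted case (Section 6.2), since there the evaluation step reduces to solving one linear system. The arrays P[i, u, j] and C[i, u, j], the discount factor alpha and the initial policy mu0 are assumptions made for illustration.

import numpy as np

def policy_iteration(P, C, alpha, mu0):
    """Policy iteration for a discounted MDP with states 0..n-1."""
    n = P.shape[0]
    mu = np.array(mu0)
    while True:
        # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
        P_mu = np.array([P[i, mu[i], :] for i in range(n)])
        c_mu = np.array([np.dot(P[i, mu[i], :], C[i, mu[i], :]) for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.einsum('iuj,iuj->iu', P, C) + alpha * np.einsum('iuj,j->iu', P, J)
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):      # the policy improves itself: stop
            return mu, J
        mu = mu_new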

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the cost-to-go function of the policy. The algorithm is initialized with a value function J^M_{µ_k}(i) that must be chosen higher than the real value J_{µ_k}(i).

While m ≥ 0 do

J^m_{\mu_k}(i) = \sum_{j \in \Omega_X} P(j, \mu_k(i), i) \cdot [C(j, \mu_k(i), i) + J^{m+1}_{\mu_k}(j)], ∀ i ∈ Ω_X

m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{µ_k} is approximated by J^0_{µ_k}.
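A sketch of this approximate evaluation step, again written for the discounted case, is given below; the initial guess J0 and the arrays P and C are assumptions for illustration.

import numpy as np

def evaluate_policy_approximately(P, C, alpha, mu, J0, M):
    """M value-iteration sweeps under the fixed policy mu, starting from J0."""
    n = P.shape[0]
    J = np.array(J0, dtype=float)
    for _ in range(M):                           # m = M, ..., 1
        J = np.array([np.dot(P[i, mu[i], :], C[i, mu[i], :] + alpha * J)
                      for i in range(n)])
    return J                                     # approximation of J_mu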

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and a state X ∈ Ω_X, there is a unique λ_µ and vector h_µ such that

h_µ(X) = 0

λ_µ + h_µ(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot [C(j, \mu(i), i) + h_\mu(j)], ∀ i ∈ Ω_X

This λ_µ is the average cost-to-go for the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], ∀ i ∈ Ω_X

\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], ∀ i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the method is then called relative value iteration. X is an arbitrary reference state, and h_0(i) is chosen arbitrarily.

H_k = \min_{u \in \Omega_U(X)} \sum_{j \in \Omega_X} P(j, u, X) \cdot [C(j, u, X) + h_k(j)]

h_{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)] − H_k, ∀ i ∈ Ω_X

\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_k(j)], ∀ i ∈ Ω_X

The sequence h_k converges if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
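The recursion can be sketched as follows for a unichain MDP given as arrays P[i, u, j] and C[i, u, j] (assumed inputs); state 0 plays the role of the arbitrary reference state X.

import numpy as np

def relative_value_iteration(P, C, iters=1000):
    """Relative value iteration; returns a policy, the average cost and the bias h."""
    n, m, _ = P.shape
    h = np.zeros(n)
    for _ in range(iters):
        Q = np.einsum('iuj,iuj->iu', P, C) + np.einsum('iuj,j->iu', P, h)
        H = Q[0].min()                 # value at the reference state, estimate of lambda
        h = Q.min(axis=1) - H          # relative (bias) values
    mu = Q.argmin(axis=1)
    return mu, H, h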

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i), ∀ i ∈ Ω_X, stop the algorithm.

Otherwise, solve the system of equations

h_q(X) = 0

λ_q + h_q(i) = \sum_{j \in \Omega_X} P(j, \mu_q(i), i) \cdot [C(j, \mu_q(i), i) + h_q(j)], ∀ i ∈ Ω_X

Step 2: Policy improvement

\mu_{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h_q(j)], ∀ i ∈ Ω_X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.


For example, in the discounted IHSDP case, the optimal cost-to-go satisfies

J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J^*(j)], ∀ i ∈ Ω_X

and J^*(i) is the solution of the following linear programming model:

Maximize \sum_{i \in \Omega_X} J(i)

subject to J(i) − \alpha \sum_{j \in \Omega_X} P(j, u, i) \cdot J(j) \le \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i), ∀ u, i

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
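As an illustration, the discounted LP above can be passed to a generic LP solver. The sketch below uses scipy.optimize.linprog and assumes the model arrays P[i, u, j] and C[i, u, j] are available; it is one possible encoding, not a prescribed implementation.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    """Optimal cost-to-go of a discounted MDP via linear programming."""
    n, m, _ = P.shape
    A_ub, b_ub = [], []
    for i in range(n):
        for u in range(m):
            row = -alpha * P[i, u, :]
            row[i] += 1.0                      # J(i) - alpha * sum_j P(j,u,i) J(j)
            A_ub.append(row)
            b_ub.append(np.dot(P[i, u, :], C[i, u, :]))
    # maximize sum_i J(i)  <=>  minimize -sum_i J(i)
    res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n, method="highs")
    return res.x                               # optimal cost-to-go J*(i)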

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy µ_0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and are not part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.


Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), in order to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and is discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_k, X_{k+1}, U_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.


7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation: Assume a trajectory (X_0, ..., X_N) has been generated according to the policy µ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, µ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from the state X_k is

V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})

V(X_k) Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)

V(i_m) Cost-to-go of a trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) = J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

where γ_{X_k} corresponds to 1/m, with m the number of times X_k has already been visited by trajectories.

39

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can therefore only be used when the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = V(X_{k+1}) + C(X_k, X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states visited so far is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀ k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) = J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀ k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) update only modifies the current state:

J(X_k) = J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]
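A tabular sketch of the TD(0) update is given below. The simulator function step(state, action), returning the next state, the transition cost and a termination flag, as well as the start state, are hypothetical interfaces introduced for illustration and are not part of the thesis.

def td0_policy_evaluation(step, mu, states, episodes, start_state=0):
    """Tabular TD(0) estimate of the cost-to-go of a fixed policy mu."""
    J = {i: 0.0 for i in states}
    visits = {i: 0 for i in states}
    for _ in range(episodes):
        x, done = start_state, False
        while not done:
            x_next, cost, done = step(x, mu[x])      # one observed sample
            visits[x] += 1
            gamma = 1.0 / visits[x]                  # step size 1/m as in the text
            j_next = 0.0 if done else J[x_next]      # terminal state is cost-free
            J[x] += gamma * (cost + j_next - J[x])   # temporal-difference update
            x = x_next
    return J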

Q-factors
Once J_{µ_k}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q_{\mu_k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu_k}(j)]

Note that C(j, u, i) must be known. The improved policy is

\mu_{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu_k}(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J_µ and Q_{µ_k} have been estimated from the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u)   (7.2)

By combining the two equations, we obtain

Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)]   (7.3)

Q^*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The trade-off between exploration and exploitation: convergence of the algorithm to the optimal solution would require that all pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
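The following sketch combines the Q-learning update with an ε-greedy exploration/exploitation trade-off. The simulator step(), the per-state action sets and the start state are again hypothetical inputs used only to make the sketch self-contained.

import random

def q_learning(step, actions, states, episodes, epsilon=0.1, start_state=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = {(i, u): 0.0 for i in states for u in actions[i]}
    counts = {}
    for _ in range(episodes):
        x, done = start_state, False
        while not done:
            if random.random() < epsilon:            # exploration
                u = random.choice(actions[x])
            else:                                     # exploitation (greedy action)
                u = min(actions[x], key=lambda a: Q[(x, a)])
            x_next, cost, done = step(x, u)
            counts[(x, u)] = counts.get((x, u), 0) + 1
            gamma = 1.0 / counts[(x, u)]              # decreasing step size
            target = 0.0 if done else min(Q[(x_next, v)] for v in actions[x_next])
            Q[(x, u)] += gamma * (cost + target - Q[(x, u)])
            x = x_next
    return Q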

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation, using direct learning.


7.4 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_µ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_µ. In the tabular representation investigated previously, J_µ(i) was stored for all values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_µ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics, for example.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and the corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
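As a minimal illustration of such an approximation structure, the sketch below evaluates a policy with a linear approximation J(i, r) = rᵀφ(i), updated by a semi-gradient TD(0) rule. The feature map phi, the simulator step() and the constant step size are assumptions chosen for illustration, not a prescribed design.

import numpy as np

def td0_linear_approximation(step, mu, phi, n_features, episodes,
                             gamma=0.01, start_state=0):
    """Policy evaluation with a linear cost-to-go approximation r . phi(i)."""
    r = np.zeros(n_features)
    for _ in range(episodes):
        x, done = start_state, False
        while not done:
            x_next, cost, done = step(x, mu[x])
            j_next = 0.0 if done else np.dot(r, phi(x_next))
            delta = cost + j_next - np.dot(r, phi(x))   # temporal difference
            r += gamma * delta * phi(x)                 # semi-gradient step
            x = x_next
    return r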


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modelled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given: it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with a single state variable. MDP could be very useful for scheduled CBM and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization and scheduling
  Method: Value Iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go (continuous-time condition monitoring maintenance optimization), discounted (short-term maintenance optimization) and shortest path
  Methods: classical MDP methods: Value Iteration (VI), Policy Iteration (PI), Linear Programming
  Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP allows additional constraints, but the state space is more limited than with VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model, and the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E Number of electricity scenarios
N_W Number of working states for the component
N_PM Number of preventive maintenance states for one component
N_CM Number of corrective maintenance states for one component

Costs

C_E(s, k) Electricity cost at stage k for the electricity state s
C_I Cost per stage for interruption
C_PM Cost per stage of preventive maintenance
C_CM Cost per stage of corrective maintenance
C_N(i) Terminal cost if the component is in state i

Variables

i_1 Component state at the current stage
i_2 Electricity state at the current stage
j_1 Possible component state for the next stage
j_2 Possible electricity state for the next stage

State and Control Space

x^1_k Component state at stage k
x^2_k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state W_i

Sets

Ω_{x^1} Component state space
Ω_{x^2} Electricity state space
Ω_U(i) Decision space for state i

State notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (91):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2   (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: a normal state (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached, i.e. λ(t) = λ(Tmax) if t > Tmax; Tmax can for example correspond to the time when λ(t) reaches 50%. This latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
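As an illustration of this discretization, a minimal sketch (in Python) of how the per-stage failure probabilities and the number of W states could be computed from a continuous failure rate; the Weibull form of λ(t), the stage length and Tmax are assumptions chosen for the example, not values from the thesis.

Ts = 24 * 30.0            # one stage = one month, in hours (assumption)
Tmax = 24 * 365.0 * 5     # age after which lambda(t) is kept constant (assumption)

def failure_rate(t):
    # Assumed Weibull-type failure rate lambda(t) [failures/hour], capped at lambda(Tmax)
    beta, eta = 2.5, 24 * 365.0 * 4
    t = min(t, Tmax)
    return (beta / eta) * (t / eta) ** (beta - 1)

# Number of working states W0..WNW: NW = Tmax/Ts rounded to the closest integer
NW = round(Tmax / Ts)

# Per-stage failure probability from state Wq, assuming the rate is constant
# over the stage: p_fail(q) = Ts * lambda(q*Ts) (as in Figure 91)
p_fail = [Ts * failure_rate(q * Ts) for q in range(NW + 1)]

print(NW, p_fail[0], p_fail[-1])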

50

[Figure: Markov chain over the component states W0–W4, PM1, CM1, CM2; from Wq the component moves to Wq+1 (or stays in W4) with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); maintenance states progress deterministically with probability 1.]

Figure 91: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, between 200 and 500) as a function of the stage (k−1, k, k+1) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 92: Example of electricity scenarios, NE = 3

52

9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ else

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary: they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

53

Table 91: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PMNPM−1                     ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CMNCM−1                     ∅    W0       1

Table 92: Example of transition matrices for electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 93: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
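To make Tables 91–93 concrete, a small sketch that builds the component transition probabilities of Table 91 and the stage-dependent electricity matrices of Tables 92–93; the per-stage failure probabilities in lam and the dimensions are assumed example values, not data from the thesis.

import numpy as np

NW, NPM, NCM = 4, 2, 3                  # as in Figure 91 (assumption for the example)
lam = [0.05, 0.08, 0.12, 0.17, 0.25]    # assumed per-stage failure probabilities lambda(Wq)

# Component state space: W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1)
states = [f"W{q}" for q in range(NW + 1)] \
       + [f"PM{q}" for q in range(1, NPM)] \
       + [f"CM{q}" for q in range(1, NCM)]

def comp_transition(i, u):
    # Return {next_state: probability} for component state i under decision u (Table 91)
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                      # preventive replacement starts
            return {"PM1" if NPM > 1 else "W0": 1.0}
        nxt = f"W{min(q + 1, NW)}"      # WNW stays in WNW if it survives
        return {nxt: 1.0 - lam[q], "CM1" if NCM > 1 else "W0": lam[q]}
    # maintenance states progress deterministically and end in W0
    kind, q = i[:2], int(i[2:])
    last = (NPM - 1) if kind == "PM" else (NCM - 1)
    return {f"{kind}{q + 1}" if q < last else "W0": 1.0}

# Electricity scenario matrices P1E, P2E, P3E (Table 92) and the stage mapping (Table 93)
P1E = np.eye(3)
P2E = np.full((3, 3), 1.0 / 3.0)
P3E = 0.4 * np.eye(3) + 0.2 * np.ones((3, 3))
PE_by_stage = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

print(comp_transition("W2", 0), comp_transition("CM1", None))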

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

54

Table 94: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G·Ts·CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PMNPM−1                     ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CMNCM−1                     ∅    W0       CI + CCM
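To show how the value iteration algorithm applies to this one-component model, a compact sketch is given below. It follows the structure of Tables 91–94, with the electricity reward counted as a negative cost; all numerical values (failure probabilities, costs, prices, transition matrix, number of stages) are illustrative assumptions, not data from the thesis.

import numpy as np
from itertools import product

N = 12                                   # number of stages (assumption)
NW, NPM, NCM = 4, 2, 3
lam = [0.05, 0.08, 0.12, 0.17, 0.25]     # lambda(Wq), per stage (assumption)
G, Ts = 1000.0, 720.0                    # kW, hours per stage (assumption)
CI, CPM, CCM = 5000.0, 10000.0, 30000.0  # interruption / PM / CM cost per stage (assumption)
CE = lambda s, k: [0.50, 0.35, 0.20][s]  # electricity price per kWh for scenario s (assumption)
PE = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])   # Pk(j2, i2), all k

comp_states = [f"W{q}" for q in range(NW + 1)] + ["PM1", "CM1", "CM2"]

def comp_trans(i1, u):
    # Component transition probabilities (Table 91)
    if i1.startswith("W"):
        q = int(i1[1:])
        if u == 1:
            return {"PM1": 1.0}
        return {f"W{min(q + 1, NW)}": 1.0 - lam[q], "CM1": lam[q]}
    if i1 == "PM1":
        return {"W0": 1.0}
    return {"CM2": 1.0} if i1 == "CM1" else {"W0": 1.0}

def cost(i1, u, j1, i2, k):
    # Stage cost (Table 94); the electricity reward is counted as a negative cost
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        return -G * Ts * CE(i2, k)
    if u == 1 or i1 == "PM1":
        return CI + CPM
    return CI + CCM

J = {(i1, i2): 0.0 for i1, i2 in product(comp_states, range(3))}   # terminal cost = 0
policy = [{} for _ in range(N)]

for k in reversed(range(N)):             # backward value iteration
    Jk = {}
    for i1, i2 in product(comp_states, range(3)):
        decisions = [0, 1] if i1.startswith("W") else [None]
        best = None
        for u in decisions:
            uu = 0 if u is None else u
            exp_cost = 0.0
            for j1, p1 in comp_trans(i1, uu).items():
                for j2 in range(3):
                    exp_cost += p1 * PE[i2, j2] * (cost(i1, uu, j1, i2, k) + J[(j1, j2)])
            if best is None or exp_cost < best[0]:
                best = (exp_cost, u)
        Jk[(i1, i2)] = best[0]
        policy[k][(i1, i2)] = best[1]
    J = Jk

print(policy[0][("W3", 0)], J[("W0", 0)])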

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c at the next stage
jNC+1 State of the electricity at the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system, whatever the maintenance is.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92):

Xk = (x1k, ..., xNCk, xNC+1k)   (92)

xck, c ∈ {1, ..., NC}, represents the state of component c.

xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 91.

9242 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (93)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, ∅ else

9243 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)   (94)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)   (95)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 91.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and u = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2

58

If one of the components is in maintenance or the decision of preventive maintenance is taken:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with Pc =
  P(jc, uc, ic) if uc = 1 or ic ∉ {W0, ..., WNWc}
  1 if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
  0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and u = 0:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with Cc =
  CCMc if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
  CPMc if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
  0 else
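The two cases above can be implemented directly. The sketch below computes the joint transition probabilities and the stage cost for a decision vector in a two-component example; the electricity reward is left out and all component data are assumptions.

from itertools import product

# Assumed example: two components, each with states W0..W2, PM1, CM1
NWc = {1: 2, 2: 2}
lam_c = {1: [0.05, 0.10, 0.20], 2: [0.02, 0.06, 0.15]}   # lambda_c(Wq) per stage (assumption)
CPMc, CCMc, CI = {1: 8e3, 2: 6e3}, {1: 25e3, 2: 20e3}, 5e3

def working(ic):
    return ic.startswith("W")

def single_trans(ic, uc, c):
    # P(jc | uc, ic) for one component (same structure as the one-component model)
    if working(ic):
        if uc == 1:
            return {"PM1": 1.0}
        q = int(ic[1:])
        return {f"W{min(q + 1, NWc[c])}": 1.0 - lam_c[c][q], "CM1": lam_c[c][q]}
    return {"W0": 1.0}          # PM1 and CM1 finish and return to W0 (NPMc = NCMc = 2 here)

def joint_trans(i, u):
    # P((j1,j2) | (u1,u2), (i1,i2)) following Case 1 / Case 2
    system_up = all(working(i[c]) for c in (1, 2)) and not any(u[c] for c in (1, 2))
    probs = {}
    for j1, j2 in product(["W0", "W1", "W2", "PM1", "CM1"], repeat=2):
        j, p = {1: j1, 2: j2}, 1.0
        for c in (1, 2):
            if system_up:                               # Case 1: both components age
                p *= single_trans(i[c], 0, c).get(j[c], 0.0)
            elif u[c] == 1 or not working(i[c]):        # Case 2: maintained components evolve
                p *= single_trans(i[c], u[c], c).get(j[c], 0.0)
            else:                                       # Case 2: idle working components do not age
                p *= 1.0 if j[c] == i[c] else 0.0
        if p > 0:
            probs[(j1, j2)] = p
    return probs

def joint_cost(i, u, j):
    # Interruption plus maintenance costs (Case 2); Case 1 reward term omitted for brevity
    if all(working(i[c]) for c in (1, 2)) and not any(u.values()):
        return 0.0
    total = CI
    for c in (1, 2):
        if i[c].startswith("CM") or j[c] == "CM1":
            total += CCMc[c]
        elif i[c].startswith("PM") or u[c] == 1 or j[c] == "PM1":
            total += CPMc[c]
    return total

print(joint_trans({1: "W1", 2: "CM1"}, {1: 1, 2: 0}))
print(joint_cost({1: "W1", 2: "CM1"}, {1: 1, 2: 0}, {1: "PM1", 2: "W0"}))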

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is proved empirically to converge the fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4, u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2, u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7, u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

63

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

68

Page 31: Models

the stage, u ∈ ΩUk(i).

Dynamic of the System and Transition Probability

Contrary to the deterministic case, the state transition does not depend only on the control used, but also on a disturbance ω = ωk(i, u):

Xk+1 = fk(Xk, Uk, ω), k = 0, 1, ..., N−1

The effect of the disturbance can be expressed with transition probabilities. The transition probabilities define the probability that the state of the system at stage k+1 is j, if the state and control are i and u at stage k. These probabilities can also depend on the stage:

Pk(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

If the system is stationary (time-invariant), the dynamic function f does not depend on time and the notation for the probability function can be simplified:

P(j, u, i) = P(Xk+1 = j | Xk = i, Uk = u)

In this case one refers to a Markov decision process. If a control u is fixed for each possible state of the model, then the transition probabilities can be represented by a Markov model (see Chapter 9 for an example).

Cost Function

A cost is associated with each possible transition (i, j) and action u. The costs can also depend on the stage:

Ck(j, u, i) = Ck(xk+1 = j, uk = u, xk = i)

If the transition (i, j) occurs at stage k when the decision is u, then a cost Ck(j, u, i) is incurred. If the cost function is stationary, then the notation is simplified to C(j, u, i).

A terminal cost CN(i) can be used to penalize deviation from a desired terminal state.

Objective Function

The objective is to determine the sequence of decisions that optimizes the expected cumulative cost (cost-to-go function) J*(X0), where X0 is the initial state of the system:

J*(X0) = min Uk∈ΩUk(Xk) E[ CN(XN) + Σk=0..N−1 Ck(Xk+1, Uk, Xk) ]

subject to Xk+1 = fk(Xk, Uk, ωk(Xk, Uk)), k = 0, 1, ..., N−1

24

N Number of stages
k Stage
i State at the current stage
j State at the next stage
Xk State at stage k
Uk Decision action at stage k
ωk(i, u) Probabilistic function of the disturbance
Ck(i, u, j) Cost function
CN(i) Terminal cost for state i
fk(i, u, ω) Dynamic function
J*0(i) Optimal cost-to-go starting from state i

52 Optimality Equation

The optimality equation for stochastic finite horizon DP is:

J*k(i) = min u∈ΩUk(i) E[ Ck(i, u) + J*k+1(fk(i, u, ω)) ]   (51)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be re-written using the transition probabilities:

J*k(i) = min u∈ΩUk(i) Σ j∈ΩXk+1 Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]   (52)

ΩXk State space at stage k
ΩUk(i) Decision space at stage k for state i
Pk(j, u, i) Transition probability function

53 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation 52. The algorithm starts from the last stage. By backward recursion it determines at each stage the optimal decision for each state of the system.

J*N(i) = CN(i)  ∀i ∈ ΩXN   (initialisation)

While k ≥ 0 do
  J*k(i) = min u∈ΩUk(i) Σ j∈ΩXk+1 Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]  ∀i ∈ ΩXk
  U*k(i) = argmin u∈ΩUk(i) Σ j∈ΩXk+1 Pk(i, u, j) · [Ck(i, u, j) + J*k+1(j)]  ∀i ∈ ΩXk
  k ← k − 1

25

u Decision variable
U*k(i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached
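A minimal sketch of this backward recursion, for a model given as arrays P[k][i, u, j] and C[k][i, u, j] (hypothetical names; the tiny random model at the end is only there to make the example runnable):

import numpy as np

def finite_horizon_vi(P, C, CN):
    # Backward value iteration.
    # P[k][i, u, j]: transition probability at stage k; C[k][i, u, j]: transition cost;
    # CN[i]: terminal cost. Returns the cost-to-go J[k][i] and the optimal decisions U[k][i].
    N = len(P)
    n_states, n_controls, _ = P[0].shape
    J = np.zeros((N + 1, n_states))
    U = np.zeros((N, n_states), dtype=int)
    J[N] = CN
    for k in range(N - 1, -1, -1):
        # expected cost of each (state, control) pair: sum_j P * (C + J_{k+1})
        Q = np.einsum('iuj,iuj->iu', P[k], C[k] + J[k + 1][None, None, :])
        J[k] = Q.min(axis=1)
        U[k] = Q.argmin(axis=1)
    return J, U

# Tiny illustrative model: 2 states, 2 controls, 3 stages (all numbers assumed)
rng = np.random.default_rng(0)
P = [np.full((2, 2, 2), 0.5) for _ in range(3)]
C = [rng.uniform(0, 10, size=(2, 2, 2)) for _ in range(3)]
J, U = finite_horizon_vi(P, C, CN=np.zeros(2))
print(J[0], U[0])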

54 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

bull N stages

bull NX states variables the size of the set for each state variable is S

bull NU control variables the size of the set for each control variable is A

The time complexity of the algorithm is O(N middotS2middotNX middotANU ) The complexity of theproblem increases exponentionally with the size of the problem (number of state ordecision variables) This characteristic of SDP is called the curse of dimensionality

55 Ideas for a Maintenance Optimization Model

In this section, possible state variables for a maintenance model based on SDP are discussed.

551 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, it can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used complementarily.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure a component should be replaced.

26

552 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The proposed model in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the generation must always be in balance with the consumption. If there is no consumption, some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.

553 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.

27

Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computations.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

61 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}. μ is a function mapping the state space to the control space: for i ∈ ΩX, μ(i) is an admissible control for the state i, μ(i) ∈ ΩU(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

J*(X0) = min μ E[ lim N→∞ Σk=0..N−1 C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, μ(Xk), ω(Xk, μ(Xk))), k = 0, 1, ..., N−1

μ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor α, where α is a discount factor (0 < α < 1). The cost function for discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min μ E[ lim N→∞ Σk=0..N−1 α^k · C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))), k = 0, 1, ..., N−1

α Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or be discounted.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize:

J* = min μ E[ lim N→∞ (1/N) · Σk=0..N−1 C(Xk+1, μ(Xk), Xk) ]

subject to Xk+1 = f(Xk, Uk, ω(Xk, μ(Xk))), k = 0, 1, ..., N−1

30

62 Optimality Equations

The optimality equations are formulated using the probability function P(i, u, j).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

Jμ(i) = min μ(i)∈ΩU(i) Σ j∈ΩX Pij(u) · [Cij(u) + Jμ(j)]  ∀i ∈ ΩX

Jμ(i) Cost-to-go function of policy μ starting from state i
J*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is:

Jμ(i) = min μ(i)∈ΩU(i) Σ j∈ΩX Pij(u) · [Cij(u) + α · Jμ(j)]  ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 66.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that the algorithm will indeed converge to the optimal solution. If the model is discounted, then the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite and a relative stopping criterion must be determined.

An alternative to the method is the Policy Iteration (PI) algorithm. This latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This 2-step algorithm is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. Then it can be described by the following steps:

Step 1: Policy Evaluation

If μq+1 = μq, stop the algorithm.
Else, Jμq(i), solution of the following linear system, is calculated:

Jμq(i) = Σ j∈ΩX P(j, μq(i), i) · [C(j, μq(i), i) + Jμq(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μq.

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm:

μq+1(i) = argmin u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + Jμq(j)]  ∀i ∈ ΩX

Go back to the policy evaluation step.

The process stops when μq+1 = μq.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, then the algorithm will converge quickly to the optimal solution.
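A sketch of the two-step algorithm for the discounted variant, where the policy evaluation step solves the linear system J = C_μ + α·P_μ·J directly; the stationary model data P[u], C[u] and all numbers are assumptions for the example.

import numpy as np

def policy_iteration(P, C, alpha=0.9):
    # P[u][i, j]: transition probabilities; C[u][i, j]: transition costs; alpha: discount factor
    n_controls, n_states = len(P), P[0].shape[0]
    mu = np.zeros(n_states, dtype=int)                  # initial policy mu0
    while True:
        # Step 1: policy evaluation -- solve J = C_mu + alpha * P_mu * J
        P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
        c_mu = np.array([(P[mu[i]][i] * C[mu[i]][i]).sum() for i in range(n_states)])
        J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([[(P[u][i] * (C[u][i] + alpha * J)).sum() for u in range(n_controls)]
                      for i in range(n_states)])
        new_mu = Q.argmin(axis=1)
        if np.array_equal(new_mu, mu):                  # policy is a solution of its own improvement
            return J, mu
        mu = new_mu

# Tiny assumed example: 3 states, 2 controls
rng = np.random.default_rng(1)
P = [rng.dirichlet(np.ones(3), size=3) for _ in range(2)]
C = [rng.uniform(0, 5, size=(3, 3)) for _ in range(2)]
J, mu = policy_iteration(P, C)
print(J, mu)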

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each iteration, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_μk(i) that must be chosen higher than the real value Jμk(i).

While m ≥ 0 do
  J^m_μk(i) = Σ j∈ΩX P(j, μk(i), i) · [C(j, μk(i), i) + J^(m+1)_μk(j)]  ∀i ∈ ΩX
  m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jμk is approximated by J^0_μk.

66 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there are a unique λμ and a vector hμ such that:

hμ(X) = 0
λμ + hμ(i) = Σ j∈ΩX P(j, μ(i), i) · [C(j, μ(i), i) + hμ(j)]  ∀i ∈ ΩX

This λμ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

μ*(i) = argmin u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + h*(j)]  ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary state and h0(i) is chosen arbitrarily.

Hk = min u∈ΩU(X) Σ j∈ΩX P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + hk(j)] − Hk  ∀i ∈ ΩX

μk+1(i) = argmin u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + hk(j)]  ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations:

hq(X) = 0
λq + hq(i) = Σ j∈ΩX P(j, μq(i), i) · [C(j, μq(i), i) + hq(j)]  ∀i ∈ ΩX

Step 2: Policy improvement

μq+1(i) = argmin u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + hq(j)]  ∀i ∈ ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

34

For example, in the discounted IHSDP case:

J*(i) = min u∈ΩU(i) Σ j∈ΩX P(j, u, i) · [C(j, u, i) + α · J*(j)]  ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize Σ i∈ΩX J(i)
Subject to J(i) ≤ Σ j∈ΩX P(j, u, i) · [C(j, u, i) + α · J(j)]  ∀i, u

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
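For the discounted case, this LP (maximize Σ J(i) subject to J(i) ≤ Σ_j P(j, u, i)·[C(j, u, i) + α·J(j)] for every state-action pair) can be sketched with a generic LP solver; the model data below are assumed example values.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha=0.9):
    # P[u][i, j], C[u][i, j]. Maximize sum_i J(i) s.t. J(i) <= sum_j P(j|i,u)[C(i,u,j) + alpha*J(j)]
    n_controls, n_states = len(P), P[0].shape[0]
    A_ub, b_ub = [], []
    for u in range(n_controls):
        for i in range(n_states):
            row = -alpha * P[u][i]
            row[i] += 1.0                                 # J(i) - alpha * sum_j P(j|i,u) J(j)
            A_ub.append(row)
            b_ub.append((P[u][i] * C[u][i]).sum())        # expected immediate cost
    # maximize sum J  <=>  minimize -sum J
    res = linprog(c=-np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states, method="highs")
    return res.x

rng = np.random.default_rng(2)
P = [rng.dirichlet(np.ones(3), size=3) for _ in range(2)]
C = [rng.uniform(0, 5, size=(3, 3)) for _ in range(2)]
print(solve_discounted_mdp_lp(P, C))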

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the numbers of states and actions, this means that a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem of the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs), to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 72. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 74.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that, from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is:

V(Xk) = Σ n=k..N−1 C(Xn, Xn+1)

V(Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated, and the state i has been visited K times in these trajectories, J(i) can be estimated by:

J(i) = (1/K) · Σ m=1..K V(im)

V(im) Cost-to-go of the trajectory starting from state i at its m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(im) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γXk · [V(Xk) − J(Xk)]

γXk corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.

39

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can thus only be used when the trajectory is finished. However, the method can be reformulated exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go function of the states of the trajectory is updated. Assume that the l-th transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is:

J(Xl) := J(Xl) + γXl · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]
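A sketch of TD(0) policy evaluation on simulated trajectories; the four-state Markov chain (state 3 terminal and cost-free) and its costs are assumptions used only to generate samples.

import numpy as np

rng = np.random.default_rng(3)

# Assumed example: a fixed policy induces this Markov chain, state 3 is terminal and cost-free
P_mu = np.array([[0.0, 0.6, 0.3, 0.1],
                 [0.0, 0.0, 0.7, 0.3],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 0.0, 1.0]])
C_mu = np.array([[0.0, 2.0, 4.0, 6.0],
                 [0.0, 0.0, 1.0, 5.0],
                 [0.0, 0.0, 0.0, 2.0],
                 [0.0, 0.0, 0.0, 0.0]])

J = np.zeros(4)                      # estimated cost-to-go of the policy
visits = np.zeros(4)

for _ in range(5000):                # simulated trajectories starting from state 0
    x = 0
    while x != 3:
        x_next = rng.choice(4, p=P_mu[x])
        visits[x] += 1
        gamma = 1.0 / visits[x]      # step size 1/m, m = number of visits to x
        # TD(0) update towards the one-step estimate C(x, x') + J(x')
        J[x] += gamma * (C_mu[x, x_next] + J[x_next] - J[x])
        x = x_next

print(J)   # should approach the exact cost-to-go of the policy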

Q-factors
Once Jμk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by:

Qμk(i, u) = Σ j∈ΩX P(j, u, i) · [C(j, u, i) + Jμk(j)]

Note that the transition probabilities and the costs C(j, u, i) must be known.

The improved policy is:

μk+1(i) = argmin u∈ΩU(i) Qμk(i, u)

It is in fact an approximate version of the policy iteration algorithm, since Jμk and Qμk have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by:

Q*(i, u) = Σ j∈ΩX P(j, u, i) · [C(j, u, i) + J*(j)]   (71)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min u∈ΩU(i) Q*(i, u)   (72)

By combining the two equations we obtain:

Q*(i, u) = Σ j∈ΩX P(j, u, i) · [C(j, u, i) + min v∈ΩU(j) Q*(j, v)]   (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off exploration/exploitation. Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
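
A minimal Python sketch of tabular Q-learning with an ε-greedy exploration rule is given below. The simulator simulate_transition(x, u), the function actions(x) returning the admissible decisions, and the numerical settings are assumptions for illustration; they are not defined in the thesis.

import random
from collections import defaultdict

def q_learning(simulate_transition, actions, start_state, terminal_state,
               n_trajectories=1000, epsilon=0.1):
    # Tabular Q-learning for a stochastic shortest path problem.
    # simulate_transition(x, u) -> (next_state, cost); actions(x) -> admissible decisions.
    Q = defaultdict(float)       # Q-factors Q(i, u), initialized arbitrarily (here to 0)
    visits = defaultdict(int)    # visit counters for the step size gamma = 1/m

    def greedy(x):
        return min(actions(x), key=lambda u: Q[(x, u)])

    for _ in range(n_trajectories):
        x = start_state
        while x != terminal_state:
            # exploration with probability epsilon, exploitation (greedy policy) otherwise
            u = random.choice(actions(x)) if random.random() < epsilon else greedy(x)
            x_next, cost = simulate_transition(x, u)
            target = cost if x_next == terminal_state else \
                     cost + min(Q[(x_next, v)] for v in actions(x_next))
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
            x = x_next
    return dict(Q)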

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line a model of the transition probabilities and of the cost function, and then using this model for off-line training of the system through simulation, using direct learning.


7.4 Supervised Learning

With the methods presented in the preceding sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems, but for large state and control spaces they would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_µ(i). It is replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of J_µ. In the tabular representation investigated previously, J_µ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_µ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.
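
As a small illustration of such an approximation structure, the sketch below fits a linear architecture J̃(i, r) = φ(i)ᵀ·r to sampled cost-to-go values by least squares. The feature map and the example state description are assumptions made for illustration only.

import numpy as np

def fit_linear_cost_to_go(features, states, sampled_costs):
    # Least-squares fit of the approximation J~(i, r) = features(i) . r,
    # where features(i) is a feature vector chosen from insight about the problem.
    Phi = np.array([features(i) for i in states])   # training inputs
    y = np.array(sampled_costs)                     # sampled cost-to-go values (e.g. from simulation)
    r, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # parameter vector r
    return r

def approx_cost_to_go(features, r, state):
    return float(np.dot(features(state), r))

# Hypothetical usage, with a state described by (age, electricity scenario):
# features = lambda i: np.array([1.0, i[0], i[0] ** 2, i[1]])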


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating-unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, the use of approximate methods would be recommended.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Methods: classical methods for MDP, with several possible approaches:
    - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI), which can converge fast for a high discount factor
    - Discounted: short-term maintenance optimization; Policy Iteration (PI), faster in general
    - Shortest path: Linear Programming, which allows possible additional constraints; state space more limited than for VI and PI

Approximate Dynamic Programming
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: more complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E: Number of electricity scenarios
N_W: Number of working states for the component
N_PM: Number of preventive maintenance states for one component
N_CM: Number of corrective maintenance states for one component

Costs

C_E(s, k): Electricity cost at stage k for the electricity state s
C_I: Cost per stage for interruption
C_PM: Cost per stage of preventive maintenance
C_CM: Cost per stage of corrective maintenance
C_N(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability functions

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state W_i

Sets

Ω_x1: Component state space
Ω_x2: Electricity state space
Ω_U(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is N_PM stages with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (N_X = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always carried out. Another possibility is to assume that λ(t) stays constant once the age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50% for t > T_max. This second approach was implemented. The corresponding number of W states is N_W = T_max/Ts, or the closest integer, in both cases.


Figure 9.1: Example of Markov decision process for one component with N_CM = 3, N_PM = 2, N_W = 4 (solid lines: u = 0, dashed lines: u = 1). The states are W0, ..., W4, PM1, CM1, CM2; from a state Wq the solid transitions go to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q).

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}


Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios with N_E = 3, showing the electricity prices (in SEK/MWh, roughly between 200 and 500) of Scenarios 1, 2 and 3 around stages k−1, k and k+1.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and Ω_U(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          | u | j1     | P(j1, u, i1)
Wq, q ∈ {0, ..., N_W − 1}   | 0 | Wq+1   | 1 − λ(Wq)
Wq, q ∈ {0, ..., N_W − 1}   | 0 | CM1    | λ(Wq)
W_NW                        | 0 | W_NW   | 1 − λ(W_NW)
W_NW                        | 0 | CM1    | λ(W_NW)
Wq, q ∈ {0, ..., N_W}       | 1 | PM1    | 1
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PMq+1  | 1
PM_{NPM−1}                  | ∅ | W0     | 1
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CMq+1  | 1
CM_{NCM−1}                  | ∅ | W0     | 1

Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [ 1    0    0
         0    1    0
         0    0    1 ]

P2_E = [ 1/3  1/3  1/3
         1/3  1/3  1/3
         1/3  1/3  1/3 ]

P3_E = [ 0.6  0.2  0.2
         0.2  0.6  0.2
         0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)   | 0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)  | P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · C_E(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: C_CM or C_PM

• Cost for interruption: C_I

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          | u | j1     | Ck(j, u, i)
Wq, q ∈ {0, ..., N_W − 1}   | 0 | Wq+1   | G · Ts · C_E(i2, k)
Wq, q ∈ {0, ..., N_W − 1}   | 0 | CM1    | C_I + C_CM
W_NW                        | 0 | W_NW   | G · Ts · C_E(i2, k)
W_NW                        | 0 | CM1    | C_I + C_CM
Wq                          | 1 | PM1    | C_I + C_PM
PMq, q ∈ {1, ..., N_PM − 2} | ∅ | PMq+1  | C_I + C_PM
PM_{NPM−1}                  | ∅ | W0     | C_I + C_PM
CMq, q ∈ {1, ..., N_CM − 2} | ∅ | CMq+1  | C_I + C_CM
CM_{NCM−1}                  | ∅ | W0     | C_I + C_CM
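
To make the one-component model concrete, the following Python sketch builds the transition and cost structure of Tables 9.1 and 9.4 and solves it by backward value iteration. All numerical values (failure probabilities, prices, maintenance durations and costs, scenario matrix) are illustrative assumptions, and the electricity generation term is treated as a negative cost (revenue) so that minimization trades it off against the maintenance costs.

import numpy as np

# Illustrative sketch of the one-component replacement model of Section 9.1.
# Component states are indexed 0..NW for W0..WNW, then PM1.., then CM1..
N = 52                                    # number of stages (one year, weekly stages)
Ts, G = 168.0, 1.0                        # stage length [h], average output [MW]
NW, NPM, NCM = 4, 2, 3                    # numbers of W, PM and CM states
C_PM, C_CM, C_I = 5e3, 2e4, 1e4           # PM, CM and interruption costs per stage [SEK]
p_fail = [0.01, 0.02, 0.04, 0.08, 0.15]   # failure probability per stage in W0..W4 (assumed)
CE = [250.0, 400.0]                       # electricity price per scenario [SEK/MWh]
PE = np.array([[0.9, 0.1],                # electricity scenario transition matrix
               [0.2, 0.8]])               # (kept stationary in this sketch)

PM1, CM1 = NW + 1, NW + NPM               # indices of PM1 and CM1
n_comp = NW + NPM + NCM - 1               # total number of component states

def component_transitions(i, u):
    # List of (next component state, probability, cost) following Tables 9.1 and 9.4;
    # the electricity revenue is added separately since it depends on the scenario.
    if i <= NW:                                        # working states W0..WNW
        if u == 1:                                     # preventive replacement decided
            return [(PM1 if NPM > 1 else 0, 1.0, C_I + C_PM)]
        return [(min(i + 1, NW), 1.0 - p_fail[i], 0.0),
                (CM1 if NCM > 1 else 0, p_fail[i], C_I + C_CM)]
    if i < CM1:                                        # preventive maintenance states
        return [(i + 1 if i + 1 < CM1 else 0, 1.0, C_I + C_PM)]
    return [(i + 1 if i + 1 < n_comp else 0, 1.0, C_I + C_CM)]   # corrective maintenance states

J = np.zeros((N + 1, n_comp, len(CE)))     # terminal cost C_N(i) taken as 0 here
U = np.zeros((N, n_comp, len(CE)), dtype=int)
for k in range(N - 1, -1, -1):             # backward value iteration
    for i in range(n_comp):
        for s in range(len(CE)):
            best_cost, best_u = np.inf, 0
            for u in ([0, 1] if 1 <= i <= NW else [0]):   # PM allowed only in W1..WNW
                expected = 0.0
                for j, p, c in component_transitions(i, u):
                    revenue = G * Ts * CE[s] if (u == 0 and i <= NW and j <= NW) else 0.0
                    cont = PE[s, :] @ J[k + 1, j, :]       # expectation over the next scenario
                    expected += p * (c - revenue + cont)
                if expected < best_cost:
                    best_cost, best_u = expected, u
            J[k, i, s], U[k, i, s] = best_cost, best_u
print("Optimal expected cost from a new component in scenario 0:", J[0, 0, 0])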

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high, or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C: Number of components
N_Wc: Number of working states for component c
N_PMc: Number of preventive maintenance states for component c
N_CMc: Number of corrective maintenance states for component c

Costs

C_PMc: Cost per stage of preventive maintenance for component c
C_CMc: Cost per stage of corrective maintenance for component c
C_Nc(i): Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., N_C}: State of component c at the current stage
i_{NC+1}: Electricity state at the current stage
jc, c ∈ {1, ..., N_C}: State of component c at the next stage
j_{NC+1}: Electricity state at the next stage
uc, c ∈ {1, ..., N_C}: Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., N_C}: State of component c at stage k
xc: A component state
x{NC+1}_k: Electricity state at stage k
uc_k: Maintenance decision for component c at stage k

Probability functions

λc(i): Failure probability function for component c

Sets

Ω_xc: State space for component c
Ω_x{NC+1}: Electricity state space
Ω_uc(ic): Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., x{NC}_k, x{NC+1}_k)    (9.2)

xc_k, c ∈ {1, ..., N_C}, represents the state of component c, and x{NC+1}_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., u{NC}_k)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, and Ω_uc(ic) = ∅ otherwise.

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state x{NC+1}. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) · P(j_{NC+1}, i_{NC+1})    (9.5)

The transition probabilities of the electricity state, P(j_{NC+1}, i_{NC+1}), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: ic ∈ {W1, ..., W_NWc},

P((j1, ..., j_NC), 0, (i1, ..., i_NC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or a preventive maintenance decision is taken, then

P((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., W_NWc}
P^c = 1             if ic ∈ {W0, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0             otherwise

That is, components that are maintained, or on which maintenance is started, follow their maintenance transitions, while the remaining working components keep their state, since the system is not operating during the stage.

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: ic ∈ {W1, ..., W_NWc},

C((j1, ..., j_NC), 0, (i1, ..., i_NC)) = G · Ts · C_E(i_{NC+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., j_NC), (u1, ..., u_NC), (i1, ..., i_NC)) = C_I + Σ_{c=1}^{NC} C^c

with

C^c = C_CMc  if ic ∈ {CM1, ..., CM_{NCMc−1}} or jc = CM1
C^c = C_PMc  if ic ∈ {PM1, ..., PM_{NPMc−1}} or jc = PM1
C^c = 0      otherwise
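
The following Python sketch shows one way of composing the joint component transition probabilities from the single-component ones, following the two cases above as interpreted here. The helper component_probs(c, i, u), the working-state sets and the state encoding are assumptions for illustration only.

from itertools import product

def system_transition_probs(i_states, u_vector, component_probs, working_sets):
    # Joint transition probabilities of the multi-component system (Section 9.2.4.3),
    # composed from single-component probabilities.
    # component_probs(c, i, u) -> list of (next_state, probability) for component c;
    # working_sets[c] -> set of working states of component c.
    all_working_no_pm = all(i in working_sets[c] and u == 0
                            for c, (i, u) in enumerate(zip(i_states, u_vector)))
    per_component = []
    for c, (i, u) in enumerate(zip(i_states, u_vector)):
        if all_working_no_pm:
            per_component.append(component_probs(c, i, 0))    # Case 1: normal ageing
        elif u == 1 or i not in working_sets[c]:
            per_component.append(component_probs(c, i, u))    # Case 2: maintenance transition
        else:
            per_component.append([(i, 1.0)])                  # Case 2: working component does not age
    joint = {}
    for combo in product(*per_component):
        j = tuple(t[0] for t in combo)
        p = 1.0
        for t in combo:
            p *= t[1]
        joint[j] = joint.get(j, 0.0) + p
    return joint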

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• The time to repair is not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
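
The backward recursion above can be checked with a few lines of Python; the arc costs C(k, i, j) below are the ones used in the calculation (states at each stage are numbered 0, 1, 2, and the terminal stage is 4).

# Backward value iteration on the shortest path example; C[(k, i, j)] is the arc
# cost from state i at stage k to state j at stage k+1, as used above.
C = {(0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
     (1, 0, 0): 4, (1, 0, 1): 6,
     (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
     (1, 2, 1): 5, (1, 2, 2): 2,
     (2, 0, 0): 2, (2, 0, 1): 5,
     (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
     (2, 2, 1): 1, (2, 2, 2): 2,
     (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7}

J = {(4, 0): 0}                                   # terminal stage
for k in range(3, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(c + J[(k + 1, j)] for (kk, ii, j), c in C.items()
                        if kk == k and ii == i)
print(J[(0, 0)])   # prints 8, the optimal cost-to-go J*(A) found above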


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A.-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



N: Number of stages
k: Stage
i: State at the current stage
j: State at the next stage
Xk: State at stage k
Uk: Decision action at stage k
ωk(i, u): Probabilistic function of the disturbance
Ck(i, u, j): Cost function
C_N(i): Terminal cost for state i
fk(i, u, ω): Dynamic function
J*_0(i): Optimal cost-to-go starting from state i

5.2 Optimality Equation

The optimality equation for stochastic finite horizon DP is

J*_k(i) = min_{u∈Ω_Uk(i)} E{Ck(i, u) + J*_{k+1}(fk(i, u, ω))}    (5.1)

This equation defines a condition for the cost-to-go function of a state i at stage k to be optimal. The equation can be rewritten using the transition probabilities:

J*_k(i) = min_{u∈Ω_Uk(i)} Σ_{j∈Ω_X_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)]    (5.2)

Ω_Xk: State space at stage k
Ω_Uk(i): Decision space at stage k for state i
Pk(j, u, i): Transition probability function

5.3 Value Iteration Method

The Value Iteration (VI) algorithm for SDP problems is directly based on equation (5.2). The algorithm starts from the last stage. By backward recursion it determines, at each stage, the optimal decision for each state of the system.

J*_N(i) = C_N(i), ∀i ∈ Ω_X_N (initialisation)

While k ≥ 0 do
    J*_k(i) = min_{u∈Ω_Uk(i)} Σ_{j∈Ω_X_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)], ∀i ∈ Ω_Xk
    U*_k(i) = argmin_{u∈Ω_Uk(i)} Σ_{j∈Ω_X_{k+1}} Pk(i, u, j) · [Ck(i, u, j) + J*_{k+1}(j)], ∀i ∈ Ω_Xk
    k ← k − 1

u: Decision variable
U*_k(i): Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic programming problem with

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is O(N · S^(2·N_X) · A^(N_U)). The complexity of the problem thus increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
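
For instance, with illustrative numbers N = 52 stages, N_X = 3 state variables of size S = 10 each and N_U = 3 binary control variables (A = 2), this bound is 52 · 10^6 · 2^3 ≈ 4 · 10^8 elementary operations; adding a single additional state variable of the same size multiplies this figure by S^2 = 100.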

5.5 Ideas for a Maintenance Optimization Model

In this section, possible state variables for a maintenance model based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, both age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbance a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This reduces the uncertainties but in return increases the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption, and if there is no consumption some generation units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions on offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on only a few preceding states) to overcome this assumption: variables are added in the DP model to keep in memory the preceding states that can be visited. The computational price is, once again, very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the previous stage. It would give information about the dynamics of the deterioration process.


Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov decision process. For more details and proofs of the convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, for the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {µ, µ, µ, ...}, where µ is a function mapping the state space to the control space. For i ∈ Ω_X, µ(i) is an admissible control for the state i: µ(i) ∈ Ω_U(i).

The objective is to find the optimal µ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in this state and no costs are paid.

J*(X0) = min_µ E{ lim_{N→∞} Σ_{k=0}^{N−1} C(Xk+1, µ(Xk), Xk) }

Subject to Xk+1 = f(Xk, µ(Xk), ω(Xk, µ(Xk))), k = 0, 1, ..., N−1

µ: Decision policy
J*(i): Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost function for a discounted IHSDP has, at stage k, the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_µ E{ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(Xk+1, µ(Xk), Xk) }

Subject to Xk+1 = f(Xk, Uk, ω(Xk, µ(Xk))), k = 0, 1, ..., N−1

α: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounted costs.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_µ E{ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(Xk+1, µ(Xk), Xk) }

Subject to Xk+1 = f(Xk, Uk, ω(Xk, µ(Xk))), k = 0, 1, ..., N−1

6.2 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy µ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} Pij(u) · [Cij(u) + J*(j)], ∀i ∈ Ω_X

Jµ(i): Cost-to-go function of policy µ starting from state i
J*(i): Optimal cost-to-go function for state i

For an IHSDP discounted problem, the optimality equation is

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} Pij(u) · [Cij(u) + α · J*(j)], ∀i ∈ Ω_X

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1−α).

For non-discounted models the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step algorithm is used iteratively; the process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ0. Then it can be described by the following steps.

Step 1: Policy Evaluation

If µ_{q+1} = µ_q, stop the algorithm.
Else, J_µq(i), the solution of the following linear system, is calculated:

J_µq(i) = Σ_{j∈Ω_X} P(j, µq(i), i) · [C(j, µq(i), i) + J_µq(j)]

q: Iteration number of the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µq.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

µ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_µq(j)]

Go back to the policy evaluation step.

The process stops when µ_{q+1} = µ_q.

At each iteration the algorithm always improves the policy. If the initial policy µ0 is already good, then the algorithm will converge quickly to the optimal solution.
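
A compact Python sketch of this algorithm, written here for a discounted MDP with a small number of states and with every decision admissible in every state (both of which are simplifying assumptions of the sketch, not of the thesis), is given below.

import numpy as np

def policy_iteration(P, C, alpha):
    # Policy iteration for a discounted MDP. P[u] is the n x n transition matrix
    # under decision u (row i: current state), C[u] the corresponding transition
    # cost matrix, and alpha the discount factor. All inputs are illustrative.
    m, n = len(P), P[0].shape[0]
    mu = np.zeros(n, dtype=int)                       # initial policy mu_0
    while True:
        # Step 1: policy evaluation, solve J = c_mu + alpha * P_mu * J
        P_mu = np.array([P[mu[i]][i, :] for i in range(n)])
        c_mu = np.array([P[mu[i]][i, :] @ C[mu[i]][i, :] for i in range(n)])
        J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
        # Step 2: policy improvement
        Q = np.array([[P[u][i, :] @ (C[u][i, :] + alpha * J) for u in range(m)]
                      for i in range(n)])
        mu_new = Q.argmin(axis=1)
        if np.array_equal(mu_new, mu):                # policy is a solution of its own improvement
            return J, mu
        mu = mu_new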

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system in the policy evaluation step can be computationally intensive.

An alternative is to use, in each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_µk(i) that must be chosen higher than the real value J_µk(i).

While m ≥ 0 do
    J^m_µk(i) = Σ_{j∈Ω_X} P(j, µk(i), i) · [C(j, µk(i), i) + J^{m+1}_µk(j)], ∀i ∈ Ω_X
    m ← m − 1

m: Number of iterations left in the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_µk is approximated by J^0_µk.

66 Average Cost-to-go Problems

The methods presented in Sections 51-54 can not be applied directly to average costproblems Average cost-to-go problems are more complicated and implies conditionson the Markov decision process for the convergence of the algorithms An averagecost-to-go problem can be reformulated as equivalent to a shortest path problemif the model of the Markov decision process is proved to be unichain (That is allstationary policies generate Markov chains that consist of a single ergodic class andpossibly some transient states See for details [36])

Given a stationary policy μ and a state X ∈ Ω_X, there are a unique λ_μ and vector h_μ such that

h_μ(X) = 0
λ_μ + h_μ(i) = Σ_{j∈Ω_X} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)],  ∀i ∈ Ω_X

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)],  ∀i ∈ Ω_X

μ*(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h*(j)],  ∀i ∈ Ω_X

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.

H_k = min_{u∈Ω_U(X)} Σ_{j∈Ω_X} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k,  ∀i ∈ Ω_X

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_k(j)],  ∀i ∈ Ω_X

The sequence h_k converges if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is infinite in theory.
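The following Python sketch illustrates relative value iteration under the same array assumptions as before; state 0 plays the role of the arbitrary reference state X.

import numpy as np

def relative_value_iteration(P, C, n_iter=1000, ref=0):
    # Relative value iteration for an average cost-to-go problem (unichain model assumed).
    n_controls, n_states, _ = P.shape
    h = np.zeros(n_states)                                     # h_0 chosen arbitrarily
    for _ in range(n_iter):
        # Q[u, i] = sum_j P(j, u, i) * (C(j, u, i) + h_k(j))
        Q = np.einsum('uij,uij->ui', P, C) + np.einsum('uij,j->ui', P, h)
        H = Q[:, ref].min()                                    # H_k at the reference state
        h = Q.min(axis=0) - H                                  # relative value update
    return H, h, Q.argmin(axis=0)                              # average cost estimate, h, greedy policy

In practice the loop is stopped when h no longer changes significantly, since the number of iterations needed is infinite in theory.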

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ Ω_X, stop the algorithm. Else solve the system of equations

h_q(X) = 0
λ_q + h_q(i) = Σ_{j∈Ω_X} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)],  ∀i ∈ Ω_X

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + h_q(j)],  ∀i ∈ Ω_X

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP,

J*(i) = min_{u∈Ω_U(i)} Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + α · J*(j)],  ∀i ∈ Ω_X

J*(i) is the solution of the following linear programming model:

Maximize  Σ_{i∈Ω_X} J(i)

Subject to  J(i) − Σ_{j∈Ω_X} α · P(j, u, i) · J(j) ≤ Σ_{j∈Ω_X} P(j, u, i) · C(j, u, i),  ∀u, i

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
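For a small example, the linear program above can be set up directly with an off-the-shelf solver. The Python sketch below uses scipy.optimize.linprog and the same array assumptions as the earlier sketches; it maximizes Σ_i J(i) by minimizing its negative, since linprog only minimizes.

import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, C, alpha):
    # max sum_i J(i)  s.t.  J(i) - alpha * sum_j P(j,u,i) J(j) <= sum_j P(j,u,i) C(j,u,i),  for all i, u
    n_controls, n_states, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_controls):
        for i in range(n_states):
            row = -alpha * P[u, i]                             # -alpha * P(. , u, i)
            row[i] += 1.0                                      # + J(i)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i], C[u, i]))              # expected one-stage cost
    res = linprog(c=-np.ones(n_states),                        # maximize sum_i J(i)
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states, method='highs')
    return res.x                                               # J*(i)

Additional constraints on J, which is the motivation for the LP formulation, can be appended to A_ub and b_ub before solving.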

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. Linear programming methods, however, become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm improves the policy at each iteration, the algorithm converges quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (problems with continuous actions belong to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

36

Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40], Neuro-Dynamic Programming [13] and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods that overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning or Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a tedious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy μ and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, μ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i at its mth visit.

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(i_m) − J(i)],  with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view,

J(Xk) = J(Xk) + γ_Xk · [V(Xk) − J(Xk)]

with γ_Xk corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = V(Xk+1) + C(Xk, Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the lth transition is being generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) = J(Xk) + γ_Xk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) = J(Xk) + γ_Xk · λ^{l−k} · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0. The TD(0) algorithm is

J(Xl) = J(Xl) + γ_Xl · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)]

Q-factors
Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{μk}(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J_{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈Ω_U(i)} Q_{μk}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated using the samples.
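A tabular TD(0) evaluation can be sketched in a few lines of Python. The trajectory format below, a list of samples (Xk, Xk+1, Uk, Ck) per trajectory, and the step size γ = 1/(number of updates of the state) follow the description above and are otherwise an assumption of the example.

def td0_policy_evaluation(trajectories):
    # Estimate the cost-to-go of the policy that generated the trajectories.
    J = {}                                                     # tabular cost-to-go estimates
    visits = {}                                                # number of updates per state
    for trajectory in trajectories:
        for (x, x_next, u, cost) in trajectory:
            J.setdefault(x, 0.0)
            J.setdefault(x_next, 0.0)
            visits[x] = visits.get(x, 0) + 1
            gamma = 1.0 / visits[x]                            # decreasing step size
            # temporal difference update: J(x) <- J(x) + gamma * (c + J(x') - J(x))
            J[x] += gamma * (cost + J[x_next] - J[x])
    return J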

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i) · [C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck) do

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
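The Q-learning update combined with an ε-greedy exploration rule can be sketched as follows (Python). The environment interface env.reset() and env.step(u), returning (next state, cost, terminal flag), and the mapping controls[x] of admissible controls are assumptions made for the illustration.

import random

def q_learning(env, controls, n_episodes=1000, epsilon=0.1):
    Q = {}                                                     # tabular Q-factors
    visits = {}
    q = lambda x, u: Q.get((x, u), 0.0)
    for _ in range(n_episodes):
        x = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:                      # exploration phase
                u = random.choice(controls[x])
            else:                                              # exploitation: greedy control
                u = min(controls[x], key=lambda a: q(x, a))
            x_next, cost, done = env.step(u)
            visits[(x, u)] = visits.get((x, u), 0) + 1
            gamma = 1.0 / visits[(x, u)]                       # step size as for TD
            target = cost + (0.0 if done else min(q(x_next, a) for a in controls[x_next]))
            Q[(x, u)] = (1 - gamma) * q(x, u) + gamma * target
            x = x_next
    return Q

The parameter epsilon fixes the fraction of exploratory decisions and is one simple way of implementing the trade-off discussed above.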

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the previous section for each sample of experience;

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jμ(i). It will be replaced by a suitable approximation J̃(i, r), where r is a vector that has to be optimized based on the available samples of Jμ. In the tabular representation investigated previously, Jμ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jμ(i) − J̃(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training set is obtained either by simulation or from real-time samples, and is thus already an approximation of the real function.
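As a small illustration of the idea, the sketch below (Python) fits a cost-to-go approximation J̃(i, r) that is linear in a feature vector φ(i), using least squares on samples of observed cost-to-go values; the quadratic age features are a made-up choice and not taken from the text.

import numpy as np

def fit_linear_cost_to_go(samples, features):
    # samples: list of (state, observed cost-to-go); features(state): feature vector phi(state).
    Phi = np.array([features(i) for i, _ in samples])          # design matrix
    v = np.array([v_i for _, v_i in samples])                  # observed cost-to-go values
    r, *_ = np.linalg.lstsq(Phi, v, rcond=None)                # only the vector r is stored
    return r

def features(state):
    # Hypothetical features: a constant, the component age and its square.
    age = float(state)
    return np.array([1.0, age, age ** 2])

# usage sketch: r = fit_linear_cost_to_go(samples, features); J_tilde = features(i) @ r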


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37], an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches: average cost-to-go, discounted, shortest path
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods: classical methods for MDP - Value Iteration (VI): can converge fast for a high discount factor; Policy Iteration (PI): faster in general; Linear Programming: possible additional constraints, but state space more limited than for VI and PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval; complex
  Possible application in maintenance optimization: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE  Number of electricity scenarios
NW  Number of working states for the component
NPM  Number of preventive maintenance states for one component
NCM  Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI  Cost per stage for interruption
CPM  Cost per stage of preventive maintenance
CCM  Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1  Component state space
Ωx2  Electricity state space
ΩU(i)  Decision space for state i

States notations

W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, an interruption cost CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ωx1, x2_k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

Figure 9.1: Example of the Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. (States W0-W4, PM1, CM1, CM2; from a working state Wq the component moves to the next working state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q).)

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh per stage for Scenarios 1-3, shown around stages k−1, k and k+1).

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW},  ∅ else

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PM_{NPM−1}                   ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CM_{NCM−1}                   ∅   W0       1
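A sketch of how the stationary component transition matrices of Table 9.1 could be assembled is given below (Python), with states ordered W0, ..., WNW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}. The failure rate function lam(t), the stage length Ts, and NPM, NCM ≥ 2 are assumptions of the example; the failure probability during a stage is taken as Ts·λ(q·Ts), as in Figure 9.1.

import numpy as np

def component_transition_matrices(lam, Ts, NW, NPM, NCM):
    n = (NW + 1) + (NPM - 1) + (NCM - 1)
    W = lambda q: q                                            # index of state Wq
    PM = lambda q: NW + q                                      # index of state PMq (q >= 1)
    CM = lambda q: NW + (NPM - 1) + q                          # index of state CMq (q >= 1)
    P0 = np.zeros((n, n))                                      # u = 0: no preventive maintenance
    P1 = np.zeros((n, n))                                      # u = 1: preventive replacement
    for q in range(NW + 1):
        p_fail = min(1.0, Ts * lam(q * Ts))                    # failure probability during the stage
        j_ok = W(q + 1) if q < NW else W(NW)                   # ageing stops at WNW
        P0[W(q), j_ok] = 1.0 - p_fail
        P0[W(q), CM(1)] = p_fail
        P1[W(q), PM(1)] = 1.0                                  # replacement starts
    for q in range(1, NPM - 1):                                # PM1 ... PM_{NPM-2}
        P0[PM(q), PM(q + 1)] = P1[PM(q), PM(q + 1)] = 1.0
    P0[PM(NPM - 1), W(0)] = P1[PM(NPM - 1), W(0)] = 1.0        # maintenance finished, component as new
    for q in range(1, NCM - 1):                                # CM1 ... CM_{NCM-2}
        P0[CM(q), CM(q + 1)] = P1[CM(q), CM(q + 1)] = 1.0
    P0[CM(NCM - 1), W(0)] = P1[CM(NCM - 1), W(0)] = 1.0        # repair finished, component as new
    return P0, P1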

Table 9.2: Example of transition matrices for electricity scenarios

P1_E = [ 1 0 0 ; 0 1 0 ; 0 0 1 ]

P2_E = [ 1/3 1/3 1/3 ; 1/3 1/3 1/3 ; 1/3 1/3 1/3 ]

P3_E = [ 0.6 0.2 0.2 ; 0.2 0.6 0.2 ; 0.2 0.2 0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2):   P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G · Ts · CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PM_{NPM−1}                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CM_{NCM−1}                   ∅   W0       CI + CCM
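Putting the pieces together, the one-component model can be solved by backward induction over the joint state (component state, electricity scenario). The Python sketch below assumes the component matrices P_comp[u] from the sketch after Table 9.1, the stage-dependent electricity matrices PE[k], a helper stage_cost(k, i1, i2, u, j1) implementing Table 9.4, a terminal cost vector CN and a function decisions(i1) returning the admissible decisions (e.g. (0, 1) for working states and (0,) otherwise); these helpers are assumptions of the illustration, not definitions from the text.

import numpy as np

def backward_induction(P_comp, PE, stage_cost, CN, N, decisions):
    n1 = P_comp[0].shape[0]                                    # number of component states
    n2 = PE[0].shape[0]                                        # number of electricity scenarios
    J = np.tile(np.asarray(CN, dtype=float)[:, None], (1, n2))  # J_N(i1, i2) = CN(i1)
    policy = np.zeros((N, n1, n2), dtype=int)
    for k in range(N - 1, -1, -1):                             # backward recursion
        EJ = J @ PE[k].T                                       # EJ[j1, i2] = expectation of J over j2 given i2
        J_new = np.empty_like(J)
        for i1 in range(n1):
            for i2 in range(n2):
                best_cost, best_u = np.inf, 0
                for u in decisions(i1):
                    cost = sum(P_comp[u][i1, j1] * (stage_cost(k, i1, i2, u, j1) + EJ[j1, i2])
                               for j1 in range(n1))
                    if cost < best_cost:
                        best_cost, best_u = cost, u
                J_new[i1, i2] = best_cost
                policy[k, i1, i2] = best_u
        J = J_new
    return J, policy

If the electricity reward is entered as a negative cost in stage_cost, the recursion minimizes the expected net cost over the horizon and returns the optimal decision for every stage and state.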

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC  Number of components
NWc  Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs

CPMc  Cost per stage of preventive maintenance for component c
CCMc  Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}  State of component c for the next stage
jNC+1  State of the electricity for the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc  A component state
xNC+1_k  Electricity state at stage k
uc_k  Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc  State space for component c
ΩxNC+1  Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)   (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},  ∅ else

9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the components is in maintenance, or a preventive maintenance decision is taken, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = Π_{c=1}^{NC} Pc

with

Pc = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1             if ic ∉ {W0, ..., W_{NWc−1}} and ic = jc
Pc = 0             else

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CM_{NCMc}} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PM_{NPMc}} or jc = PM1
Cc = 0     else

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for high discount factors the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 33: Models

u Decision variable U lowastk (i) Optimal decision action at stage k for state i

The recursion finishes when the first stage is reached.

5.4 The Curse of Dimensionality

Consider a finite horizon stochastic dynamic problem with

• N stages,

• N_X state variables, where the size of the set for each state variable is S,

• N_U control variables, where the size of the set for each control variable is A.

The time complexity of the algorithm is $O(N \cdot S^{2 N_X} \cdot A^{N_U})$. The complexity of the problem therefore increases exponentially with the size of the problem (number of state or decision variables). This characteristic of SDP is called the curse of dimensionality.
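As a purely hypothetical illustration (the numbers are chosen only for the sake of the example), a problem with N = 52 stages, N_X = 5 state variables each of size S = 10, and N_U = 5 binary control variables would already require on the order of $52 \cdot 10^{10} \cdot 2^{5} \approx 1.7 \cdot 10^{13}$ elementary operations.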

5.5 Ideas for a Maintenance Optimization Model

In this section possible state variables for maintenance models based on SDP are discussed.

5.5.1 Age and Deterioration States

The failure probability of components is often modelled as a function of time. A possible state variable for the component is therefore its age. To be precise, the age of the component should be discretized according to the stage duration. If the lifetime of a component is very long, this can lead to a very large state space. The time horizon can be considered to reduce the number of states: if a state variable cannot reach certain states during the planned horizon, these states can be neglected. If a component, subcomponent or part of a system can be inspected or monitored, different levels of deterioration can be used as a state variable. In practice, age and deterioration state variables could be used in a complementary way.

Of course, maintenance states should be considered in both cases. It could also be possible to have different types of failure states, such as major failures and minor failures. Minor failures could be cleared by repair, while for a major failure the component should be replaced.

5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model to its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios, and draw conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific state variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model through an electricity price scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. If there is no consumption, some generation units are stopped, and this time can be used for maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons wind forecasting could be useful for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the current state of the system (and possibly on the time, if the system dynamics are not stationary).

This condition of loss of memory is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added in the DP model to keep in memory the preceding states that have been visited. The computational price is once again very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.

Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time: the dynamics of the system as well as the cost function and the disturbances are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of the convergence of the algorithms, [36] or the introduction chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can however be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP, but for the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form $\pi = \{\mu, \mu, \mu, \ldots\}$, where $\mu$ is a function mapping the state space to the control space: for $i \in \Omega_X$, $\mu(i)$ is an admissible control for the state $i$, $\mu(i) \in \Omega_U(i)$.

The objective is to find the optimal $\mu^*$, i.e. the policy that minimizes the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

$$J^*(X_0) = \min_{\mu} E\left[\lim_{N\to\infty} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$

$\mu$: Decision policy
$J^*(i)$: Optimal cost-to-go function for state $i$

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a factor $\alpha$, the discount factor ($0 < \alpha < 1$). The cost incurred at stage $k$ has the form $\alpha^k \cdot C_{ij}(u)$.

As $C_{ij}(u)$ is bounded, the infinite sum will converge (decreasing geometric progression).

$$J^*(X_0) = \min_{\mu} E\left[\lim_{N\to\infty} \sum_{k=0}^{N-1} \alpha^k \cdot C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$

$\alpha$: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

$$J^* = \min_{\mu} E\left[\lim_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} C(X_{k+1}, \mu(X_k), X_k)\right]$$

Subject to

$$X_{k+1} = f(X_k, \mu(X_k), \omega(X_k, \mu(X_k))), \quad k = 0, 1, \ldots, N-1$$

6.2 Optimality Equations

The optimality equations are formulated using the transition probabilities $P_{ij}(u)$.

The stationary policy $\mu^*$ that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation - Bellman is the mathematician at the origin of the DP theory):

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + J^*(j)], \quad \forall i \in \Omega_X$$

$J_\mu(i)$: Cost-to-go function of policy $\mu$ starting from state $i$
$J^*(i)$: Optimal cost-to-go function for state $i$

For an IHSDP discounted problem, the optimality equation is

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P_{ij}(u) \cdot [C_{ij}(u) + \alpha \cdot J^*(j)], \quad \forall i \in \Omega_X$$

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively, the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and $1/(1-\alpha)$.

For non-discounted models, the theoretical number of iterations needed is infinite, and a stopping criterion must be determined for the algorithm.
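As an illustration only, a minimal sketch of value iteration for a discounted MDP is given below in Python. The data layout (P[u] and C[u] being matrices indexed by current state i and next state j) and the stopping tolerance are assumptions made for this sketch, not part of the thesis model.

    import numpy as np

    def value_iteration(P, C, alpha, n_states, actions, tol=1e-6):
        """P[u][i, j]: transition probability P(j, u, i); C[u][i, j]: transition cost C(j, u, i)."""
        J = np.zeros(n_states)
        while True:
            # Bellman update: J_new(i) = min_u sum_j P(j, u, i) * (C(j, u, i) + alpha * J(j))
            J_new = np.array([min(P[u][i] @ (C[u][i] + alpha * J) for u in actions)
                              for i in range(n_states)])
            if np.max(np.abs(J_new - J)) < tol:   # stopping criterion on the sup-norm
                return J_new
            J = J_new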

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

6.4 The Policy Iteration Algorithm

Given a policy $\mu$, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is applied iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy $\mu^0$. It can then be described by the following steps.

Step 1: Policy Evaluation

If $\mu^{q+1} = \mu^q$, stop the algorithm. Else, $J_{\mu^q}(i)$ is calculated as the solution of the following linear system:

$$J_{\mu^q}(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot [C(j, \mu^q(i), i) + J_{\mu^q}(j)]$$

$q$: Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy $\mu^q$.

Step 2: Policy Improvement

A new policy is obtained using a value iteration step:

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu^q}(j)]$$

Go back to the policy evaluation step.

The process stops when $\mu^{q+1} = \mu^q$.

At each iteration the algorithm improves the policy. If the initial policy $\mu^0$ is already good, then the algorithm will converge quickly to the optimal solution.
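A minimal sketch of the two steps in Python, written here for the discounted case and under the same assumed data layout as the value iteration sketch in Section 6.3; it only illustrates the structure of the algorithm.

    import numpy as np

    def policy_iteration(P, C, alpha, n_states, actions):
        mu = [actions[0]] * n_states                     # arbitrary initial policy mu^0
        while True:
            # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu as a linear system
            P_mu = np.array([P[mu[i]][i] for i in range(n_states)])
            c_mu = np.array([P[mu[i]][i] @ C[mu[i]][i] for i in range(n_states)])
            J = np.linalg.solve(np.eye(n_states) - alpha * P_mu, c_mu)
            # Step 2: policy improvement
            mu_new = [min(actions, key=lambda u: P[u][i] @ (C[u][i] + alpha * J))
                      for i in range(n_states)]
            if mu_new == mu:                             # policy is a solution of its own improvement
                return mu, J
            mu = mu_new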

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each policy evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the cost-to-go function of the policy. The algorithm is initialized with a value function $J^M_{\mu^k}(i)$ that must be chosen higher than the real value $J_{\mu^k}(i)$.

While $m \geq 0$ do

$$J^m_{\mu^k}(i) = \sum_{j \in \Omega_X} P(j, \mu^k(i), i) \cdot [C(j, \mu^k(i), i) + J^{m+1}_{\mu^k}(j)], \quad \forall i \in \Omega_X$$

$$m \leftarrow m - 1$$

$m$: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when $m = 0$, and $J_{\mu^k}$ is approximated by $J^0_{\mu^k}$.

6.6 Average Cost-to-go Problems

The methods presented in the preceding sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as an equivalent shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy $\mu$ and a state $\bar{X} \in \Omega_X$, there is a unique $\lambda^\mu$ and vector $h^\mu$ such that

$$h^\mu(\bar{X}) = 0$$

$$\lambda^\mu + h^\mu(i) = \sum_{j \in \Omega_X} P(j, \mu(i), i) \cdot [C(j, \mu(i), i) + h^\mu(j)], \quad \forall i \in \Omega_X$$

This $\lambda^\mu$ is the average cost-to-go of the stationary policy $\mu$. The average cost-to-go is the same for all starting states.

The optimal average cost and the optimal policy satisfy the Bellman equation

$$\lambda^* + h^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X$$

$$\mu^*(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^*(j)], \quad \forall i \in \Omega_X$$

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems; the resulting method is called relative value iteration. $\bar{X}$ is an arbitrary state and $h^0(i)$ is chosen arbitrarily.

$$H^k = \min_{u \in \Omega_U(\bar{X})} \sum_{j \in \Omega_X} P(j, u, \bar{X}) \cdot [C(j, u, \bar{X}) + h^k(j)]$$

$$h^{k+1}(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)] - H^k, \quad \forall i \in \Omega_X$$

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^k(j)], \quad \forall i \in \Omega_X$$

The sequence $h^k$ will converge if the Markov decision process is unichain, and the algorithm converges to the optimal policy. The number of iterations needed is, in theory, infinite.

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: $\bar{X}$ can be chosen arbitrarily.

Step 1: Evaluation of the policy
If $\lambda^{q+1} = \lambda^q$ and $h^{q+1}(i) = h^q(i)$ $\forall i \in \Omega_X$, stop the algorithm.

Else, solve the system of equations

$$h^q(\bar{X}) = 0$$

$$\lambda^q + h^q(i) = \sum_{j \in \Omega_X} P(j, \mu^q(i), i) \cdot [C(j, \mu^q(i), i) + h^q(j)], \quad \forall i \in \Omega_X$$

Step 2: Policy improvement

$$\mu^{q+1}(i) = \arg\min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + h^q(j)], \quad \forall i \in \Omega_X$$

$$q = q + 1$$

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP the optimal cost-to-go function satisfies

$$J^*(i) = \min_{u \in \Omega_U(i)} \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \alpha \cdot J^*(j)], \quad \forall i \in \Omega_X$$

and $J^*(i)$ is the solution of the following linear programming model:

$$\text{Maximize} \sum_{i \in \Omega_X} J(i)$$

$$\text{Subject to} \quad J(i) - \alpha \cdot \sum_{j \in \Omega_X} P(j, u, i) \cdot J(j) \leq \sum_{j \in \Omega_X} P(j, u, i) \cdot C(j, u, i), \quad \forall u, i$$

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].
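As an illustration, a small sketch of this LP formulation using scipy is given below (the data layout P[u], C[u] with numpy arrays is the same assumption as in the earlier sketches; since scipy's linprog minimizes, the objective is negated).

    import numpy as np
    from scipy.optimize import linprog

    def solve_discounted_mdp_lp(P, C, alpha, n_states, actions):
        c = -np.ones(n_states)                   # maximize sum_i J(i) <=> minimize -sum_i J(i)
        A_ub, b_ub = [], []
        for i in range(n_states):
            for u in actions:
                # J(i) - alpha * sum_j P(j, u, i) J(j) <= sum_j P(j, u, i) C(j, u, i)
                row = -alpha * np.asarray(P[u][i], dtype=float)
                row[i] += 1.0
                A_ub.append(row)
                b_ub.append(P[u][i] @ C[u][i])
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(None, None)] * n_states)
        return res.x                             # approximate optimal cost-to-go J*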

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is $m^n$ [41]. Linear programming methods, on the other hand, become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy $\mu^0$ is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now, the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the current state of the system, or the decision epoch occurs each time the state of the system changes. This kind of problem is referred to as a Semi-Markov Decision Process (SMDP).

SMDPs generalize MDPs by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDPs are more complicated than MDPs and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDPs could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning: direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the methods can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk+1, Uk, Xk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy $\mu$ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy $\mu$ and the sequence of transition costs $C(X_k, X_{k+1}) = C(X_{k+1}, \mu(X_k), X_k)$ has been observed.

The cost-to-go resulting from the trajectory starting from the state $X_k$ is

$$V(X_k) = \sum_{n=k}^{N-1} C(X_n, X_{n+1})$$

$V(X_k)$: Cost-to-go of a trajectory starting from state $X_k$

If a certain number of trajectories have been generated and the state $i$ has been visited $K$ times in these trajectories, $J(i)$ can be estimated by

$$J(i) = \frac{1}{K} \sum_{m=1}^{K} V(i_m)$$

$V(i_m)$: Cost-to-go of the trajectory starting from state $i$ after the $m$th visit

A recursive form of the method can be formulated:

$$J(i) = J(i) + \gamma \cdot [V(i_m) - J(i)], \quad \text{with } \gamma = 1/m, \text{ where } m \text{ is the number of the trajectory}$$

From a trajectory point of view:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot [V(X_k) - J(X_k)]$$

$\gamma_{X_k}$ corresponds to $1/m$, where $m$ is the number of times $X_k$ has already been visited by trajectories.

With the preceding algorithm, $V(X_k)$ must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation $V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1})$.

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the $l$th transition is being generated. Then $J(X_k)$ is updated for all the states that have been visited previously during the trajectory:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l$$

TD($\lambda$)
A generalization of the preceding algorithm is TD($\lambda$), where a constant $\lambda < 1$ is introduced:

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot \lambda^{l-k} \cdot [C(X_l, X_{l+1}) + J(X_{l+1}) - J(X_l)], \quad \forall k = 0, \ldots, l$$

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is $\lambda = 0$. The TD(0) algorithm is

$$J(X_k) = J(X_k) + \gamma_{X_k} \cdot [C(X_k, X_{k+1}) + J(X_{k+1}) - J(X_k)]$$
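A minimal sketch of the TD(0) update in Python; the dictionary-based storage and the step size rule $\gamma = 1/m$ are assumptions made for the illustration.

    from collections import defaultdict

    def td0_evaluate(trajectories):
        """trajectories: list of lists of (X_k, X_k+1, cost) transitions generated under a fixed policy."""
        J = defaultdict(float)       # estimated cost-to-go, initialized to 0
        visits = defaultdict(int)    # number of visits, used for the step size gamma = 1/m
        for trajectory in trajectories:
            for (x, x_next, cost) in trajectory:
                visits[x] += 1
                gamma = 1.0 / visits[x]
                # TD(0): J(x) <- J(x) + gamma * [C(x, x_next) + J(x_next) - J(x)]
                J[x] += gamma * (cost + J[x_next] - J[x])
        return J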

Q-factors
Once $J_{\mu^k}(i)$ has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

$$Q_{\mu^k}(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J_{\mu^k}(j)]$$

Note that $C(j, u, i)$ must be known. The improved policy is

$$\mu^{k+1}(i) = \arg\min_{u \in \Omega_U(i)} Q_{\mu^k}(i, u)$$

This is in fact an approximate version of the policy iteration algorithm, since $J_{\mu^k}$ and $Q_{\mu^k}$ have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + J^*(j)] \quad (7.1)$$

The optimality equation can be rewritten in terms of Q-factors:

$$J^*(i) = \min_{u \in \Omega_U(i)} Q^*(i, u) \quad (7.2)$$

By combining the two equations we obtain

$$Q^*(i, u) = \sum_{j \in \Omega_X} P(j, u, i) \cdot [C(j, u, i) + \min_{v \in \Omega_U(j)} Q^*(j, v)] \quad (7.3)$$

$Q^*(i, u)$ is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

$Q(i, u)$ can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

$$U_k = \arg\min_{u \in \Omega_U(X_k)} Q(X_k, u)$$

$$Q(X_k, U_k) = (1 - \gamma) \cdot Q(X_k, U_k) + \gamma \cdot [C(X_{k+1}, U_k, X_k) + \min_{u \in \Omega_U(X_{k+1})} Q(X_{k+1}, u)]$$

with $\gamma$ defined as for TD.

The trade-off between exploration and exploitation. The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
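A minimal sketch of Q-learning with an ε-greedy rule for the exploration/exploitation trade-off; the simulator interface (env.reset, env.step), the constant step size and the episode structure are assumptions made for this sketch and are not part of the thesis.

    import random
    from collections import defaultdict

    def q_learning(env, actions, n_episodes, gamma=0.1, epsilon=0.1):
        """env.reset() returns an initial state; env.step(x, u) returns (next_state, cost, done)."""
        Q = defaultdict(float)   # Q[(state, action)], initialized arbitrarily (here to 0)
        for _ in range(n_episodes):
            x, done = env.reset(), False
            while not done:
                # epsilon-greedy trade-off between exploration and exploitation
                if random.random() < epsilon:
                    u = random.choice(actions)
                else:
                    u = min(actions, key=lambda a: Q[(x, a)])
                x_next, cost, done = env.step(x, u)
                best_next = 0.0 if done else min(Q[(x_next, a)] for a in actions)
                # Q-learning update based on equation (7.3), costs are minimized
                Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + best_next)
                x = x_next
        return Q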

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function $J_\mu(i)$. It will be replaced by a suitable approximation $\tilde{J}(i, r)$, where $r$ is a vector that has to be optimized based on the available samples of $J_\mu$. In the tabular representation investigated previously, $J_\mu(i)$ was stored for every value of $i$; with an approximation structure, only the vector $r$ is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, $J_\mu(i) - \tilde{J}(i, r)$.

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist: the training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
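As an illustration, a minimal sketch of a linear approximation architecture $\tilde{J}(i, r) = \phi(i)^T r$ fitted by least squares on simulated cost-to-go samples; the feature map and the data format are assumptions made for the example.

    import numpy as np

    def fit_linear_cost_to_go(features, samples):
        """features(i) returns a feature vector phi(i); samples is a list of (state, observed cost-to-go V)."""
        Phi = np.array([features(i) for i, _ in samples])    # training inputs
        V = np.array([v for _, v in samples])                # training targets from simulation
        # Least-squares fit of J~(i, r) = phi(i)^T r
        r, *_ = np.linalg.lstsq(Phi, V, rcond=None)
        return lambda i: features(i) @ r                     # approximated cost-to-go function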


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance actions are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDPs have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDPs. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance, major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDPs have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDPs could be very useful for scheduled CBM and SMDPs for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDPs. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; possible approaches are average cost-to-go, discounted and shortest path formulations
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go), short-term maintenance optimization (discounted)
  Methods: classical MDP methods - Value Iteration (VI), Policy Iteration (PI), Linear Programming
  Advantages/Disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP allows possible additional constraints but the state space is more limited than with VI and PI

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval (average cost-to-go approach)
  Possible application in maintenance optimization: inspection-based maintenance
  Methods: same as MDP
  Advantages/Disadvantages: more complex

Approximate Dynamic Programming
  Characteristics: can handle large state spaces compared with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/Disadvantages: can work without an explicit model

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in low electricity prices for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

x1k Component state at stage k
x2k Electricity state at stage k

Probability functions

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \quad x^1_k \in \Omega_{x^1}, \; x^2_k \in \Omega_{x^2} \quad (9.1)$$

$\Omega_{x^1}$ is the set of possible states for the component and $\Omega_{x^2}$ the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by the state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. The latter approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, or the closest integer.

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. (State transition diagram between the states W0-W4, PM1, CM1 and CM2 omitted.)

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example $x^1_k \in \Omega_{x^1} = \{W_0, \ldots, W_4, PM_1, CM_1, CM_2\}$. The state $W_0$ is used to represent a new component; $PM_2$ and $CM_3$ are both represented by this state.

More generally,

$$\Omega_{x^1} = \{W_0, \ldots, W_{N_W}, PM_1, \ldots, PM_{N_{PM}-1}, CM_1, \ldots, CM_{N_{CM}-1}\}$$

Electricity scenario state

The electricity scenarios are associated with one state variable, x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: $x^2_k \in \Omega_{x^2} = \{S_1, \ldots, S_{N_E}\}$. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. (Plot of the electricity prices in SEK/MWh over the stages for Scenarios 1, 2 and 3 omitted.)

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{else} \end{cases}$$

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$P(X_{k+1} = j \mid U_k = u, X_k = i)$$
$$= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2)$$
$$= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2)$$
$$= P(j^1, u, i^1) \cdot P_k(j^2, i^2)$$

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E. i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u      j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0      Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0      CM1      λ(Wq)
WNW                         0      WNW      1 − λ(WNW)
WNW                         0      CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1      PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅      PMq+1    1
PMNPM−1                     ∅      W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅      CMq+1    1
CMNCM−1                     ∅      W0       1
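To make the construction concrete, the following sketch assembles the component transition matrices of Table 9.1 for u = 0 and u = 1 in Python. The state ordering, the function failure_rate(q) standing for λ(Wq), and the convention that maintenance states evolve identically under both decisions are assumptions made only for this illustration.

    import numpy as np

    def component_transition_matrices(n_w, n_pm, n_cm, failure_rate):
        """States ordered as [W0..W_nw, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1}]; failure_rate(q) = lambda(W_q)."""
        states = ([f"W{q}" for q in range(n_w + 1)]
                  + [f"PM{q}" for q in range(1, n_pm)]
                  + [f"CM{q}" for q in range(1, n_cm)])
        idx = {s: n for n, s in enumerate(states)}
        P0 = np.zeros((len(states), len(states)))   # u = 0: no preventive maintenance
        P1 = np.zeros((len(states), len(states)))   # u = 1: preventive maintenance
        for q in range(n_w + 1):
            nxt = f"W{q + 1}" if q < n_w else f"W{n_w}"       # ageing (or staying in W_NW)
            fail_to = "CM1" if n_cm > 1 else "W0"
            pm_to = "PM1" if n_pm > 1 else "W0"
            P0[idx[f"W{q}"], idx[nxt]] = 1 - failure_rate(q)
            P0[idx[f"W{q}"], idx[fail_to]] += failure_rate(q)
            P1[idx[f"W{q}"], idx[pm_to]] = 1.0
        for q in range(1, n_pm):                              # deterministic progress through PM states
            nxt = f"PM{q + 1}" if q < n_pm - 1 else "W0"
            P0[idx[f"PM{q}"], idx[nxt]] = P1[idx[f"PM{q}"], idx[nxt]] = 1.0
        for q in range(1, n_cm):                              # deterministic progress through CM states
            nxt = f"CM{q + 1}" if q < n_cm - 1 else "W0"
            P0[idx[f"CM{q}"], idx[nxt]] = P1[idx[f"CM{q}"], idx[nxt]] = 1.0
        return states, P0, P1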

Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
    1    0    0
    0    1    0
    0    0    1

P2E =
    1/3  1/3  1/3
    1/3  1/3  1/3
    1/3  1/3  1/3

P3E =
    0.6  0.2  0.2
    0.2  0.6  0.2
    0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E

9.1.4.4 Cost Function

The costs associated with the possible transitions are of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                          u      j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0      Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0      CM1      CI + CCM
WNW                         0      WNW      G · Ts · CE(i2, k)
WNW                         0      CM1      CI + CCM
Wq                          1      PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅      PMq+1    CI + CPM
PMNPM−1                     ∅      W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅      CMq+1    CI + CCM
CMNCM−1                     ∅      W0       CI + CCM

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but would need maintenance soon.

This can be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State for the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
jNC+1 State for the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \quad (9.2)$$

$x^c_k$, $c \in \{1, \ldots, N_C\}$, represents the state of component c, and $x^{N_C+1}_k$ represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted $\Omega_{x^c}$:

$$x^c_k \in \Omega_{x^c} = \{W_0, \ldots, W_{N_{W_c}}, PM_1, \ldots, PM_{N_{PM_c}-1}, CM_1, \ldots, CM_{N_{CM_c}-1}\}$$

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

$u^c_k = 0$: no preventive maintenance on component c
$u^c_k = 1$: preventive maintenance on component c

The decision variables constitute a decision vector:

$$U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \quad (9.3)$$

The decision space for each decision variable can be defined by

$$\forall c \in \{1, \ldots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \ldots, W_{N_{W_c}}\} \\ \emptyset & \text{else} \end{cases}$$

9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \quad (9.4)$$
$$= P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) \cdot P(j^{N_C+1}, i^{N_C+1}) \quad (9.5)$$

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If $\forall c \in \{1, \ldots, N_C\}$: $u^c_k = 0$ and $i^c \in \{W_1, \ldots, W_{N_{W_c}}\}$, then

$$P((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

$$P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P^c$$

$$\text{with } P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \ldots, W_{N_{W_c}}\} \\ 1 & \text{if } u^c = 0, \; i^c \in \{W_1, \ldots, W_{N_{W_c}}\} \text{ and } j^c = i^c \\ 0 & \text{else} \end{cases}$$
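A small sketch of how the joint component transition probability of Cases 1 and 2 could be evaluated, reusing per-component matrices such as those built in the sketch after Table 9.1; the data structures are assumptions made only for the illustration.

    def joint_component_transition(i, j, u, P0, P1, idx, working_states):
        """i, j, u: lists of per-component states and decisions; P0[c], P1[c]: matrices for component c;
        idx[c]: state-name-to-index map; working_states[c]: set of ageing states W1..W_NWc."""
        n_c = len(i)
        system_up = all(u[c] == 0 and i[c] in working_states[c] for c in range(n_c))
        prob = 1.0
        for c in range(n_c):
            if system_up:                                      # Case 1: every component ages independently
                prob *= P0[c][idx[c][i[c]], idx[c][j[c]]]
            elif u[c] == 1 or i[c] not in working_states[c]:   # Case 2: maintained or failed component evolves
                prob *= P1[c][idx[c][i[c]], idx[c][j[c]]]
            else:                                              # Case 2: idle working component does not age
                prob *= 1.0 if j[c] == i[c] else 0.0
        return prob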

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If $\forall c \in \{1, \ldots, N_C\}$: $u^c_k = 0$, $i^c \in \{W_1, \ldots, W_{N_{W_c}}\}$ and no failure occurs, then

$$C((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

$$C((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = C_I + \sum_{c=1}^{N_C} C^c$$

$$\text{with } C^c = \begin{cases} C_{CM_c} & \text{if } i^c \in \{CM_1, \ldots, CM_{N_{CM_c}}\} \text{ or } j^c = CM_1 \\ C_{PM_c} & \text{if } i^c \in \{PM_1, \ldots, PM_{N_{PM_c}}\} \text{ or } j^c = PM_1 \\ 0 & \text{else} \end{cases}$$

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is not deterministic. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDPs is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDPs and SMDPs. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The ADP methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDPs or SMDPs may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
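The backward recursion above can be reproduced with a few lines of code. The sketch below is only a check of the hand calculation: the cost table is copied from the example, and the variable names are chosen freely.

# Backward value iteration for the shortest path example.
# C[k][i][u] is the cost of choosing successor u in state i at stage k
# (costs copied from the example above); the terminal cost is phi(0) = 0.
C = {
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                          # H, I, J
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},  # E, F, G
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},  # B, C, D
    0: {0: {0: 2, 1: 4, 2: 3}},                                    # A
}

J = {4: {0: 0.0}}            # terminal stage
policy = {}
for k in range(3, -1, -1):   # stages 3, 2, 1, 0
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        costs = {u: c + J[k + 1][u] for u, c in arcs.items()}
        policy[k][i] = min(costs, key=costs.get)
        J[k][i] = costs[policy[k][i]]

print(J[0][0])       # 8.0, the optimal cost from state A
print(policy[0][0])  # 2, i.e. go to state D first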


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


5.5.2 Forecasts

Measurements or forecasts can sometimes estimate the disturbances a system is or can be subject to. The reliability of the forecasts should be carefully considered. Deterministic information could be used to adapt the finite horizon model over its horizon of validity. It would also be possible to generate different scenarios from forecasts, solve the problem for the different scenarios and draw some conclusions from the different solutions. Another way of using forecasting models is to include them in the maintenance problem formulation by adding a specific variable. This will reduce the uncertainties but in return increase the complexity. The model proposed in Chapter 9 gives an example of how to integrate a forecasting model in an electricity scenario.

Another factor that could be interesting to forecast is the load. Indeed, the production must always be in balance with the consumption. Also, if there is no consumption, some generating units are stopped. This time can be used for the maintenance of the power plant.

Weather forecasting could also be interesting in some cases. For example, the power generated by wind farms depends on the wind strength, and maintenance actions on offshore wind farms are possible only in case of good weather. For these two reasons, wind forecasting could be interesting for optimizing maintenance actions of offshore wind farms.

5.5.3 Time Lags

An important assumption of a DP model is that the dynamics of the system only depend on the actual state of the system (and possibly on the time, if the system dynamics are not stationary).

This loss-of-memory condition is very strong and unrealistic in some cases. It is sometimes possible (if the system dynamics depend on a few preceding states) to overcome this assumption: variables are added to the DP model to keep the preceding states in memory. The computational price is, once again, very high.

For example, in the context of maintenance, it would be interesting to know the deterioration level of an asset at the preceding stage. It would give information about the dynamics of the deterioration process.

Chapter 6

Infinite Horizon Models - Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary over time. The dynamics of the system, as well as the cost function and the disturbances, are stationary. Infinite horizon stochastic dynamic programming (IHSDP) models can be represented by a Markov Decision Process. For more details and proofs of convergence of the algorithms, [36] or the introductory chapter of [13] are recommended.

In practice one scarcely faces problems with an infinite number of stages. It can, however, be a reasonable approximation of problems with a very large number of stages, for which the value iteration algorithm would lead to intractable computation.

The approximation methods presented in Chapter 7 are based on the methods presented in this chapter.

6.1 Problem Formulation

The state space, decision space, probability function and cost function of IHSDP are defined in a similar way as for FHSDP in the stationary case. The aim of IHSDP is to minimize the cumulative costs of a system over an infinite number of stages. This sum is called the cost-to-go function.

An interesting feature of IHSDP models is that the solution of the problem is a stationary policy. It means that the solution of the problem has the form π = {μ, μ, μ, ...}, where μ is a function mapping the state space to the control space. For i ∈ ΩX, μ(i) is an admissible control for the state i: μ(i) ∈ ΩU(i).

The objective is to find the optimal μ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

μ: Decision policy
J*(i): Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α, with 0 < α < 1. The cost at stage k of a discounted IHSDP has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum converges (as a decreasing geometric progression).

J*(X_0) = min_μ E[ lim_{N→∞} Σ_{k=0}^{N−1} α^k · C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1

α: Discount factor

Average cost per stage problems
Infinite horizon problems can sometimes not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_μ E[ lim_{N→∞} (1/N) Σ_{k=0}^{N−1} C(X_{k+1}, μ(X_k), X_k) ]

subject to X_{k+1} = f(X_k, μ(X_k), ω(X_k, μ(X_k))), k = 0, 1, ..., N−1


6.2 Optimality Equations

The optimality equations are formulated using the transition probability function Pij(u).

The stationary policy μ*, solution of an IHSDP shortest path problem, is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + J*(j)], ∀i ∈ ΩX

Jμ(i): Cost-to-go function of policy μ starting from state i
J*(i): Optimal cost-to-go function for state i

For a discounted IHSDP problem the optimality equation is

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · J*(j)], ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

6.3 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it does indeed converge to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space and 1/(1−α).

For non-discounted models, the theoretical number of iterations needed is infinite, and a relative stopping criterion must be determined to stop the algorithm.

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.
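To make the recursion concrete, a minimal value iteration sketch for a discounted problem is given below. The two-state transition matrix, costs and discount factor are made-up illustrative numbers, not data from this thesis.

import numpy as np

# Minimal value iteration for a discounted MDP.
# P[u][i][j] = P(j | i, u), C[u][i][j] = transition cost, alpha = discount factor.
P = np.array([[[0.9, 0.1], [0.6, 0.4]],     # action 0: "do nothing"
              [[1.0, 0.0], [1.0, 0.0]]])    # action 1: "replace"
C = np.array([[[0.0, 10.0], [0.0, 10.0]],   # failure transitions cost 10
              [[5.0, 5.0], [5.0, 5.0]]])    # replacement costs 5
alpha = 0.95

J = np.zeros(2)
for _ in range(1000):
    # Q[u, i] = sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j))
    Q = (P * (C + alpha * J)).sum(axis=2)
    J_new = Q.min(axis=0)
    if np.max(np.abs(J_new - J)) < 1e-8:
        break
    J = J_new

policy = Q.argmin(axis=0)
print(J, policy)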

6.4 The Policy Iteration Algorithm

Given a policy μ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy μ0. It can then be described by the following steps:

Step 1: Policy Evaluation

If μ_{q+1} = μ_q, stop the algorithm.
Else, J_{μq}(i), the solution of the following linear system, is calculated:

J_{μq}(i) = Σ_{j∈ΩX} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + J_{μq}(j)], ∀i ∈ ΩX

q: Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy μ_q.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

μ_{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J_{μq}(j)], ∀i ∈ ΩX

Go back to the policy evaluation step.

The process stops when μ_{q+1} = μ_q.

At each iteration the algorithm always improves the policy. If the initial policy μ0 is already good, the algorithm will converge quickly to the optimal solution.
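A compact sketch of the two steps, reusing the same made-up two-state discounted example as in the value iteration sketch and solving the evaluation step as a linear system, could look as follows.

import numpy as np

# Policy iteration for a discounted MDP (same illustrative P, C, alpha
# as in the value iteration sketch; not data from the thesis).
P = np.array([[[0.9, 0.1], [0.6, 0.4]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[5.0, 5.0], [5.0, 5.0]]])
alpha, n = 0.95, 2

mu = np.zeros(n, dtype=int)          # initial policy mu_0
while True:
    # Step 1: policy evaluation, solve (I - alpha * P_mu) J = c_mu
    P_mu = P[mu, np.arange(n)]                       # transitions under mu
    c_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)  # expected stage cost
    J = np.linalg.solve(np.eye(n) - alpha * P_mu, c_mu)
    # Step 2: policy improvement
    Q = (P * (C + alpha * J)).sum(axis=2)
    mu_new = Q.argmin(axis=0)
    if np.array_equal(mu_new, mu):
        break
    mu = mu_new

print(mu, J)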

6.5 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, in the evaluation step, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_{μk}(i) that must be chosen higher than the real value J_{μk}(i).

While m ≥ 0, do:

J^m_{μk}(i) = Σ_{j∈ΩX} P(j, μ_k(i), i) · [C(j, μ_k(i), i) + J^{m+1}_{μk}(j)], ∀i ∈ ΩX

m ← m − 1

m: Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and J_{μk} is approximated by J^0_{μk}.
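In code, only the evaluation step of the previous policy iteration sketch changes: it is replaced by M backward sweeps under the fixed policy. The function below is a sketch under the same assumed data structures (P, C, alpha as before), not an implementation from the thesis.

import numpy as np

def evaluate_policy_approx(P, C, alpha, mu, J_init, M):
    """Approximate policy evaluation for modified policy iteration:
    M value-iteration sweeps under a fixed policy mu, starting from an
    initial guess J_init (chosen above the true cost-to-go)."""
    n = len(mu)
    P_mu = P[mu, np.arange(n)]                        # transitions under mu
    c_mu = (P_mu * C[mu, np.arange(n)]).sum(axis=1)   # expected stage cost
    J = J_init.copy()
    for _ in range(M):                                # m = M, M-1, ..., 1
        J = c_mu + alpha * P_mu @ J
    return J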

6.6 Average Cost-to-go Problems

The methods presented in the previous sections cannot be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy μ and a state X ∈ ΩX, there is a unique λ_μ and vector h_μ such that

h_μ(X) = 0

λ_μ + h_μ(i) = Σ_{j∈ΩX} P(j, μ(i), i) · [C(j, μ(i), i) + h_μ(j)], ∀i ∈ ΩX

This λ_μ is the average cost-to-go for the stationary policy μ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation

λ* + h*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

μ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)], ∀i ∈ ΩX

6.6.1 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is then called relative value iteration. X is an arbitrary reference state and h_0(i) is chosen arbitrarily.

H_k = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + h_k(j)]

h_{k+1}(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h_k(j)] − H_k, ∀i ∈ ΩX

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h_k(j)], ∀i ∈ ΩX

The sequence h_k will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.
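A sketch of the recursion on a made-up unichain example is given below; it shows how subtracting the value at the reference state keeps h bounded while H converges to the average cost per stage. The data are illustrative placeholders.

import numpy as np

# Relative value iteration for an average cost-per-stage MDP.
# Made-up unichain example; state 0 is the reference state X.
P = np.array([[[0.9, 0.1], [0.6, 0.4]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[5.0, 5.0], [5.0, 5.0]]])

h = np.zeros(2)
for _ in range(500):
    Q = (P * (C + h)).sum(axis=2)     # Q[u, i] = sum_j P(j|i,u) * (C + h(j))
    H = Q[:, 0].min()                 # Bellman value at the reference state
    h_new = Q.min(axis=0) - H
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

avg_cost, policy = H, Q.argmin(axis=0)
print(avg_cost, policy, h)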

6.6.2 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: the reference state X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λ_{q+1} = λ_q and h_{q+1}(i) = h_q(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations

h_q(X) = 0

λ_q + h_q(i) = Σ_{j∈ΩX} P(j, μ_q(i), i) · [C(j, μ_q(i), i) + h_q(j)], ∀i ∈ ΩX

Step 2: Policy improvement

μ_{q+1}(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h_q(j)], ∀i ∈ ΩX

q = q + 1

6.7 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP,

J*(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J*(j)], ∀i ∈ ΩX

J*(i) is the solution of the following linear programming model:

Maximize Σ_{i∈ΩX} J(i)

subject to J(i) ≤ Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · J(j)], ∀u ∈ ΩU(i), ∀i ∈ ΩX

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms during the past decade might change this [36].
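As an illustration, the LP above can be passed to a generic solver. The sketch below uses scipy.optimize.linprog on the same made-up two-state example as in the earlier sketches; additional linear constraints could simply be appended to the constraint matrix.

import numpy as np
from scipy.optimize import linprog

# LP formulation of the discounted MDP (illustrative data):
# maximize sum_i J(i) subject to
#   J(i) <= sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j))   for all i, u.
P = np.array([[[0.9, 0.1], [0.6, 0.4]],
              [[1.0, 0.0], [1.0, 0.0]]])
C = np.array([[[0.0, 10.0], [0.0, 10.0]],
              [[5.0, 5.0], [5.0, 5.0]]])
alpha, n_u, n_x = 0.95, 2, 2

A_ub, b_ub = [], []
for u in range(n_u):
    for i in range(n_x):
        row = -alpha * P[u, i]
        row[i] += 1.0                            # J(i) - alpha * sum_j P * J(j)
        A_ub.append(row)
        b_ub.append((P[u, i] * C[u, i]).sum())   # expected stage cost

res = linprog(c=-np.ones(n_x), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_x)
print(res.x)    # optimal cost-to-go J*(i)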

6.8 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the number of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than DP methods do [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy μ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

6.9 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

Chapter 7

Approximate Methods for Markov Decision Process - Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], as well as the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (X_k, X_{k+1}, U_k, C_k).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided by simulation or real-life experience. A sample has the form (X_k, X_{k+1}, U_k, C_k): X_{k+1} is the observed state after choosing the control U_k in state X_k, and C_k = C(X_{k+1}, U_k, X_k) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy μ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X_0, ..., X_N) has been generated according to the policy μ, and that the sequence of transition costs C(X_k, X_{k+1}) = C(X_k, X_{k+1}, μ(X_k)) has been observed.

The cost-to-go resulting from the trajectory starting from state X_k is

V(X_k) = Σ_{n=k}^{N−1} C(X_n, X_{n+1})

V(X_k): Cost-to-go of a trajectory starting from state X_k

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): Cost-to-go of the trajectory starting from state i after the m-th visit

A recursive form of the method can be formulated:

J(i) := J(i) + γ · [V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(X_k) := J(X_k) + γ_{X_k} · [V(X_k) − J(X_k)]

γ_{X_k} corresponds to 1/m, where m is the number of times X_k has already been visited by trajectories.

With the preceding algorithm, V(X_k) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(X_k) = C(X_k, X_{k+1}) + V(X_{k+1}).

At each transition of the trajectory, the cost-to-go function of the states already visited is updated. Assume that the l-th transition has just been generated. Then J(X_k) is updated for all the states that have been visited previously during the trajectory:

J(X_k) := J(X_k) + γ_{X_k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(X_k) := J(X_k) + γ_{X_k} · λ^{l−k} · [C(X_l, X_{l+1}) + J(X_{l+1}) − J(X_l)], ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0. The TD(0) algorithm is

J(X_k) := J(X_k) + γ_{X_k} · [C(X_k, X_{k+1}) + J(X_{k+1}) − J(X_k)]

Q-factors
Once J_{μk}(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Q_{μk}(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J_{μk}(j)]

Note that C(j, u, i) must be known. The improved policy is

μ_{k+1}(i) = argmin_{u∈ΩU(i)} Q_{μk}(i, u)

It is in fact an approximate version of the policy iteration algorithm, since J_{μk} and Q_{μk} have been estimated using the samples.
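A minimal sketch of TD(0) policy evaluation is given below. The function simulate_step, the state list and the policy are hypothetical placeholders standing for a simulator or for real-life samples; they are not defined in this thesis.

import random

def td0_evaluate(states, mu, simulate_step, terminal, n_trajectories=1000):
    """TD(0) evaluation of the fixed policy mu from simulated trajectories.
    states must include the terminal state; mu maps each non-terminal state
    to a control; simulate_step(x, u) returns (next_state, cost)."""
    J = {i: 0.0 for i in states}
    visits = {i: 0 for i in states}
    non_terminal = [i for i in states if i != terminal]
    for _ in range(n_trajectories):
        x = random.choice(non_terminal)
        while x != terminal:
            x_next, cost = simulate_step(x, mu[x])
            visits[x] += 1
            gamma = 1.0 / visits[x]                      # step size 1/m
            J[x] += gamma * (cost + J[x_next] - J[x])    # TD(0) update
            x = x_next
    return J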

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]   (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)   (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]   (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do:

U_k = argmin_{u∈ΩU(X_k)} Q(X_k, u)

Q(X_k, U_k) := (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u∈ΩU(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. The convergence of the algorithm to the optimal solution would require that all the pairs (i, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
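A sketch of Q-learning with an ε-greedy exploration/exploitation trade-off is shown below; as in the TD sketch, simulate_step and actions are hypothetical placeholders for a simulator (or real-time measurements) and for the admissible control sets.

import random

def q_learning(states, actions, simulate_step, terminal,
               n_steps=100000, epsilon=0.1):
    """Q-learning for a stochastic shortest path problem.
    actions(i) returns the admissible controls in state i;
    simulate_step(x, u) returns (next_state, cost)."""
    Q = {(i, u): 0.0 for i in states for u in actions(i)}
    visits = {key: 0 for key in Q}
    non_terminal = [i for i in states if i != terminal]
    x = random.choice(non_terminal)
    for _ in range(n_steps):
        # epsilon-greedy trade-off between exploration and exploitation
        if random.random() < epsilon:
            u = random.choice(actions(x))
        else:
            u = min(actions(x), key=lambda a: Q[(x, a)])
        x_next, cost = simulate_step(x, u)
        visits[(x, u)] += 1
        gamma = 1.0 / visits[(x, u)]
        if x_next == terminal:
            target = cost
        else:
            target = cost + min(Q[(x_next, v)] for v in actions(x_next))
        Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target
        x = random.choice(non_terminal) if x_next == terminal else x_next
    return Q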

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section for each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.

7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_μ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of J_μ. In the table representation investigated previously, J_μ(i) was stored for every value of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, J_μ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods and Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples. This is already an approximation of the real function.
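As a small illustration of the idea, the sketch below fits a linear approximation J(i, r) = r · φ(i) by least squares; the feature map and the training samples are made-up placeholders, not data from this thesis.

import numpy as np

# Least-squares fit of an approximate cost-to-go J(i, r) = r . phi(i)
# from samples (state, observed cost-to-go).
def phi(state):
    age, price_level = state                  # assumed input features
    return np.array([1.0, age, age ** 2, price_level])

samples = [((0, 1), 4.2), ((3, 1), 7.9), ((5, 2), 13.1), ((8, 0), 17.5)]

A = np.array([phi(s) for s, _ in samples])
y = np.array([v for _, v in samples])
r, *_ = np.linalg.lstsq(A, y, rcond=None)     # fit the parameter vector r

def J_approx(state):
    return float(r @ phi(state))

print(J_approx((4, 1)))   # generalizes to states not in the training set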

Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model (average cost-to-go, discounted or shortest path formulation)
- Possible application: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Methods: classical MDP methods - Value Iteration (VI), Policy Iteration (PI), Linear Programming (LP)
- Advantages/disadvantages: VI can converge fast for a high discount factor; PI is faster in general; LP makes additional constraints possible but the state space is more limited than for VI and PI

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval; more complex (average cost-to-go approach)
- Possible application: optimization of inspection based maintenance
- Methods: same as MDP

Approximate Dynamic Programming
- Characteristics: can handle larger state spaces than classical MDP methods; can work without an explicit model
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both these models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE: Number of electricity scenarios
NW: Number of working states for the component
NPM: Number of preventive maintenance states for one component
NCM: Number of corrective maintenance states for one component

Costs

CE(s, k): Electricity price at stage k for the electricity state s
CI: Cost per stage for interruption
CPM: Cost per stage of preventive maintenance
CCM: Cost per stage of corrective maintenance
CN(i): Terminal cost if the component is in state i

Variables

i1: Component state at the current stage
i2: Electricity state at the current stage
j1: Possible component state for the next stage
j2: Possible electricity state for the next stage

State and Control Space

x1_k: Component state at stage k
x2_k: Electricity state at stage k

Probability function

λ(t): Failure rate of the component at age t
λ(i): Failure rate of the component in state Wi

Sets

Ωx1: Component state space
Ωx2: Electricity state space
ΩU(i): Decision space for state i

State notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario. NX = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),   x1_k ∈ Ωx1, x2_k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.

Figure 9.1: Example of the Markov decision process for one component, with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh as a function of the stage for the three scenarios).

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(X_{k+1} = j | U_k = u, X_k = i)
= P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
= P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                           u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0    CM1      λ(Wq)
WNW                          0    WNW      1 − λ(WNW)
WNW                          0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1    PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1    1
PM(NPM−1)                    ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1    1
CM(NCM−1)                    ∅    W0       1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0         P2E = [ 1/3  1/3  1/3        P3E = [ 0.6  0.2  0.2
        0    1    0                 1/3  1/3  1/3               0.2  0.6  0.2
        0    0    1 ]               1/3  1/3  1/3 ]             0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                           u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1     G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0    CM1      CI + CCM
WNW                          0    WNW      G·Ts·CE(i2, k)
WNW                          0    CM1      CI + CCM
Wq                           1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1    CI + CPM
PM(NPM−1)                    ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1    CI + CCM
CM(NCM−1)                    ∅    W0       CI + CCM
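Putting the elements of Sections 9.1.4.1-9.1.4.4 together, the one-component model can be solved by a backward recursion over the stages. The sketch below is an illustrative implementation only: all numerical values (failure rates, costs, prices, electricity transition matrix) are placeholders, not results from this thesis, and the generation reward is counted as a negative cost.

import numpy as np

# Finite horizon value iteration for the proposed one-component model.
N, Ts, G = 12, 730.0, 1000.0                  # stages, hours per stage, kW
NW, NPM, NCM = 4, 2, 3                        # state space sizes (as in Figure 9.1)
CI, CPM, CCM = 1000.0, 2000.0, 5000.0         # interruption / PM / CM costs (placeholders)
lam = [0.001 * (q + 1) for q in range(NW + 1)]        # lambda(Wq), placeholder values
CE = lambda s, k: [0.30, 0.40, 0.55][s]               # price per kWh for scenario s (placeholder)
PE = np.full((3, 3), 1.0 / 3.0)                       # electricity scenario transitions (placeholder)

comp = [f"W{q}" for q in range(NW + 1)] + \
       [f"PM{m}" for m in range(1, NPM)] + [f"CM{m}" for m in range(1, NCM)]
idx = {name: i for i, name in enumerate(comp)}

def transitions(i, u, k, s):
    """List of (probability, next component state, cost), following Tables 9.1 and 9.4."""
    name = comp[i]
    if name.startswith("W"):
        if u == 1:
            return [(1.0, idx["PM1" if NPM > 1 else "W0"], CI + CPM)]
        q = int(name[1:])
        stay = idx[f"W{min(q + 1, NW)}"]
        cm = idx["CM1"] if NCM > 1 else idx["W0"]
        reward = -G * Ts * CE(s, k)                   # generation reward as negative cost
        return [(1 - lam[q], stay, reward), (lam[q], cm, CI + CCM)]
    m = int(name[2:])
    if name.startswith("PM"):
        return [(1.0, idx[f"PM{m + 1}" if m < NPM - 1 else "W0"], CI + CPM)]
    return [(1.0, idx[f"CM{m + 1}" if m < NCM - 1 else "W0"], CI + CCM)]

J = np.zeros((N + 1, len(comp), 3))                   # terminal cost C_N = 0
for k in range(N - 1, -1, -1):
    for i in range(len(comp)):
        for s in range(3):
            costs = []
            for u in ([0, 1] if comp[i].startswith("W") else [0]):  # dummy u for PM/CM states
                costs.append(sum(p * (c + PE[s] @ J[k + 1, j])
                                 for p, j, c in transitions(i, u, k, s)))
            J[k, i, s] = min(costs)

print(J[0, idx["W0"], :])   # expected cost-to-go for a new component in each scenario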

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC: Number of components
NWc: Number of working states for component c
NPMc: Number of preventive maintenance states for component c
NCMc: Number of corrective maintenance states for component c

Costs

CPMc: Cost per stage of preventive maintenance for component c
CCMc: Cost per stage of corrective maintenance for component c
CNc(i): Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}: State of component c at the current stage
iNC+1: Electricity state at the current stage
jc, c ∈ {1, ..., NC}: State of component c at the next stage
jNC+1: Electricity state at the next stage
uc, c ∈ {1, ..., NC}: Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}: State of component c at stage k
xc: A component state
xNC+1_k: Electricity state at stage k
uc_k: Maintenance decision for component c at stage k

Probability functions

λc(i): Failure probability function for component c

Sets

Ωxc: State space for component c
ΩxNC+1: Electricity state space
Ωuc(ic): Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.


• An interruption cost CI is considered, whatever maintenance is being done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as for the one-component model.
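A sketch of how the component state spaces and the joint state space could be enumerated follows. The helper functions are hypothetical; NW, NPM and NCM stand for NWc, NPMc and NCMc of a given component.

import itertools

def component_state_space(NW, NPM, NCM):
    # Omega_xc = {W0, ..., W_NW, PM1, ..., PM_{NPM-1}, CM1, ..., CM_{NCM-1}}
    return (['W%d' % q for q in range(NW + 1)]
            + ['PM%d' % q for q in range(1, NPM)]
            + ['CM%d' % q for q in range(1, NCM)])

def system_state_space(component_spaces, electricity_space):
    # The joint state space is the Cartesian product of the component state
    # spaces and the electricity state space.
    return list(itertools.product(*component_spaces, electricity_space))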

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system.


uck = 0: no preventive maintenance on component c

uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},  ∅ otherwise.
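A possible enumeration of the joint decision vectors is sketched below; a None entry is used here only as a placeholder for the empty decision space of a component that is in maintenance.

import itertools

def decision_space(component_states):
    # Omega_uc(ic) = {0, 1} if component c is in a working state, empty otherwise
    per_component = [(0, 1) if ic.startswith('W') else (None,)
                     for ic in component_states]
    return list(itertools.product(*per_component))

# Example: decision_space(['W2', 'CM1']) -> [(0, None), (1, None)]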

9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)                                                  (9.4)
    = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Tables 9.2 and 9.3.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2


If one of the components is in maintenance, or preventive maintenance is decided on at least one component, the system is not working and the components that keep working do not age. Then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c =  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
       1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
       0              otherwise
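The two cases above could be combined into one routine as sketched below; component_transition is the hypothetical one-component helper sketched after Table 9.1, and, for simplicity, all components are assumed to share the same NW, NPM, NCM and failure probabilities.

def joint_component_transition(j, u, i, lam, NW, NPM, NCM):
    # i, j: tuples of component states; u: tuple of decisions (None for components in maintenance)
    case1 = all(ic.startswith('W') for ic in i) and all(uc == 0 for uc in u)
    prob = 1.0
    for ic, uc, jc in zip(i, u, j):
        if case1:
            # Case 1: every component ages independently
            prob *= component_transition(ic, 0, lam, NW, NPM, NCM).get(jc, 0.0)
        elif uc == 1 or not ic.startswith('W'):
            # Case 2: a component sent to or already in maintenance follows its own transitions
            prob *= component_transition(ic, uc, lam, NW, NPM, NCM).get(jc, 0.0)
        else:
            # Case 2: a working, non-maintained component does not age while the system is down
            prob *= 1.0 if jc == ic else 0.0
    return prob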

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with

Cc =  CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
      CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
      0      otherwise
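A corresponding sketch of the two-case cost function is given below, with the same labelling assumptions as before; i_elec is the electricity scenario state, CE the price function of the one-component model, and the per-component maintenance costs are collapsed to single CCM and CPM values for brevity.

def joint_cost(j, u, i, i_elec, k, G, Ts, CE, CI, CPM, CCM):
    working = all(ic.startswith('W') for ic in i)
    no_pm = all(uc == 0 for uc in u)
    no_failure = all(jc.startswith('W') for jc in j)
    if working and no_pm and no_failure:
        # Case 1: reward for the electricity produced during the stage
        return G * Ts * CE(i_elec, k)
    # Case 2: interruption cost plus the cost of every ongoing or newly started maintenance
    total = CI
    for ic, jc in zip(i, j):
        if ic.startswith('CM') or jc == 'CM1':
            total += CCM
        elif ic.startswith('PM') or jc == 'PM1':
            total += CPM
    return total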

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.


• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically proves to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,   u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,   u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,   u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
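The computation can be reproduced with a few lines of code. This is a sketch only; the arc costs C(k, i, j) are those used above, taken from the shortest path example earlier in the thesis, and states 0, 1, 2 at each stage correspond to the nodes B to J.

# Arc costs C[k][(i, j)] between state i at stage k and state j at stage k+1
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {4: {0: 0}}                  # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):       # backward value iteration
    J[k], policy[k] = {}, {}
    for i in set(i for (i, _) in C[k]):
        options = {j: c + J[k + 1][j] for (ii, j), c in C[k].items() if ii == i}
        best = min(options, key=options.get)
        J[k][i], policy[k][i] = options[best], best

print(J[0][0])                   # -> 8, the optimal cost J*(A) computed above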


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435–441, 1995.

[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I Bagai and K Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C Berenguer, C Chu, and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L Bertling, R Allan, and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R Dekker, RE Wildeman, and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng, and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y Jiang, Z Zhong, J McCalley, and TV Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L P Kaelbling, M L Littman, and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D Kalles, A Stathaki, and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc, 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L Wang, J Chu, W Mao, and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R Wildeman, R Dekker, and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] RE Wildeman, R Dekker, and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Chapter 6

Infinite Horizon Models -

Markov Decision Processes

Infinite horizon models are models of systems that are considered stationary overtime The dynamic of the system as well as the cost function and the disturbancesare stationary Infinite horizon stochastic dynamic programming (IHSDP) modelscan be represented by a Markov Decision Process For more details and prooffor the convergence of the algorithm [36] or the introduction chpater of [13] arerecommended

In practice one scarcely faces problems with infinite number of stages It canhowever be a reasonable approximation of problems with very large number ofstates for which the value algorithm would lead to untractable computation

The approximation methods presented in Chapter 7 are based on the methodspresented in this chapter

61 Problem Formulation

The state space decision space probability function and cost function of IHSDPare defined in a similar way that FHSDP for the stationary case The aim of IHSDPis to minimize the cumulative costs of a system over an infinite number of stagesThis sum is called cost-to-go function

An interesting feature of IHSDP models is that the solution of the problem is astationary policy It means that the solution of the problem has the form π =micro micro micro micro is a function mapping the state space with the control space For

29

i isin ΩX micro(i) is an admissible control for the state i micro(i) isin ΩU (i)

The objective is to find the optimal microlowast It should minimize the cost-to-go function

To be able to compare different policies it is necessary that the infinite sum ofcosts converge Different type of models can be considered stochastic shortest pathproblems discounted problems and average cost per stages problems

Stochastic shortest path modelsStochastic shortest path dynamic programming models have a terminal state (orcost-free terminaison state) that is not evitable When this state is reached thesystem remains in this state and no costs are paid

Jlowast(X0) = minmicroE limNrarrinfin

Nminus1sumk=0C(Xk+1 micro(Xk) Xk)

Subject to Xk+1 = f(Xk micro(Xk) ω(Xk micro(Xk))) k = 0 1 N minus 1

micro Decision policyJlowast(i) Optimal cost-to-go function for state i

Discounted problemsDiscounted IHSDP models have a cost function that is discounted by a factor α is adiscount factor (0 lt α lt 1) The cost function for discounted IHSDP has the formα middot Cij(u)

As Cij(u) is bounded the infinite sum will converge (decreasing geometric progres-sion)

Jlowast(X0) = minmicroE limNrarrinfin

Nminus1sumk=0α middot C(Xk+1 micro(Xk) Xk)

Subject to Xk+1 = f(Xk Uk ω(Xk micro(Xk))) k = 0 1 N minus 1

α Discount factor

Average cost per stage problemsInfinite horizon problems can sometimes not be represented with a no free-costterminaison state or discounted

To make the cost-to-go finite the problem can modelled as an average cost per stageproblem where the aim is to minimize

Jlowast = minmicroE limNrarrinfin

Nminus1sumk=0

1Nmiddot C(Xk+1 micro(Xk) Xk)

Subject to Xk+1 = f(Xk Uk ω(Xk micro(Xk))) k = 0 1 N minus 1

30

62 Optimality Equations

The optimality equations are formulated using the probability function P (i u j)

The stationary policy microlowast solution of a IHSDP shortest path problem is solution ofthe Bellmanacutes equation (other name for the optimality equation - Bellman is themathematician at the origin of the DP theory)

Jmicro(i) = minmicro(i)isinΩU (i)

sum

jisinΩX

Pij(u) middot [Cij(u) + Jmicro(j)] foralli isin ΩX

Jmicro(i) Cost-to-go function of policy micro starting from state iJlowast(i) Optimal cost-to-go function for state i

For a IHSDP discounted problem the optimality equation is

Jmicro(i) = minmicro(i)isinΩU (i)

sum

jisinΩX

Pij(u) middot [Cij(u) + α middot Jmicro(j)] foralli isin ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed inSection 67

63 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively the algorithm should converge to the optimal policy It can be shownthat the algorithm will indeed converge to the optimal solution If the model isdiscounted then the method can be fast The time complexity is in polynomialtime of the size of the state space control space and 1

1minusα

For non-discounted models the theoretical number of iteration needed is infiniteand a relative criteria must be determine to stop the algorithm

An alternative to the method is the Policy Iteration (PI) algorithm This laterterminates after a finite number of iteration

64 The Policy Iteration Algorithm

Given a policy micro the first step of the algorithm evaluates the policy by calculatingthe expected cost-to-go function resulting from this policy The next step of the

31

algorithm improve the expected cost-to-go function by enhancing the actual policyThis 2-steps algorithm is used iteratively The process stops when a policy is asolution of its own improvement

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1 Policy Evaluation

microq+1 = microq stop the algorithmElse Jmicroq(i) solution of the following linear system is calculated

Jmicroq(i) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

microq+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

Go back to policy evaluation step

The process stops when microq+1 = microq

At each iteration the algorithm always improve the policy If the initial policy micro0

is already good then the algorithm will converge fast to the optimal solution

65 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use at each stage the value iteration algorithm on a finitenumber of iterations M to estimate the value function of the policy The algorithm

is initialized with a value function JMmicrok (i) that must be chosen higher than the realvalue Jmicrok(i)

32

While m ge 0 do

Jmmicrok(i) =sumjisinΩXP (j microk(i) i) middot [C(j microk(i) i) + Jm+1

microk (j)] foralli isin ΩX

mlarr mminus 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m=0 and Jmicrok is approximated by J0microk

66 Average Cost-to-go Problems

The methods presented in Sections 51-54 can not be applied directly to average costproblems Average cost-to-go problems are more complicated and implies conditionson the Markov decision process for the convergence of the algorithms An averagecost-to-go problem can be reformulated as equivalent to a shortest path problemif the model of the Markov decision process is proved to be unichain (That is allstationary policies generate Markov chains that consist of a single ergodic class andpossibly some transient states See for details [36])

Given a stationary policy micro a state X isin ΩX there is an unique λmicro and vector hmicrosuch that

hmicro(X) = 0

λmicro + hmicro(i) =sum

jisinΩX

P (j micro(i) i) middot [C(j u i) + hmicro(j)] foralli isin ΩX

This λmicro is the average cost-to-go for the stationary policy micro The average cost-to-gois the same for all the starting state

The optimal average cost and optimal policy satisfy the Bellman equation

λlowast + hlowast(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j micro(i) i) middot [C(j micro(i) i) + hlowast] foralli isin ΩX

microlowast(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hlowast] foralli isin ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems Themethod is called relative value iteration X is an arbitrary state and h0(i) is chosen

33

arbitrarly

Hk = minuisinΩU (X)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(X)]

hk+1(i) = minuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] minusHk foralli isin ΩX

microk+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] foralli isin ΩX

The sequence hk will converge if the Markov decision process is unichain Moreoverthe algorithm converge to the optimal policy The number of iterations needed isinfinite in theory

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation X can be chosen arbitrarly

Step 1 Evaluation of the policyIf λq+1 = λq and and hq+1(i) = hq(i) foralli isin ΩX stop the algorithm

Else solve the system of equation

hq(X) = 0λq + hq(i) =

sumjisinΩXP (j micro(q)(i) i) middot [C(j u i) + hq(j)] foralli isin ΩX

Step 2 Policy improvement

microq+1 = argminuisinΩU (i)

sumjisinΩXP (j u i) middot [C(j u i) + hq] foralli isin ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linearprogramming (LP) methods The motivation for this apporach is that a linearprogramming model can include constraints that are not possible to include in aclassical MDP model However the model become less intuitive than with the othermethods Moreover LP can only be used for smaller state spaces than the valueiteration and policy iteration methods

34

For example in the discounted IHSDP

Jmicro(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + α middot Jmicro(j)] foralli isin ΩX

Jmicro(i) is solution of the following linear programming model

MinimizesumiisinΩXJmicro(i)

Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le

sumjisinΩX P (j u i) middot C(j u i)forallu i

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDP generalize MDP by 1) allowing or requiring the decision maker to chooseactions whenever the system state changes 2) modeling the system evolution in

35

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-cost approach is used to evaluate the policy

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al [21] propose a similar MDP is proposed Major and minormaintenance are possible are possible For each possible maintenance action thedeterioration level after the maintenance is stochastic which is more realistic

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

83 Reinforcement Learning

Kalles et al [24] proposes the use of RL for preventive maintenance of power plantsThe article aims at giving reason of using RL for monitoring and maintenance ofpower plants The main advantages given are the automatic learning capabilitiesof RL The problem of time-lag (time between an action and its effect) is revealedPenalties are defined by deviations from normal operation of the system Theapproach proposed should first be used in parallel of the actual expert systems sothat the RL algorithm learns the environment then it could be applied in practiceOne important condition for a good learning of the environment is that the algorithmhas been trained in all situation and all the more in critical situation

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history

The finite horizon approach is adapted to short-term optimization From the lit-terature review this approach can be applied to maintenance scheduling I believethat the approach is interesting because it can integrate opportunistic maintenanceChapter 8 gives an example of this type of models A limitations is the consequence

45

of the curse of dimensionality The complexity of the model increases exponention-naly with the number of states In consequence the number of components of afinite horizon SDP model can not be too high for being tractable

Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods

Approximate dynamic programming (reinforcement learning) have many advan-tages The methods does not need that a model of the system exist They learnfrom samples and could be used to adapt to a system Moreover they can handlelarge state space in comparison with MDP In my opinion reinforcement learningcould be used for continuous time monitoring of system with multi-states moni-toring The article [24] was also proposing this approach for condition monitoringof power plants However no implementation of the idea have been found in thelitterature A practical disadvantage of this approach is that the process of learningis time consuming It can (and should) be done off-line or based on a model thatalready exist but is too large to be solvable with classical methods A technicaldifficulty is the choice for an adequate supervised learning structure

Table 81 shows a summary of the models and most important methods

Table 81 Summary of models and methods

Characteristics Possible Application Method Advantagesin Maintenance DisadvantagesOptimization

Finite Horizon Model can be Short-term maintenance Value Iteration Limitated state spaceDynamic Programming Non-Stationary Optimization Scheduling (number of components)Markov Decision -Stationary Model Classical MethodsProcesses - Possible approaches for MDP

Average cost-to-go Continuous-time condition Value Iteration (VI) Can converge fast formonitoring maintenance high discount factoroptimization

Discounted Short-term maintenance Policy Iteration (PI) Faster in generaloptimization

Shortest path Linear Programming - Possible additionalconstraints- State space limited VI amp PI

Approximate Dynamic Can handle large state space Same as MDP for larger - TD-learning Can work withoutProgramming for MDP classical MDP methods systems - Q-learning an explicit modelSemi-Markov Decision -Can optimize Optimization for inspection Same as MDPProcesses interval inspection based maintenance

-Complex (Average cost-to-go approach)

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) exceeds 50% for t > Tmax. This latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

[Figure: state-transition diagram of the Markov decision process for one component]

Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0, dashed lines: u = 1. From a working state Wq the component moves to Wq+1 with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q); the maintenance states progress with probability 1 back towards W0.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, roughly between 200 and 500) per stage for scenarios 1, 2 and 3 around stages k−1, k, k+1]

Figure 9.2: Example of electricity scenarios, NE = 3.

52

9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0  no preventive maintenance
Uk = 1  preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅       otherwise

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 (respectively CM1) corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                           u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1      λ(Wq)
WNW                          0   WNW      1 − λ(WNW)
WNW                          0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1      1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    1
PMNPM−1                      ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    1
CMNCM−1                      ∅   W0       1
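As a minimal illustration of Table 9.1, the following sketch (not part of the thesis) encodes the component transition probability P(j1, u, i1) in Python. The tuple state encoding, the sizes NW, NPM, NCM and the per-stage failure probabilities lam[q] are assumptions chosen for the example.

```python
# Sketch of the component transition probabilities of Table 9.1 (assumed encoding).
# States are tuples: ('W', q), ('PM', q) or ('CM', q); lam[q] is lambda(W_q).
N_W, N_PM, N_CM = 4, 2, 3                     # assumed sizes, as in Figure 9.1
lam = [0.05, 0.08, 0.12, 0.20, 0.35]          # assumed lambda(W_0) ... lambda(W_NW)

def P_component(j1, u, i1):
    """Probability of moving from component state i1 to j1 under decision u."""
    kind, q = i1
    if kind == 'W':
        if u == 1:                            # preventive replacement starts
            return 1.0 if j1 == ('PM', 1) else 0.0
        nxt = ('W', min(q + 1, N_W))          # ageing; W_NW absorbs if no failure
        if j1 == nxt:
            return 1.0 - lam[q]
        if j1 == ('CM', 1):                   # failure during the stage
            return lam[q]
        return 0.0
    # maintenance states: deterministic progression back to W0
    n_stages = N_PM if kind == 'PM' else N_CM
    nxt = (kind, q + 1) if q < n_stages - 1 else ('W', 0)
    return 1.0 if j1 == nxt else 0.0
```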

Table 9.2: Example of transition matrices for electricity scenarios

P1E = | 1    0    0   |    P2E = | 1/3  1/3  1/3 |    P3E = | 0.6  0.2  0.2 |
      | 0    1    0   |          | 1/3  1/3  1/3 |          | 0.2  0.6  0.2 |
      | 0    0    1   |          | 1/3  1/3  1/3 |          | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
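The scenario data of Tables 9.2 and 9.3 could be stored as follows; this is a sketch with assumed names, not part of the thesis.

```python
# Sketch of the electricity-scenario transition data of Tables 9.2-9.3.
# Row = current scenario i2, column = next scenario j2.
import numpy as np

P1_E = np.eye(3)                              # prices stay in their scenario
P2_E = np.full((3, 3), 1.0 / 3.0)             # complete mixing
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])

# Stage k -> transition matrix, as in Table 9.3 (12-stage horizon)
P_E_stage = [P1_E] * 3 + [P3_E] * 2 + [P2_E] * 3 + [P3_E] + [P1_E] * 3

def P_electricity(k, j2, i2):
    """Probability P_k(j2, i2) of moving from scenario i2 to j2 during stage k."""
    return P_E_stage[k][i2, j2]
```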

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost denoted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                           u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1      CI + CCM
WNW                          0   WNW      G · Ts · CE(i2, k)
WNW                          0   CM1      CI + CCM
Wq                           1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1    CI + CPM
PMNPM−1                      ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1    CI + CCM
CMNCM−1                      ∅   W0       CI + CCM
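To connect Tables 9.1-9.4, a minimal backward-induction (finite horizon value iteration) sketch for the one-component model could look as follows. It is not part of the thesis: the helper functions transition_prob, cost, decisions and terminal_cost are hypothetical and are assumed to implement the tables above, with None standing for the empty decision ∅.

```python
# Sketch of backward induction for the proposed finite horizon model.
# states: iterable of (component state, electricity scenario) pairs.
def backward_induction(states, decisions, transition_prob, cost, terminal_cost, N):
    J = {i: terminal_cost(i) for i in states}          # stage-N terminal costs CN(i)
    policy = []
    for k in range(N - 1, -1, -1):                     # stages N-1, ..., 0
        J_new, mu_k = {}, {}
        for i in states:
            best_u, best_val = None, float('inf')
            for u in decisions(i) or [None]:           # None = no decision possible
                val = sum(transition_prob(k, j, u, i) *
                          (cost(k, j, u, i) + J[j]) for j in states)
                if val < best_val:
                    best_u, best_val = u, val
            J_new[i], mu_k[i] = best_val, best_u
        J, policy = J_new, [mu_k] + policy
    return J, policy                                   # cost-to-go at stage 0 and policy
```

Under these assumptions, policy[k][i] gives the maintenance decision at stage k in state i.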

92 Multi-Component model

In this section the model presented in Section 9.1 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c at the next stage
jNC+1 State of the electricity at the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance (of any kind) is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space
Same as in Section 9.1.

9242 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0  no preventive maintenance on component c
uck = 1  preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                    Ωuc(ic) = ∅       otherwise

9243 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of the components considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with  Pc = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
      Pc = 1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
      Pc = 0             otherwise
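A sketch of the two cases above could look as follows; it is not part of the thesis, and the per-component transition function p_c and the helper is_working are assumed names (p_c in the style of Table 9.1). Working components that are not maintained are assumed to keep their state while the system is stopped, following the "no ageing" argument above.

```python
# Sketch of the joint component transition probability (Cases 1 and 2).
def joint_component_prob(j, u, i, p_c, is_working):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the multi-component model."""
    all_working = all(is_working(ic) for ic in i)
    no_maintenance = not any(u)
    if all_working and no_maintenance:                 # Case 1: independent ageing
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= p_c(jc, 0, ic)
        return prob
    prob = 1.0                                         # Case 2: system is stopped
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not is_working(ic):              # maintained or already down
            prob *= p_c(jc, 1, ic)
        else:                                          # working component does not age
            prob *= 1.0 if jc == ic else 0.0
    return prob
```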

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with  Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
      Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
      Cc = 0     otherwise

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair: In reality the repair time is not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount factor the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts over a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2

63

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

68

Page 36: Models

For all i ∈ ΩX, µ(i) is an admissible control for the state i: µ(i) ∈ ΩU(i).

The objective is to find the optimal policy µ*. It should minimize the cost-to-go function.

To be able to compare different policies, it is necessary that the infinite sum of costs converges. Different types of models can be considered: stochastic shortest path problems, discounted problems, and average cost per stage problems.

Stochastic shortest path models
Stochastic shortest path dynamic programming models have a terminal state (or cost-free termination state) that cannot be avoided. When this state is reached, the system remains in it and no further costs are paid.

J*(X0) = min_µ E[ lim_{N→∞} Σ_{k=0..N−1} C(Xk+1, µ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, µ(Xk), ω(Xk, µ(Xk))),  k = 0, 1, ..., N−1

µ Decision policy
J*(i) Optimal cost-to-go function for state i

Discounted problems
Discounted IHSDP models have a cost function that is discounted by a discount factor α (0 < α < 1). The cost at stage k has the form α^k · Cij(u).

As Cij(u) is bounded, the infinite sum will converge (decreasing geometric progression).

J*(X0) = min_µ E[ lim_{N→∞} Σ_{k=0..N−1} α^k · C(Xk+1, µ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, Uk, ω(Xk, µ(Xk))),  k = 0, 1, ..., N−1

α Discount factor

Average cost per stage problems
Some infinite horizon problems can not be represented with a cost-free termination state or with discounting.

To make the cost-to-go finite, the problem can be modelled as an average cost per stage problem, where the aim is to minimize

J* = min_µ E[ lim_{N→∞} (1/N) Σ_{k=0..N−1} C(Xk+1, µ(Xk), Xk) ]

Subject to: Xk+1 = f(Xk, Uk, ω(Xk, µ(Xk))),  k = 0, 1, ..., N−1

30

62 Optimality Equations

The optimality equations are formulated using the probability function P(j, u, i).

The stationary policy µ* that solves an IHSDP shortest path problem is a solution of Bellman's equation (another name for the optimality equation; Bellman is the mathematician at the origin of the DP theory):

Jµ(i) = min_{µ(i)∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + Jµ(j)],  ∀i ∈ ΩX

Jµ(i) Cost-to-go function of policy µ starting from state i
J*(i) Optimal cost-to-go function for state i

For an IHSDP discounted problem the optimality equation is:

Jµ(i) = min_{µ(i)∈ΩU(i)} Σ_{j∈ΩX} Pij(u) · [Cij(u) + α · Jµ(j)],  ∀i ∈ ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed in Section 6.6.

63 Value Iteration

To solve the optimality equations, a first idea would be to use the value iteration algorithm presented in Chapter 5.

Intuitively the algorithm should converge to the optimal policy, and it can be shown that it indeed converges to the optimal solution. If the model is discounted, the method can be fast: the time complexity is polynomial in the size of the state space, the size of the control space, and 1/(1 − α).

For non-discounted models the theoretical number of iterations needed is infinite, and a stopping criterion must be determined to terminate the algorithm.
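As an illustration, a minimal value iteration sketch for the discounted case could look as follows. It is not part of the thesis; the data layout P[u][i][j], C[u][i][j] is an assumption, and for simplicity all controls are assumed admissible in every state.

```python
# Sketch of value iteration for a discounted IHSDP.
import numpy as np

def value_iteration(P, C, alpha, tol=1e-6, max_iter=10_000):
    n_u, n_x = len(P), len(P[0])
    J = np.zeros(n_x)
    for _ in range(max_iter):
        # Q[u, i] = sum_j P(j|i,u) * (C(j,u,i) + alpha * J(j))
        Q = np.array([[np.dot(P[u][i], C[u][i] + alpha * J) for i in range(n_x)]
                      for u in range(n_u)])
        J_new = Q.min(axis=0)                  # Bellman backup
        if np.max(np.abs(J_new - J)) < tol:    # stopping criterion
            J = J_new
            break
        J = J_new
    policy = Q.argmin(axis=0)                  # greedy policy for the final J
    return J, policy
```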

An alternative to this method is the Policy Iteration (PI) algorithm. The latter terminates after a finite number of iterations.

64 The Policy Iteration Algorithm

Given a policy µ, the first step of the algorithm evaluates the policy by calculating the expected cost-to-go function resulting from this policy. The next step of the algorithm improves the expected cost-to-go function by enhancing the current policy. This two-step procedure is used iteratively. The process stops when a policy is a solution of its own improvement.

The algorithm starts with an initial policy µ0. It can then be described by the following steps:

Step 1: Policy Evaluation

If µq+1 = µq, stop the algorithm. Else, Jµq(i), the solution of the following linear system, is calculated:

Jµq(i) = Σ_{j∈ΩX} P(j, µq(i), i) · [C(j, µq(i), i) + Jµq(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy µq.

Step 2: Policy Improvement

A new policy is obtained using the value iteration algorithm:

µq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jµq(j)]

Go back to the policy evaluation step. The process stops when µq+1 = µq.

At each iteration the algorithm always improves the policy. If the initial policy µ0 is already good, then the algorithm will converge quickly to the optimal solution.
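A minimal policy iteration sketch for the discounted variant (which avoids the extra care needed for shortest path models) could look as follows; it is not part of the thesis, P and C are assumed numpy arrays of shape (n_u, n_x, n_x), and all actions are assumed admissible in every state.

```python
# Sketch of policy iteration for a discounted IHSDP.
import numpy as np

def policy_iteration(P, C, alpha):
    n_u, n_x, _ = P.shape
    mu = np.zeros(n_x, dtype=int)                     # initial policy mu_0
    while True:
        # Policy evaluation: solve (I - alpha * P_mu) J = c_mu
        P_mu = P[mu, np.arange(n_x)]                  # rows follow the current policy
        c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_x)])
        J = np.linalg.solve(np.eye(n_x) - alpha * P_mu, c_mu)
        # Policy improvement
        Q = np.einsum('uij,uij->ui', P, C + alpha * J[None, None, :])
        mu_new = Q.argmin(axis=0)
        if np.array_equal(mu_new, mu):                # policy reproduces itself: stop
            return J, mu
        mu = mu_new
```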

65 Modified Policy Iteration

If the number of states is large, solving the linear system of the policy evaluation step can be computationally intensive.

An alternative is to use, at each stage, the value iteration algorithm for a finite number of iterations M to estimate the value function of the policy. The algorithm is initialized with a value function J^M_µk(i) that must be chosen higher than the real value Jµk(i).

32

While m ≥ 0 do:

J^m_µk(i) = Σ_{j∈ΩX} P(j, µk(i), i) · [C(j, µk(i), i) + J^{m+1}_µk(j)],  ∀i ∈ ΩX

m ← m − 1

m Number of iterations left for the evaluation step of modified policy iteration

The algorithm stops when m = 0, and Jµk is approximated by J^0_µk.
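A short sketch of this approximate evaluation step (not part of the thesis, shortest-path backup as above, with P and C assumed numpy arrays of shape (n_u, n_x, n_x)):

```python
# Sketch of the approximate policy evaluation used by modified policy iteration:
# M backups for a fixed policy mu, starting from an over-estimating guess J0.
import numpy as np

def evaluate_policy_approx(P, C, mu, J0, M):
    n_x = len(J0)
    P_mu = P[mu, np.arange(n_x)]
    c_mu = np.einsum('ij,ij->i', P_mu, C[mu, np.arange(n_x)])
    J = np.asarray(J0, dtype=float)
    for _ in range(M):                         # M backups instead of an exact solve
        J = c_mu + P_mu @ J
    return J
```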

66 Average Cost-to-go Problems

The methods presented in the previous sections can not be applied directly to average cost problems. Average cost-to-go problems are more complicated and imply conditions on the Markov decision process for the convergence of the algorithms. An average cost-to-go problem can be reformulated as equivalent to a shortest path problem if the model of the Markov decision process is proved to be unichain (that is, all stationary policies generate Markov chains that consist of a single ergodic class and possibly some transient states; see [36] for details).

Given a stationary policy µ and a state X ∈ ΩX, there is a unique λµ and vector hµ such that:

hµ(X) = 0
λµ + hµ(i) = Σ_{j∈ΩX} P(j, µ(i), i) · [C(j, µ(i), i) + hµ(j)],  ∀i ∈ ΩX

This λµ is the average cost-to-go for the stationary policy µ. The average cost-to-go is the same for all starting states.

The optimal average cost and optimal policy satisfy the Bellman equation:

λ* + h*(i) = min_{µ(i)∈ΩU(i)} Σ_{j∈ΩX} P(j, µ(i), i) · [C(j, µ(i), i) + h*(j)],  ∀i ∈ ΩX

µ*(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + h*(j)],  ∀i ∈ ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems. The method is called relative value iteration. X is an arbitrary reference state and h0(i) is chosen arbitrarily.

Hk = min_{u∈ΩU(X)} Σ_{j∈ΩX} P(j, u, X) · [C(j, u, X) + hk(j)]

hk+1(i) = min_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)] − Hk,  ∀i ∈ ΩX

µk+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hk(j)],  ∀i ∈ ΩX

The sequence hk will converge if the Markov decision process is unichain. Moreover, the algorithm converges to the optimal policy. The number of iterations needed is in theory infinite.

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm.

Initialisation: X can be chosen arbitrarily.

Step 1: Evaluation of the policy
If λq+1 = λq and hq+1(i) = hq(i) ∀i ∈ ΩX, stop the algorithm.

Else, solve the system of equations:

hq(X) = 0
λq + hq(i) = Σ_{j∈ΩX} P(j, µq(i), i) · [C(j, µq(i), i) + hq(j)],  ∀i ∈ ΩX

Step 2: Policy improvement

µq+1(i) = argmin_{u∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + hq(j)],  ∀i ∈ ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linear programming (LP) methods. The motivation for this approach is that a linear programming model can include constraints that are not possible to include in a classical MDP model. However, the model becomes less intuitive than with the other methods. Moreover, LP can only be used for smaller state spaces than the value iteration and policy iteration methods.

For example, in the discounted IHSDP:

Jµ(i) = min_{µ(i)∈ΩU(i)} Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + α · Jµ(j)],  ∀i ∈ ΩX

Jµ(i) is the solution of the following linear programming model:

Maximize    Σ_{i∈ΩX} Jµ(i)
Subject to  Jµ(i) − α · Σ_{j∈ΩX} P(j, u, i) · Jµ(j) ≤ Σ_{j∈ΩX} P(j, u, i) · C(j, u, i),  ∀u, ∀i
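A sketch of this LP solved with scipy could look as follows; it is not part of the thesis, the data layout is an assumption, and all actions are assumed admissible in every state.

```python
# Sketch of the LP formulation of a discounted MDP (maximize sum of J subject to
# the Bellman inequality constraints), solved with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, C, alpha):
    n_u, n_x, _ = P.shape
    A_ub, b_ub = [], []
    for u in range(n_u):
        for i in range(n_x):
            row = -alpha * P[u, i]             # coefficients of J(j)
            row[i] += 1.0                      # + J(i)
            A_ub.append(row)
            b_ub.append(np.dot(P[u, i], C[u, i]))
    res = linprog(c=-np.ones(n_x),             # maximize sum J(i) = minimize -sum J(i)
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_x)
    return res.x                               # optimal cost-to-go J*
```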

At present, linear programming has not proven to be an efficient method for solving large discounted MDPs; however, innovations in LP algorithms in the past decade might change this [36].

68 Efficiency of the Algorithms

For details about the complexity of the algorithms, [28] and [29] are recommended.

If n and m denote the numbers of states and actions, a DP method takes a number of computational operations that is less than some polynomial function of n and m. A DP method is thus guaranteed to find an optimal policy in polynomial time, even though the total number of (deterministic) policies is m^n [41]. But linear programming methods become impractical at a much smaller number of states than do DP methods [41].

Since the policy iteration algorithm always improves the policy at each iteration, the algorithm will converge quite fast if the initial policy µ0 is already good. There is strong empirical evidence in favor of PI over VI and LP in solving Markov decision processes [28].

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodic in the case of infinite horizon problems). However, for some applications the decision time can be random. For example, the next decision time can be decided by the decision maker depending on the actual state of the system, or the decision epoch can occur each time the state of the system changes. This kind of problem refers to Semi-Markov Decision Processes (SMDP).

SMDP generalize MDP by 1) allowing or requiring the decision maker to choose actions whenever the system state changes, 2) modeling the system evolution in continuous time, and 3) allowing the time spent in a particular state to follow an arbitrary probability distribution [36].

The time horizon is considered infinite and the actions are not made continuously (that kind of problem refers to optimal control theory).

SMDP are more complicated than MDP and will not be part of this thesis. Puterman [36] explains how one can transform an SMDP model into a model solvable with the methods presented previously in this chapter.

SMDP could be interesting in maintenance optimization since they allow a choice of inspection interval for each state of the system. However, due to the complexity of the models, only small state spaces are tractable.

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function on a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

71 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) to be able to predict future outputs for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning, and Bayesian statistics.

One of the first reinforcement learning approaches used artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are based on the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples that have the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed shortly.

The RL methods are extensions of the methods presented in Section 7.2. RL methods make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of performance of the system, provided from simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i) if a model of the system exists.

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy µ, and the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory starting from the state Xk is

V(Xk) = Σ_{n=k..N} C(Xn, Xn+1)

V(Xk) Cost-to-go of the trajectory starting from state Xk

If a certain number of trajectories have been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1..K} V(im)

V(im) Cost-to-go of the trajectory starting from state i after the mth visit

A recursive form of the method can be formulated:

J(i) = J(i) + γ · [V(im) − J(i)],  with γ = 1/m, where m is the number of the trajectory

From a trajectory point of view:

J(Xk) = J(Xk) + γXk · [V(Xk) − J(Xk)]

where γXk corresponds to 1/m, with m the number of times Xk has already been visited by trajectories.

With the preceding algorithm, V(Xk) must be calculated from the whole trajectory and can only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states already visited are updated. Assume that the lth transition is being generated. Then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) = J(Xk) + γXk · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l

TD(λ)
A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) = J(Xk) + γXk · λ^(l−k) · [C(Xl, Xl+1) + J(Xl+1) − J(Xl)],  ∀k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation. Another special case is λ = 0; the TD(0) algorithm is

J(Xk) = J(Xk) + γXk · [C(Xk, Xk+1) + J(Xk+1) − J(Xk)]
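A tabular TD(0) update applied along one observed trajectory could be sketched as follows; this is not from the thesis, and the dictionary representation of J and of the visit counters is an assumption.

```python
# Sketch of the TD(0) update for a fixed policy.
# trajectory: list of observed transitions (x, x_next, cost).
def td0_update(J, visits, trajectory):
    for x, x_next, cost in trajectory:
        visits[x] = visits.get(x, 0) + 1
        gamma = 1.0 / visits[x]                # step size gamma_x = 1/m
        J[x] = J.get(x, 0.0) + gamma * (cost + J.get(x_next, 0.0) - J.get(x, 0.0))
    return J
```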

Q-factors
Once Jµk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors defined by

Qµk(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + Jµk(j)]

Note that P(j, u, i) and C(j, u, i) must be known. The improved policy is

µk+1(i) = argmin_{u∈ΩU(i)} Qµk(i, u)

It is in fact an approximate version of the policy iteration algorithm, since Jµk and Qµk have been estimated using the samples.

722 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the multiple policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈ΩU(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈ΩX} P(j, u, i) · [C(j, u, i) + min_{v∈ΩU(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily.

For each sample (Xk, Xk+1, Uk, Ck), do:

Uk = argmin_{u∈ΩU(Xk)} Q(Xk, u)

Q(Xk, Uk) = (1 − γ) · Q(Xk, Uk) + γ · [C(Xk+1, Uk, Xk) + min_{u∈ΩU(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The trade-off exploration/exploitation. Convergence of the algorithm to the optimal solution would require that all the pairs (x, u) are tried infinitely often, which is not realistic in practice.

In practice, a trade-off must be made between phases of exploitation, during which a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
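One common way to implement this trade-off is an ε-greedy choice of control. The sketch below is not from the thesis; Q is an assumed tabular dictionary, and controls(x) and simulate(x, u) are hypothetical helpers returning the admissible controls and one observed transition.

```python
# Sketch of one tabular Q-learning step with an epsilon-greedy control choice.
import random

def q_learning_step(Q, x, controls, simulate, gamma=0.1, eps=0.1):
    if random.random() < eps:                          # exploration
        u = random.choice(controls(x))
    else:                                              # exploitation (greedy policy)
        u = min(controls(x), key=lambda a: Q.get((x, a), 0.0))
    x_next, cost = simulate(x, u)                      # one observed sample
    best_next = min((Q.get((x_next, a), 0.0) for a in controls(x_next)), default=0.0)
    Q[(x, u)] = (1 - gamma) * Q.get((x, u), 0.0) + gamma * (cost + best_next)
    return x_next
```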

73 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- Using the direct learning approach presented in the preceding section for each sample of experience.

- Building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation using direct learning.

41

74 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the table representation previously investigated, Jµ(i) was stored for all values of i. With an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics, for example.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the one performed in reinforcement learning is that a real training set does not exist. The training sets are obtained either by simulation or from real-time samples, which is already an approximation of the real function.

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

812 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance, major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is raised. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need an explicit model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Disadvantage: limited state space (number of components)

Markov Decision Processes (stationary models), with possible approaches:
- Average cost-to-go: continuous-time condition monitoring maintenance optimization
- Discounted: short-term maintenance optimization
- Shortest path
Classical methods for MDP:
- Value Iteration (VI): can converge fast for a high discount factor
- Policy Iteration (PI): faster in general
- Linear Programming: possible additional constraints; state space more limited than for VI & PI

Approximate Dynamic Programming
- Characteristics: can handle large state spaces compared to classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantage: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval (average cost-to-go approach)
- Possible application: optimization of inspection based maintenance
- Method: same as MDP
- Disadvantage: complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component to make its principle easier to understand.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, so as to be operational later and avoid maintenance during a profitable period. This idea was incorporated in the model: the electricity price is included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation can be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption can be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

States notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; Tmax can then correspond, for example, to the age at which λ(t) exceeds 50%, with λ(t) kept constant for t > Tmax. This second approach was implemented. In both cases, the corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer.

[Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. Each working state Wq leads to Wq+1 (or remains in W4) with probability 1 - Ts·λ(q) and to CM1 with probability Ts·λ(q); the PM and CM states lead back to W0 with probability 1.]

Figure 9.1 shows an example of the graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM-1, CM1, ..., CMNCM-1}
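As a rough illustration, the construction of Ωx1 can be written down directly. The short Python sketch below is not part of the thesis model; the function name and the numerical values are assumptions chosen to reproduce the state space of Figure 9.1.

```python
def component_state_space(T_max, T_s, N_PM, N_CM):
    """Enumerate Omega_x1 = {W0..W_NW, PM1..PM_(NPM-1), CM1..CM_(NCM-1)}."""
    N_W = round(T_max / T_s)                   # closest integer, as described above
    working = ["W%d" % q for q in range(N_W + 1)]
    pm = ["PM%d" % q for q in range(1, N_PM)]  # empty if N_PM = 1 (PM1 is then W0)
    cm = ["CM%d" % q for q in range(1, N_CM)]  # empty if N_CM = 1 (CM1 is then W0)
    return working + pm + cm

# Example matching Figure 9.1: N_CM = 3, N_PM = 2, N_W = 4
print(component_state_space(T_max=4.0, T_s=1.0, N_PM=2, N_CM=3))
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
```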


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden and is moreover a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity price (SEK/MWh, between roughly 200 and 500) as a function of the stage (k-1, k, k+1) for Scenarios 1, 2 and 3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}
        ∅      otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E. i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                        u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}    0    Wq+1     1 - λ(Wq)
Wq, q ∈ {0, ..., NW-1}    0    CM1      λ(Wq)
WNW                       0    WNW      1 - λ(WNW)
WNW                       0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}      1    PM1      1
PMq, q ∈ {1, ..., NPM-2}  ∅    PMq+1    1
PMNPM-1                   ∅    W0       1
CMq, q ∈ {1, ..., NCM-2}  ∅    CMq+1    1
CMNCM-1                   ∅    W0       1
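A sketch of how the rows of Table 9.1 translate into code is given below. It is only an illustration: λ(Wq) is interpreted here as a per-stage failure probability, and the function name and numerical values are assumptions.

```python
def component_transition(i, u, lam, N_W, N_PM, N_CM):
    """Return {j1: P(j1, u, i1)} for a component state i1 = i and decision u (Table 9.1)."""
    P = {}
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                                      # preventive replacement starts
            P["PM1" if N_PM > 1 else "W0"] = 1.0
        else:                                           # keep operating
            P["W%d" % min(q + 1, N_W)] = 1.0 - lam[q]   # age, or stay in W_NW
            P["CM1" if N_CM > 1 else "W0"] = lam[q]     # failure during the stage
    elif i.startswith("PM"):                            # preventive maintenance chain
        q = int(i[2:])
        P["PM%d" % (q + 1) if q < N_PM - 1 else "W0"] = 1.0
    else:                                               # corrective maintenance chain
        q = int(i[2:])
        P["CM%d" % (q + 1) if q < N_CM - 1 else "W0"] = 1.0
    return P

# Example with the state space of Figure 9.1 and assumed failure probabilities
lam = [0.01, 0.02, 0.05, 0.10, 0.20]
print(component_transition("W2", 0, lam, N_W=4, N_PM=2, N_CM=3))
# {'W3': 0.95, 'CM1': 0.05}
```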

Table 9.2: Example of transition matrices for electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage k      0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
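In code, this non-stationary electricity transition is simply a lookup of the matrix assigned to each stage in Table 9.3 (rows: current scenario i2, columns: next scenario j2). The helper name below is an illustrative choice.

```python
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]

# Matrix used at each stage k = 0, ..., 11, as in Table 9.3
P_E_stage = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
             P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to scenario j2 during stage k."""
    return P_E_stage[k][i2][j2]
```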

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                        u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}    0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW-1}    0    CM1      CI + CCM
WNW                       0    WNW      G · Ts · CE(i2, k)
WNW                       0    CM1      CI + CCM
Wq                        1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM-2}  ∅    PMq+1    CI + CPM
PMNPM-1                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM-2}  ∅    CMq+1    CI + CCM
CMNCM-1                   ∅    W0       CI + CCM
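To show how the pieces above (state space, decision space, the transition probabilities of Tables 9.1-9.3 and the transition costs of Table 9.4) fit together, the following sketch solves the one-component model by backward induction, i.e. the finite horizon value iteration. It is only an illustration under assumed numerical values: the electricity price and its transition matrix are kept constant over the stages for brevity, the generation reward is entered as a negative cost, and the terminal cost CN is set to zero.

```python
N_W, N_PM, N_CM, N_E, N = 4, 2, 3, 3, 12           # state space sizes and horizon (assumed)
lam = [0.01, 0.02, 0.05, 0.10, 0.20]                # assumed lambda(W_q), q = 0..N_W
C_I, C_PM, C_CM, G, T_s = 100.0, 40.0, 80.0, 10.0, 168.0
C_E = lambda s, k: [0.50, 0.35, 0.20][s]            # price per kWh of scenario s (constant here)
P_E = lambda k: [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]  # one matrix for every stage

states = (["W%d" % q for q in range(N_W + 1)]
          + ["PM%d" % q for q in range(1, N_PM)]
          + ["CM%d" % q for q in range(1, N_CM)])

def step(x1, u, k, i2):
    """List of (next component state, probability, transition cost), as in Tables 9.1 and 9.4."""
    if x1.startswith("W"):
        q = int(x1[1:])
        if u == 1:
            return [("PM1", 1.0, C_I + C_PM)]
        return [("W%d" % min(q + 1, N_W), 1.0 - lam[q], -G * T_s * C_E(i2, k)),
                ("CM1", lam[q], C_I + C_CM)]
    chain, length, cost = (("PM", N_PM, C_I + C_PM) if x1.startswith("PM")
                           else ("CM", N_CM, C_I + C_CM))
    q = int(x1[2:])
    return [(chain + str(q + 1) if q < length - 1 else "W0", 1.0, cost)]

J = {(x1, s): 0.0 for x1 in states for s in range(N_E)}      # terminal cost C_N = 0
policy = []
for k in reversed(range(N)):                                 # backward induction
    Jk, mu = {}, {}
    for x1 in states:
        for i2 in range(N_E):
            controls = [0, 1] if x1.startswith("W") and x1 != "W0" else [0]
            value, best_u = min(
                (sum(p * (c + sum(P_E(k)[i2][j2] * J[(j1, j2)] for j2 in range(N_E)))
                     for j1, p, c in step(x1, u, k, i2)), u)
                for u in controls)
            Jk[(x1, i2)], mu[(x1, i2)] = value, best_u
    J, policy = Jk, [mu] + policy

print(policy[0][("W3", 0)], round(J[("W3", 0)], 1))          # decision and cost-to-go at stage 0
```

policy[k] then contains the decision for every (component state, electricity scenario) pair at stage k, which is the form of solution the finite horizon model is meant to provide.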

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State of the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c at the next stage
jNC+1                  State of the electricity at the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1k                 Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered, regardless of the type of maintenance done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of working states NWc for each component c is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc-1, CM1, ..., CMNCMc-1}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}
                             ∅      otherwise
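For illustration, the joint decision space can be enumerated componentwise. The small sketch below is an assumption-level illustration, not part of the thesis model; the empty decision set of a component in maintenance is represented here by forcing uc = 0.

```python
from itertools import product

def joint_decision_space(component_states):
    """All decision vectors U_k = (u1, ..., u_NC) allowed by the current component states."""
    per_component = [(0, 1) if x.startswith("W") else (0,)   # {0,1} if working, no action otherwise
                     for x in component_states]
    return list(product(*per_component))

# Example: component 1 working (W2), component 2 in corrective maintenance (CM1)
print(joint_decision_space(["W2", "CM1"]))   # [(0, 0), (1, 0)]
```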

9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state P(jNC+1, iNC+1) are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components do not age, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0              otherwise

that is, components that are maintained (or start preventive maintenance) follow their maintenance transitions, while working components that are not maintained keep their state, since the system is not operating during the stage.
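A sketch of how the two cases combine is given below; the helper name, the tiny one-component rule `single`, its failure probability and the treatment of W0 as a working state are illustrative assumptions.

```python
def joint_transition_prob(i, u, j, single):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) built from a one-component rule single(ic, uc)."""
    operating = all(x.startswith("W") for x in i) and not any(u)
    p = 1.0
    for ic, uc, jc in zip(i, u, j):
        if operating:                                # Case 1: components age independently
            p *= single(ic, 0).get(jc, 0.0)
        elif uc == 1 or not ic.startswith("W"):      # Case 2: component under (or entering) maintenance
            p *= single(ic, uc).get(jc, 0.0)
        else:                                        # Case 2: idle working component keeps its state
            p *= 1.0 if jc == ic else 0.0
    return p

# Tiny example rule for two identical components (assumed failure probability 0.1 per stage)
def single(ic, uc):
    if ic.startswith("W"):
        return {"PM1": 1.0} if uc == 1 else {"W1": 0.9, "CM1": 0.1}
    return {"W0": 1.0}

print(joint_transition_prob(("W0", "W0"), (0, 0), ("W1", "W1"), single))   # 0.81
print(joint_transition_prob(("W0", "CM1"), (0, 0), ("W0", "W0"), single))  # 1.0
```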

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, together with the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
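The corresponding cost computation can be sketched in the same spirit; the cost figures, the price and the sign convention (generation reward entered as a negative cost) are assumptions for illustration.

```python
C_I, G, T_s = 100.0, 10.0, 168.0        # assumed interruption cost, production (kW), stage length (h)
C_PM = {1: 40.0, 2: 30.0}               # assumed per-stage PM costs of components 1 and 2
C_CM = {1: 80.0, 2: 60.0}               # assumed per-stage CM costs of components 1 and 2

def joint_cost(i, u, j, price):
    """C((j1..jNC), (u1..uNC), (i1..iNC)) with the electricity price of the current scenario."""
    operating = (all(x.startswith("W") for x in i) and not any(u)
                 and all(x.startswith("W") for x in j))      # Case 1: no maintenance, no failure
    if operating:
        return -G * T_s * price                              # generation reward (negative cost)
    cost = C_I                                               # Case 2: interruption cost ...
    for c, (ic, jc) in enumerate(zip(i, j), start=1):        # ... plus maintenance costs
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM[c]
    return cost

print(joint_cost(("W1", "W2"), (0, 0), ("W2", "W3"), price=0.35))     # reward while operating
print(joint_cost(("W1", "CM1"), (1, 0), ("PM1", "CM2"), price=0.35))  # C_I + C_PM1 + C_CM2 = 200.0
```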

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas about issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecast state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts over a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,   u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,   u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,   u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
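As a check, the computation above can be reproduced with a few lines of code; the arc costs are those used in the calculations, with the states of each stage indexed 0-2 as above.

```python
C = {  # C[k][i][j]: cost of the arc from state i at stage k to state j at stage k+1
    0: {0: {0: 2, 1: 4, 2: 3}},
    1: {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},
    2: {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},
}

J = {0: 0}                                      # terminal stage: phi(0) = 0
for k in sorted(C, reverse=True):               # backward induction
    J = {i: min(c + J[j] for j, c in arcs.items()) for i, arcs in C[k].items()}
    print("stage", k, "J* =", J)
# stage 3 J* = {0: 4, 1: 2, 2: 7}
# stage 2 J* = {0: 6, 1: 5, 2: 3}
# stage 1 J* = {0: 10, 1: 6, 2: 5}
# stage 0 J* = {0: 8}
```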


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 37: Models

62 Optimality Equations

The optimality equations are formulated using the probability function P (i u j)

The stationary policy microlowast solution of a IHSDP shortest path problem is solution ofthe Bellmanacutes equation (other name for the optimality equation - Bellman is themathematician at the origin of the DP theory)

Jmicro(i) = minmicro(i)isinΩU (i)

sum

jisinΩX

Pij(u) middot [Cij(u) + Jmicro(j)] foralli isin ΩX

Jmicro(i) Cost-to-go function of policy micro starting from state iJlowast(i) Optimal cost-to-go function for state i

For a IHSDP discounted problem the optimality equation is

Jmicro(i) = minmicro(i)isinΩU (i)

sum

jisinΩX

Pij(u) middot [Cij(u) + α middot Jmicro(j)] foralli isin ΩX

The optimality equation for average cost-to-go IHSDP problems is discussed inSection 67

63 Value Iteration

To solve the optimality equations a first idea would be to use the value iterationalgorithm presented in the Chapter 5

Intuitively the algorithm should converge to the optimal policy It can be shownthat the algorithm will indeed converge to the optimal solution If the model isdiscounted then the method can be fast The time complexity is in polynomialtime of the size of the state space control space and 1

1minusα

For non-discounted models the theoretical number of iteration needed is infiniteand a relative criteria must be determine to stop the algorithm

An alternative to the method is the Policy Iteration (PI) algorithm This laterterminates after a finite number of iteration

64 The Policy Iteration Algorithm

Given a policy micro the first step of the algorithm evaluates the policy by calculatingthe expected cost-to-go function resulting from this policy The next step of the

31

algorithm improve the expected cost-to-go function by enhancing the actual policyThis 2-steps algorithm is used iteratively The process stops when a policy is asolution of its own improvement

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1 Policy Evaluation

microq+1 = microq stop the algorithmElse Jmicroq(i) solution of the following linear system is calculated

Jmicroq(i) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

microq+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

Go back to policy evaluation step

The process stops when microq+1 = microq

At each iteration the algorithm always improve the policy If the initial policy micro0

is already good then the algorithm will converge fast to the optimal solution

65 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use at each stage the value iteration algorithm on a finitenumber of iterations M to estimate the value function of the policy The algorithm

is initialized with a value function JMmicrok (i) that must be chosen higher than the realvalue Jmicrok(i)

32

While m ge 0 do

Jmmicrok(i) =sumjisinΩXP (j microk(i) i) middot [C(j microk(i) i) + Jm+1

microk (j)] foralli isin ΩX

mlarr mminus 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m=0 and Jmicrok is approximated by J0microk

66 Average Cost-to-go Problems

The methods presented in Sections 51-54 can not be applied directly to average costproblems Average cost-to-go problems are more complicated and implies conditionson the Markov decision process for the convergence of the algorithms An averagecost-to-go problem can be reformulated as equivalent to a shortest path problemif the model of the Markov decision process is proved to be unichain (That is allstationary policies generate Markov chains that consist of a single ergodic class andpossibly some transient states See for details [36])

Given a stationary policy micro a state X isin ΩX there is an unique λmicro and vector hmicrosuch that

hmicro(X) = 0

λmicro + hmicro(i) =sum

jisinΩX

P (j micro(i) i) middot [C(j u i) + hmicro(j)] foralli isin ΩX

This λmicro is the average cost-to-go for the stationary policy micro The average cost-to-gois the same for all the starting state

The optimal average cost and optimal policy satisfy the Bellman equation

λlowast + hlowast(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j micro(i) i) middot [C(j micro(i) i) + hlowast] foralli isin ΩX

microlowast(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hlowast] foralli isin ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems Themethod is called relative value iteration X is an arbitrary state and h0(i) is chosen

33

arbitrarly

Hk = minuisinΩU (X)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(X)]

hk+1(i) = minuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] minusHk foralli isin ΩX

microk+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] foralli isin ΩX

The sequence hk will converge if the Markov decision process is unichain Moreoverthe algorithm converge to the optimal policy The number of iterations needed isinfinite in theory

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation X can be chosen arbitrarly

Step 1 Evaluation of the policyIf λq+1 = λq and and hq+1(i) = hq(i) foralli isin ΩX stop the algorithm

Else solve the system of equation

hq(X) = 0λq + hq(i) =

sumjisinΩXP (j micro(q)(i) i) middot [C(j u i) + hq(j)] foralli isin ΩX

Step 2 Policy improvement

microq+1 = argminuisinΩU (i)

sumjisinΩXP (j u i) middot [C(j u i) + hq] foralli isin ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linearprogramming (LP) methods The motivation for this apporach is that a linearprogramming model can include constraints that are not possible to include in aclassical MDP model However the model become less intuitive than with the othermethods Moreover LP can only be used for smaller state spaces than the valueiteration and policy iteration methods

34

For example in the discounted IHSDP

Jmicro(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + α middot Jmicro(j)] foralli isin ΩX

Jmicro(i) is solution of the following linear programming model

MinimizesumiisinΩXJmicro(i)

Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le

sumjisinΩX P (j u i) middot C(j u i)forallu i

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDP generalize MDP by 1) allowing or requiring the decision maker to chooseactions whenever the system state changes 2) modeling the system evolution in

35

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-cost approach is used to evaluate the policy

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al [21] propose a similar MDP is proposed Major and minormaintenance are possible are possible For each possible maintenance action thedeterioration level after the maintenance is stochastic which is more realistic

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the actual expert systems so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary.
  Possible application in maintenance optimization: short-term maintenance scheduling.
  Method: value iteration.
  Advantages/disadvantages: limited state space (number of components).

Markov Decision Processes
  Characteristics: stationary model. Classical methods for MDP; possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; value iteration (VI) can converge fast for a high discount factor.
  - Discounted: short-term maintenance optimization; policy iteration (PI) is faster in general.
  - Shortest path: linear programming allows possible additional constraints; state space limited for VI and PI.

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces compared with classical MDP methods.
  Possible application in maintenance optimization: same as MDP, for larger systems.
  Methods: TD-learning, Q-learning.
  Advantages/disadvantages: can work without an explicit model.

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval.
  Possible application in maintenance optimization: optimization for inspection based maintenance.
  Methods: same as MDP (average cost-to-go approach).
  Advantages/disadvantages: more complex.

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both these models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period ofthe year

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

x1_k Component state at stage k
x2_k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ω_x1 Component state space
Ω_x2 Electricity state space
Ω_U(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

X_k = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time such that λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer in both cases.

50

[Figure: state-transition diagram with states W0-W4, PM1, CM1, CM2. From each state Wq the transitions under u = 0 (solid lines) go to the next working state with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q) (W4 loops on itself with probability 1 − Ts·λ(4)); under u = 1 (dashed lines) the transition goes to PM1 with probability 1, and the maintenance states return to W0 with probability 1.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ω_x1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}
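To make the construction concrete, the following minimal Python sketch (not part of the original thesis) enumerates Ω_x1 and the per-stage failure probabilities Ts · λ(q · Ts). The helper names, the stage length Ts and the failure-rate function lambda_t are illustrative assumptions, not data from the thesis.

def component_state_space(n_w, n_pm, n_cm):
    # Omega_x1 = {W0..W_NW, PM1..PM_{NPM-1}, CM1..CM_{NCM-1}};
    # W0 also represents the end of maintenance (PM_NPM and CM_NCM map to W0).
    states = ["W%d" % q for q in range(n_w + 1)]
    states += ["PM%d" % m for m in range(1, n_pm)]
    states += ["CM%d" % m for m in range(1, n_cm)]
    return states

Ts = 24.0 * 7.0                  # stage length in hours (one-week stages, assumed)

def lambda_t(t_hours):           # hypothetical increasing failure rate [failures/hour]
    return 1e-5 + 1e-9 * t_hours

def stage_failure_prob(q):
    # Probability of failure during one stage for a component in state Wq: Ts * lambda(q*Ts).
    return min(1.0, Ts * lambda_t(q * Ts))

states = component_state_space(n_w=4, n_pm=2, n_cm=3)
print(states)   # ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2'], as in Figure 9.1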

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

[Figure: plot of electricity prices (SEK/MWh, axis from 200 to 500) versus stage (k−1, k, k+1) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.

52

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system.

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

Ω_U(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
Ω_U(i) = ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | u_k = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | u_k = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

53

Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
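The stage-dependent electricity transitions of Tables 9.2 and 9.3 could be encoded as follows; this is only an illustrative sketch of the example data above.

# Transition matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2).
P1E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2E = [[1/3, 1/3, 1/3]] * 3
P3E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage-dependent choice of matrix over the 12-stage horizon of Table 9.3.
schedule = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    # Pk(j2, i2) = P(x2_{k+1} = j2 | x2_k = i2) at stage k.
    return schedule[k][i2][j2]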

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

54

Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
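A sketch of the backward value iteration for the one-component model, combining the sketches above (states, component_transitions, electricity_transition) with the transition costs of Table 9.4. The production reward is counted as a negative cost so that everything is minimized; G, CI, CPM, CCM and the prices are assumed example values, not data from the thesis.

G = 1000.0                               # average production [kW] (assumed)
CI, CPM, CCM = 500.0, 300.0, 800.0       # interruption / PM / CM cost per stage (assumed)
CE = lambda s, k: [450.0, 300.0, 200.0][s]   # SEK/MWh for scenarios S1-S3 (assumed flat in k)

def stage_cost(i1, u, j1, i2, k):
    # Transition cost Ck(j, u, i) following Table 9.4; the production reward
    # G * Ts * CE is counted as a negative cost (Ts in hours, CE per MWh).
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        return -G * Ts * CE(i2, k) / 1000.0
    cost = CI
    cost += CCM if (i1.startswith("CM") or j1 == "CM1") else CPM
    return cost

def value_iteration(states, P, n_stages, n_scen, terminal_cost):
    # Backward induction over stages k = N-1, ..., 0 on the joint state (x1, x2).
    J = {(i1, s): terminal_cost(i1) for i1 in states for s in range(n_scen)}
    policy = {}
    for k in reversed(range(n_stages)):
        Jk = {}
        for i1 in states:
            for s in range(n_scen):
                controls = [0, 1] if i1.startswith("W") else [None]
                best_cost, best_u = None, None
                for u in controls:
                    expected = 0.0
                    for j1, p1 in P[(i1, u)].items():
                        for s2 in range(n_scen):
                            p = p1 * electricity_transition(k, s, s2)
                            expected += p * (stage_cost(i1, u, j1, s, k) + J[(j1, s2)])
                    if best_cost is None or expected < best_cost:
                        best_cost, best_u = expected, u
                Jk[(i1, s)] = best_cost
                policy[(k, i1, s)] = best_u
        J = Jk
    return J, policy

J0, policy = value_iteration(states, P, n_stages=12, n_scen=3,
                             terminal_cost=lambda i1: 0.0)
print(J0[("W0", 0)], policy[(0, "W3", 0)])   # example cost-to-go and decision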

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price for their rent can be very high and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

55

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
i(NC+1) State for the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
j(NC+1) State for the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
x(NC+1)_k Electricity state at stage k
uc_k Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_xc State space for component c
Ω_x(NC+1) Electricity state space
Ω_uc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x1_k, ..., xNC_k, x(NC+1)_k)    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and x(NC+1)_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or do nothing, depending on the state of the system.

57

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u1_k, u2_k, ..., uNC_k)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ω_uc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                    Ω_uc(ic) = ∅       otherwise

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state x(NC+1). Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(j(NC+1), i(NC+1))    (9.5)

The transition probabilities of the electricity state, P(j(NC+1), i(NC+1)), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}, ic ∈ {W1, ..., WNWc} and uc = 0:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance or a decision of preventive maintenance is taken, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P_c

with P_c =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W0, ..., WNWc}
  1              if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
  0              otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}, ic, jc ∈ {W1, ..., WNWc} and uc = 0:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(i(NC+1), k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance action costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with Cc =
  CCMc   if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
  CPMc   if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
  0      otherwise
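The two cases above could be implemented as in the following sketch, where P_comp[c] is a per-component transition dictionary of the form used in the one-component sketch, and G, Ts, CE and CI are reused from the earlier sketches; the per-component costs in the usage example are assumed values.

working = lambda s: s.startswith("W")    # a component state is "working" if it is a W state

def joint_transition(i_vec, u_vec, j_vec, P_comp):
    # P((j1..jNC) | (u1..uNC), (i1..iNC)) following Cases 1 and 2 above.
    system_up = all(working(ic) for ic in i_vec) and not any(u_vec)
    prob = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i_vec, u_vec, j_vec)):
        if system_up:                          # Case 1: components age independently
            prob *= P_comp[c].get((ic, 0), {}).get(jc, 0.0)
        elif uc == 1 or not working(ic):       # Case 2: maintenance starts or progresses
            key = (ic, 1) if working(ic) else (ic, None)
            prob *= P_comp[c].get(key, {}).get(jc, 0.0)
        else:                                  # Case 2: working, unmaintained parts do not age
            prob *= 1.0 if jc == ic else 0.0
    return prob

def joint_cost(i_vec, u_vec, j_vec, i_elec, k, cpm, ccm):
    # Case 1: production reward (negative cost); Case 2: CI plus per-component costs.
    if (all(working(s) for s in i_vec) and not any(u_vec)
            and all(working(s) for s in j_vec)):
        return -G * Ts * CE(i_elec, k) / 1000.0
    total = CI
    for c, (ic, jc) in enumerate(zip(i_vec, j_vec)):
        if ic.startswith("CM") or jc == "CM1":
            total += ccm[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += cpm[c]
    return total

# Two identical components sharing the single-component transition dictionary P:
p = joint_transition(("W2", "CM1"), (0, 0), ("W2", "CM2"), P_comp=[P, P])
c = joint_cost(("W2", "CM1"), (0, 0), ("W2", "CM2"), i_elec=0, k=3,
               cpm=[300.0, 300.0], ccm=[800.0, 800.0])

The full decision space at a given joint state could then be enumerated, for example with itertools.product over the per-component decision spaces, before running the same backward value iteration as in the one-component sketch.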

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions possible to do at the same time. A solution would be to consider a global decision space and not an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is proved empirically to converge faster; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have been mainly applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

61

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have been mainly applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
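The worked solution can be checked with a short script. The arc costs below are read off the calculations above (state indices 0-2 per stage, terminal stage 4); everything else is an assumption of this sketch.

# Arc costs C[k][(i, j)]: cost of going from state i at stage k to state j at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3, (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2, (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {(4, 0): 0.0}                       # terminal cost phi(0) = 0
policy = {}
for k in reversed(range(4)):
    for i in {i for (i, _) in C[k]}:
        options = {j: C[k][(i, j)] + J[(k + 1, j)] for (ii, j) in C[k] if ii == i}
        j_best = min(options, key=options.get)
        J[(k, i)], policy[(k, i)] = options[j_best], j_best

print(J[(0, 0)])        # 8, matching J*(A)
print(policy[(0, 0)])   # 2, i.e. go to state D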

63

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 38: Models

algorithm improve the expected cost-to-go function by enhancing the actual policyThis 2-steps algorithm is used iteratively The process stops when a policy is asolution of its own improvement

The algorithm starts with an initial policy micro0 Then it can be described by thefollowing steps

Step 1 Policy Evaluation

microq+1 = microq stop the algorithmElse Jmicroq(i) solution of the following linear system is calculated

Jmicroq(i) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

q Iteration number for the policy iteration algorithm

This is the expected cost-to-go function of the system using the policy microq

Step 2 Policy Improvement

A new policy is obtained using the value iteration algorithm

microq+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + Jmicroq(j)]

Go back to policy evaluation step

The process stops when microq+1 = microq

At each iteration the algorithm always improve the policy If the initial policy micro0

is already good then the algorithm will converge fast to the optimal solution

65 Modified Policy Iteration

If the number of states is large solving the linear problem of the policy evaluationcan be computational intensive

An alternative is to use at each stage the value iteration algorithm on a finitenumber of iterations M to estimate the value function of the policy The algorithm

is initialized with a value function JMmicrok (i) that must be chosen higher than the realvalue Jmicrok(i)

32

While m ge 0 do

Jmmicrok(i) =sumjisinΩXP (j microk(i) i) middot [C(j microk(i) i) + Jm+1

microk (j)] foralli isin ΩX

mlarr mminus 1

m Number of iteration left for the evaluation step of modified policy iteration

The algorithm stops when m=0 and Jmicrok is approximated by J0microk

66 Average Cost-to-go Problems

The methods presented in Sections 51-54 can not be applied directly to average costproblems Average cost-to-go problems are more complicated and implies conditionson the Markov decision process for the convergence of the algorithms An averagecost-to-go problem can be reformulated as equivalent to a shortest path problemif the model of the Markov decision process is proved to be unichain (That is allstationary policies generate Markov chains that consist of a single ergodic class andpossibly some transient states See for details [36])

Given a stationary policy micro a state X isin ΩX there is an unique λmicro and vector hmicrosuch that

hmicro(X) = 0

λmicro + hmicro(i) =sum

jisinΩX

P (j micro(i) i) middot [C(j u i) + hmicro(j)] foralli isin ΩX

This λmicro is the average cost-to-go for the stationary policy micro The average cost-to-gois the same for all the starting state

The optimal average cost and optimal policy satisfy the Bellman equation

λlowast + hlowast(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j micro(i) i) middot [C(j micro(i) i) + hlowast] foralli isin ΩX

microlowast(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hlowast] foralli isin ΩX

661 Relative Value Iteration

The value iteration method can be adapted to average cost-to-go problems Themethod is called relative value iteration X is an arbitrary state and h0(i) is chosen

33

arbitrarly

Hk = minuisinΩU (X)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(X)]

hk+1(i) = minuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] minusHk foralli isin ΩX

microk+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] foralli isin ΩX

The sequence hk will converge if the Markov decision process is unichain Moreoverthe algorithm converge to the optimal policy The number of iterations needed isinfinite in theory

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation X can be chosen arbitrarly

Step 1 Evaluation of the policyIf λq+1 = λq and and hq+1(i) = hq(i) foralli isin ΩX stop the algorithm

Else solve the system of equation

hq(X) = 0λq + hq(i) =

sumjisinΩXP (j micro(q)(i) i) middot [C(j u i) + hq(j)] foralli isin ΩX

Step 2 Policy improvement

microq+1 = argminuisinΩU (i)

sumjisinΩXP (j u i) middot [C(j u i) + hq] foralli isin ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linearprogramming (LP) methods The motivation for this apporach is that a linearprogramming model can include constraints that are not possible to include in aclassical MDP model However the model become less intuitive than with the othermethods Moreover LP can only be used for smaller state spaces than the valueiteration and policy iteration methods

34

For example in the discounted IHSDP

Jmicro(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + α middot Jmicro(j)] foralli isin ΩX

Jmicro(i) is solution of the following linear programming model

MinimizesumiisinΩXJmicro(i)

Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le

sumjisinΩX P (j u i) middot C(j u i)forallu i

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDP generalize MDP by 1) allowing or requiring the decision maker to chooseactions whenever the system state changes 2) modeling the system evolution in

35

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time needed to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14], an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, especially critical ones.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high for the model to remain tractable.
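To give an order of magnitude (the numbers are assumed here for illustration, not taken from a specific reference): with a component modelled as in Chapter 9 by $N_W + N_{PM} + N_{CM} - 1$ states, e.g. 8 states for $N_W = 4$, $N_{PM} = 2$, $N_{CM} = 3$, and $N_E = 3$ electricity scenarios, a system of $N_C$ such components has $3 \cdot 8^{N_C}$ states: 192 states for $N_C = 2$, but already about $8 \cdot 10^5$ states for $N_C = 6$.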

Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but they are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods. For each approach: characteristics, possible application in maintenance optimization, methods, and advantages/disadvantages.

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application: short-term maintenance scheduling
- Method: Value Iteration
- Disadvantage: limited state space (number of components)

Markov Decision Processes (stationary model; classical methods, several possible approaches)
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general
- Shortest path: Linear Programming allows possible additional constraints, but the state space is more limited than with VI & PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces in comparison with classical MDP methods
- Possible application: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantage: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application: optimization for inspection-based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Disadvantage: more complex

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails, it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter, an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to keep operating the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and to avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

$N_E$: Number of electricity scenarios
$N_W$: Number of working states for the component
$N_{PM}$: Number of preventive maintenance states for one component
$N_{CM}$: Number of corrective maintenance states for one component

Costs

$C_E(s, k)$: Electricity cost at stage k for electricity state s
$C_I$: Cost per stage for interruption
$C_{PM}$: Cost per stage of preventive maintenance
$C_{CM}$: Cost per stage of corrective maintenance
$C_N(i)$: Terminal cost if the component is in state i

Variables

$i^1$: Component state at the current stage
$i^2$: Electricity state at the current stage
$j^1$: Possible component state for the next stage
$j^2$: Possible electricity state for the next stage

State and Control Space

$x^1_k$: Component state at stage k
$x^2_k$: Electricity state at stage k

Probability functions

$\lambda(t)$: Failure rate of the component at age t
$\lambda(i)$: Failure rate of the component in state $W_i$

Sets

$\Omega_{x^1}$: Component state space
$\Omega_{x^2}$: Electricity state space
$\Omega_U(i)$: Decision space for state i

States notations

W: Working state
PM: Preventive maintenance state
CM: Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length $T_s$ such that $T = N \cdot T_s$. The maintenance decisions are made sequentially at each stage $k = 0, 1, \ldots, N-1$.

• The failure rate of the component over time is assumed perfectly known. This function is denoted $\lambda(t)$.

• If the component fails during stage k, corrective maintenance is undertaken for $N_{CM}$ stages with a cost of $C_{CM}$ per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is $N_{PM}$ stages with a cost of $C_{PM}$ per stage.

• If the system is not working, a cost for interruption $C_I$ per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• $N_E$ possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is denoted $C_E(s, k)$, $k = 0, 1, \ldots, N-1$. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector $X_k$ is composed of two state variables: $x^1_k$ for the state of the component (its age) and $x^2_k$ for the electricity scenario ($N_X = 2$).

The state of the system is thus represented by a vector as in (9.1):

$$X_k = \begin{pmatrix} x^1_k \\ x^2_k \end{pmatrix}, \qquad x^1_k \in \Omega_{x^1}, \; x^2_k \in \Omega_{x^2} \qquad (9.1)$$

$\Omega_{x^1}$ is the set of possible states for the component and $\Omega_{x^2}$ the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable $x^1_k$. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to $N_{CM}$ and $N_{PM}$.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when $\lambda(t)$ reaches a fixed limit $\lambda_{max} = \lambda(T_{max})$, preventive maintenance is always made. Another possibility is to assume that $\lambda(t)$ stays constant once age $T_{max}$ is reached; in this case $T_{max}$ can correspond, for example, to the time when $\lambda(t) > 50\%$ for $t > T_{max}$. This latter approach was implemented. The corresponding number of W states is $N_W = T_{max}/T_s$, or the closest integer, in both cases.

[Figure 9.1: Example of a Markov Decision Process for one component with $N_{CM} = 3$, $N_{PM} = 2$, $N_W = 4$. The states are W0-W4, PM1, CM1 and CM2; solid lines correspond to u = 0 and dashed lines to u = 1. From each working state, the transition to the next working state has probability $1 - T_s\lambda(q)$ and the transition to CM1 has probability $T_s\lambda(q)$.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, $x^1_k \in \Omega_{x^1} = \{W_0, \ldots, W_4, PM_1, CM_1, CM_2\}$. The state $W_0$ is used to represent a new component; $PM_2$ and $CM_3$ are both represented by this state.

More generally,

$$\Omega_{x^1} = \{W_0, \ldots, W_{N_W}, PM_1, \ldots, PM_{N_{PM}-1}, CM_1, \ldots, CM_{N_{CM}-1}\}$$


Electricity scenario state

Electricity scenarios are associated with one state variable $x^2_k$. There are $N_E$ possible states for this variable, each state corresponding to one possible electricity scenario: $x^2_k \in \Omega_{x^2} = \{S_1, \ldots, S_{N_E}\}$. The electricity price of scenario S at stage k is given by the electricity price function $C_E(S, k)$. Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, $N_E = 3$. Electricity prices (SEK/MWh, roughly between 200 and 500) are plotted against the stage (..., k-1, k, k+1, ...) for Scenarios 1, 2 and 3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

$U_k = 0$: no preventive maintenance

$U_k = 1$: preventive maintenance

The decision space depends only on the component state $i^1$:

$$\Omega_U(i) = \begin{cases} \{0, 1\} & \text{if } i^1 \in \{W_1, \ldots, W_{N_W}\} \\ \emptyset & \text{else} \end{cases}$$

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

$$\begin{aligned}
P(X_{k+1} = j \mid U_k = u, X_k = i) &= P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 \mid u_k = u, x^1_k = i^1, x^2_k = i^2) \\
&= P(x^1_{k+1} = j^1 \mid u_k = u, x^1_k = i^1) \cdot P(x^2_{k+1} = j^2 \mid x^2_k = i^2) \\
&= P(j^1, u, i^1) \cdot P_k(j^2, i^2)
\end{aligned}$$

Component state transition probability

At each stage k, if the state of the component is $W_q$, the failure rate is assumed constant during the stage and equal to $\lambda(W_q) = \lambda(q \cdot T_s)$.

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if $N_{PM} = 1$ or $N_{CM} = 1$, then $PM_1$, respectively $CM_1$, corresponds to $W_0$.

Electricity State

The transition probabilities of the electricity state, $P_k(j^2, i^2)$, are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, $P_k(j^2, i^2)$ can take three different values, defined by the transition matrices $P^1_E$, $P^2_E$ or $P^3_E$; $i^2$ is represented by the rows of the matrices and $j^2$ by the columns.

Table 9.1: Transition probabilities

$i^1$ | $u$ | $j^1$ | $P(j^1, u, i^1)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $W_{q+1}$ | $1 - \lambda(W_q)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $CM_1$ | $\lambda(W_q)$
$W_{N_W}$ | 0 | $W_{N_W}$ | $1 - \lambda(W_{N_W})$
$W_{N_W}$ | 0 | $CM_1$ | $\lambda(W_{N_W})$
$W_q$, $q \in \{0, \ldots, N_W\}$ | 1 | $PM_1$ | 1
$PM_q$, $q \in \{1, \ldots, N_{PM} - 2\}$ | $\emptyset$ | $PM_{q+1}$ | 1
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | 1
$CM_q$, $q \in \{1, \ldots, N_{CM} - 2\}$ | $\emptyset$ | $CM_{q+1}$ | 1
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | 1
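As an illustration, here is a sketch (not part of the thesis; the stage length, the failure-rate function and the state ordering are assumptions made here) of how the component transition probabilities of Table 9.1 could be assembled into matrices:

```python
import numpy as np

Ts = 1.0 / 52                       # stage length in years (assumption)
N_W, N_PM, N_CM = 4, 2, 3           # as in the example of Figure 9.1
lam = lambda q: 0.1 + 0.05 * q      # hypothetical failure rate lambda(W_q)

# State ordering: W0..W4, PM1, CM1, CM2
states = [f"W{q}" for q in range(N_W + 1)] \
       + [f"PM{q}" for q in range(1, N_PM)] \
       + [f"CM{q}" for q in range(1, N_CM)]
idx = {s: n for n, s in enumerate(states)}
nS = len(states)

P = {u: np.zeros((nS, nS)) for u in (0, 1)}    # P[u][i, j]

for q in range(N_W + 1):                       # working states
    i = idx[f"W{q}"]
    j_ok = idx[f"W{q + 1}"] if q < N_W else i  # W_NW stays in W_NW
    P[0][i, j_ok] = 1 - Ts * lam(q)            # survives the stage
    P[0][i, idx["CM1"]] = Ts * lam(q)          # fails during the stage
    P[1][i, idx["PM1"]] = 1.0                  # preventive replacement decided
for q in range(1, N_PM):                       # PM chain, ends in W0
    i = idx[f"PM{q}"]
    P[0][i, idx[f"PM{q + 1}"] if q < N_PM - 1 else idx["W0"]] = 1.0
for q in range(1, N_CM):                       # CM chain, ends in W0
    i = idx[f"CM{q}"]
    P[0][i, idx[f"CM{q + 1}"] if q < N_CM - 1 else idx["W0"]] = 1.0

# Decisions are only defined in working states (Omega_U(i) is empty otherwise),
# so the maintenance-state rows are stored here under u = 0 by convention.
assert np.allclose(P[0].sum(axis=1), 1.0)
```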

Table 9.2: Example of transition matrices for the electricity scenarios

$$P^1_E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
P^2_E = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \qquad
P^3_E = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$$

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):        0       1       2       3       4       5       6       7       8       9       10      11
$P_k(j^2, i^2)$:  $P^1_E$ $P^1_E$ $P^1_E$ $P^3_E$ $P^3_E$ $P^2_E$ $P^2_E$ $P^2_E$ $P^3_E$ $P^1_E$ $P^1_E$ $P^1_E$
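For the electricity state, a corresponding sketch (again an assumption-level illustration, not code from the thesis) stores one transition matrix per stage, following Tables 9.2 and 9.3:

```python
import numpy as np

P1E = np.eye(3)                      # stable scenarios
P2E = np.full((3, 3), 1.0 / 3.0)     # fully transient (summer)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])

# One matrix per stage on the 12-stage horizon of Table 9.3
P_E = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def p_elec(k, i2, j2):
    """P_k(j2, i2): probability of switching from scenario i2 to j2 at stage k."""
    return P_E[k][i2, j2]
```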

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: $G \cdot T_s \cdot C_E(i^2, k)$ (depends on the electricity scenario state $i^2$ and the stage k)

• Cost for maintenance: $C_{CM}$ or $C_{PM}$

• Cost for interruption: $C_I$

Moreover, a terminal cost, denoted $C_N$, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that $i^2$ is a state variable.

A possible terminal cost is defined by $C_N(i)$ for each possible terminal state i of the component.

Table 9.4: Transition costs

$i^1$ | $u$ | $j^1$ | $C_k(j, u, i)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $W_{q+1}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_q$, $q \in \{0, \ldots, N_W - 1\}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_{N_W}$ | 0 | $W_{N_W}$ | $G \cdot T_s \cdot C_E(i^2, k)$
$W_{N_W}$ | 0 | $CM_1$ | $C_I + C_{CM}$
$W_q$ | 1 | $PM_1$ | $C_I + C_{PM}$
$PM_q$, $q \in \{1, \ldots, N_{PM} - 2\}$ | $\emptyset$ | $PM_{q+1}$ | $C_I + C_{PM}$
$PM_{N_{PM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{PM}$
$CM_q$, $q \in \{1, \ldots, N_{CM} - 2\}$ | $\emptyset$ | $CM_{q+1}$ | $C_I + C_{CM}$
$CM_{N_{CM}-1}$ | $\emptyset$ | $W_0$ | $C_I + C_{CM}$

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

$N_C$: Number of components
$N_W^c$: Number of working states for component c
$N_{PM}^c$: Number of preventive maintenance states for component c
$N_{CM}^c$: Number of corrective maintenance states for component c

Costs

$C_{PM}^c$: Cost per stage of preventive maintenance for component c
$C_{CM}^c$: Cost per stage of corrective maintenance for component c
$C_N^c(i)$: Terminal cost if component c is in state i

Variables

$i^c$, $c \in \{1, \ldots, N_C\}$: State of component c at the current stage
$i^{N_C+1}$: Electricity state at the current stage
$j^c$, $c \in \{1, \ldots, N_C\}$: State of component c for the next stage
$j^{N_C+1}$: Electricity state for the next stage
$u^c$, $c \in \{1, \ldots, N_C\}$: Decision variable for component c

State and Control Space

$x^c_k$, $c \in \{1, \ldots, N_C\}$: State of component c at stage k
$x^c$: A component state
$x^{N_C+1}_k$: Electricity state at stage k
$u^c_k$: Maintenance decision for component c at stage k

Probability functions

$\lambda^c(i)$: Failure probability function for component c

Sets

$\Omega_{x^c}$: State space for component c
$\Omega_{x^{N_C+1}}$: Electricity state space
$\Omega_{u^c}(i^c)$: Decision space for component c in state $i^c$

9.2.3 Assumptions

• The system is composed of $N_C$ components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted $\lambda^c(t)$ for component $c \in \{1, \ldots, N_C\}$.

• If component c fails during stage k, corrective maintenance is undertaken for $N_{CM}^c$ stages with a cost of $C_{CM}^c$ per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is $N_{PM}^c$ stages with a cost of $C_{PM}^c$ per stage.

• An interruption cost $C_I$ is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, $G \cdot T_s$ kWh are produced during the stage ($T_s$ in hours).

• A terminal cost $C_N^c$ can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

$$X_k = \begin{pmatrix} x^1_k \\ \vdots \\ x^{N_C}_k \\ x^{N_C+1}_k \end{pmatrix} \qquad (9.2)$$

$x^c_k$, $c \in \{1, \ldots, N_C\}$, represents the state of component c, and $x^{N_C+1}_k$ represents the electricity state.

Component Space

The numbers of CM and PM states for component c correspond respectively to $N_{CM}^c$ and $N_{PM}^c$. The number of W states for each component c, $N_W^c$, is decided in the same way as for one component.

The state space related to component c is denoted $\Omega_{x^c}$:

$$x^c_k \in \Omega_{x^c} = \{W_0, \ldots, W_{N_W^c}, PM_1, \ldots, PM_{N_{PM}^c - 1}, CM_1, \ldots, CM_{N_{CM}^c - 1}\}$$

Electricity Space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or to do nothing, depending on the state of the system:

$u^c_k = 0$: no preventive maintenance on component c

$u^c_k = 1$: preventive maintenance on component c

The decision variables constitute a decision vector

$$U_k = \begin{pmatrix} u^1_k \\ u^2_k \\ \vdots \\ u^{N_C}_k \end{pmatrix} \qquad (9.3)$$

The decision space for each decision variable can be defined by

$$\forall c \in \{1, \ldots, N_C\}: \quad \Omega_{u^c}(i^c) = \begin{cases} \{0, 1\} & \text{if } i^c \in \{W_0, \ldots, W_{N_W^c}\} \\ \emptyset & \text{else} \end{cases}$$
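A possible way (a sketch with assumed names, not from the thesis) to enumerate the joint decision vectors is to take the Cartesian product of the per-component decision spaces, treating components without an admissible decision as forced to "no action":

```python
from itertools import product

WORKING = {f"W{q}" for q in range(5)}      # working states of one component (example)

def omega_uc(ic):
    """Omega_uc(ic): {0, 1} in working states, empty otherwise."""
    return (0, 1) if ic in WORKING else ()

def omega_U(component_states):
    """All admissible decision vectors U_k for the component states (i1, ..., iNC)."""
    per_comp = [omega_uc(ic) or (0,) for ic in component_states]   # () -> forced 0
    return list(product(*per_comp))

print(omega_U(("W2", "CM1", "W0")))   # [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```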

9.2.4.3 Transition Probability

The state variables $x^c$ are independent of the electricity state $x^{N_C+1}$. Consequently,

$$P(X_{k+1} = j \mid U_k = U, X_k = i) \qquad (9.4)$$
$$= P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) \cdot P_k(j^{N_C+1}, i^{N_C+1}) \qquad (9.5)$$

The transition probabilities of the electricity state, $P_k(j^{N_C+1}, i^{N_C+1})$, are the same as in the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables $x^c$ are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If $\forall c \in \{1, \ldots, N_C\}$, $x^c_k \in \{W_1, \ldots, W_{N_W^c}\}$:

$$P((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P(j^c, 0, i^c)$$

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

$$P((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = \prod_{c=1}^{N_C} P^c$$

with

$$P^c = \begin{cases} P(j^c, 1, i^c) & \text{if } u^c = 1 \text{ or } i^c \notin \{W_1, \ldots, W_{N_W^c}\} \\ 1 & \text{if } i^c \notin \{W_0, \ldots, W_{N_W^c - 1}\} \text{ and } i^c = j^c \\ 0 & \text{else} \end{cases}$$
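The two cases above can be written compactly; the sketch below is an interpretation of mine (function names and the handling of components already in maintenance are assumptions), not code from the thesis. Here `P_comp(j, u, i)` stands for the one-component transition probability of Table 9.1 and `working(i)` tests whether a state is a working state:

```python
def joint_transition_prob(j, u, i, P_comp, working, n_comp):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2 above."""
    system_up = all(working(ic) for ic in i) and not any(u)
    p = 1.0
    for c in range(n_comp):
        ic, jc, uc = i[c], j[c], u[c]
        if system_up:
            p *= P_comp(jc, 0, ic)                 # Case 1: every component ages
        elif uc == 1 or not working(ic):
            p *= P_comp(jc, uc, ic)                # Case 2: maintained components evolve
        else:
            p *= 1.0 if jc == ic else 0.0          # Case 2: idle components do not age
    return p
```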

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If $\forall c \in \{1, \ldots, N_C\}$, $x^c_k \in \{W_1, \ldots, W_{N_W^c}\}$:

$$C((j^1, \ldots, j^{N_C}), 0, (i^1, \ldots, i^{N_C})) = G \cdot T_s \cdot C_E(i^{N_C+1}, k)$$

Case 2

When the system is in maintenance or fails during the stage, an interruption cost $C_I$ is considered, as well as the sum of the costs of all the maintenance actions:

$$C((j^1, \ldots, j^{N_C}), (u^1, \ldots, u^{N_C}), (i^1, \ldots, i^{N_C})) = C_I + \sum_{c=1}^{N_C} C^c$$

with

$$C^c = \begin{cases} C_{CM}^c & \text{if } i^c \in \{CM_1, \ldots, CM_{N_{CM}^c}\} \text{ or } j^c = CM_1 \\ C_{PM}^c & \text{if } i^c \in \{PM_1, \ldots, PM_{N_{PM}^c}\} \text{ or } j^c = PM_1 \\ 0 & \text{else} \end{cases}$$
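Similarly, a sketch of the cost function (again my own illustration; the sign convention for the production reward and the string representation of the states are assumptions):

```python
def joint_cost(j, u, i, k, i_elec, G, Ts, C_E, C_I, C_PM, C_CM, working):
    """Transition cost of the multi-component model (Cases 1 and 2 above)."""
    if all(working(ic) for ic in i) and not any(u) and all(working(jc) for jc in j):
        # Case 1: the system produces during the whole stage;
        # the reward is written here as a negative cost (assumption).
        return -G * Ts * C_E(i_elec, k)
    # Case 2: interruption cost plus the cost of every maintenance action
    cost = C_I
    for c in range(len(i)):
        if i[c].startswith("CM") or j[c] == "CM1":
            cost += C_CM[c]
        elif i[c].startswith("PM") or j[c] == "PM1":
            cost += C_PM[c]
    return cost
```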

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of the finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
$J^*_4(0) = \phi(0) = 0$

Stage 3:
$J^*_3(0) = J^*(H) = C(3,0,0) = 4$, $\quad u^*_3(0) = u^*(H) = 0$
$J^*_3(1) = J^*(I) = C(3,1,0) = 2$, $\quad u^*_3(1) = u^*(I) = 0$
$J^*_3(2) = J^*(J) = C(3,2,0) = 7$, $\quad u^*_3(2) = u^*(J) = 0$

Stage 2:
$J^*_2(0) = J^*(E) = \min\{J^*_3(0) + C(2,0,0),\ J^*_3(1) + C(2,0,1)\} = \min\{4+2,\ 2+5\} = 6$
$u^*_2(0) = u^*(E) = \arg\min_{u \in \{0,1\}}\{J^*_3(0) + C(2,0,0),\ J^*_3(1) + C(2,0,1)\} = 0$

$J^*_2(1) = J^*(F) = \min\{J^*_3(0) + C(2,1,0),\ J^*_3(1) + C(2,1,1),\ J^*_3(2) + C(2,1,2)\} = \min\{4+7,\ 2+3,\ 7+2\} = 5$
$u^*_2(1) = u^*(F) = \arg\min_{u \in \{0,1,2\}}\{J^*_3(0) + C(2,1,0),\ J^*_3(1) + C(2,1,1),\ J^*_3(2) + C(2,1,2)\} = 1$

$J^*_2(2) = J^*(G) = \min\{J^*_3(1) + C(2,2,1),\ J^*_3(2) + C(2,2,2)\} = \min\{2+1,\ 7+2\} = 3$
$u^*_2(2) = u^*(G) = \arg\min_{u \in \{1,2\}}\{J^*_3(1) + C(2,2,1),\ J^*_3(2) + C(2,2,2)\} = 1$

Stage 1:
$J^*_1(0) = J^*(B) = \min\{J^*_2(0) + C(1,0,0),\ J^*_2(1) + C(1,0,1)\} = \min\{6+4,\ 5+6\} = 10$
$u^*_1(0) = u^*(B) = \arg\min_{u \in \{0,1\}}\{J^*_2(0) + C(1,0,0),\ J^*_2(1) + C(1,0,1)\} = 0$

$J^*_1(1) = J^*(C) = \min\{J^*_2(0) + C(1,1,0),\ J^*_2(1) + C(1,1,1),\ J^*_2(2) + C(1,1,2)\} = \min\{6+2,\ 5+1,\ 3+3\} = 6$
$u^*_1(1) = u^*(C) = \arg\min_{u \in \{0,1,2\}}\{J^*_2(0) + C(1,1,0),\ J^*_2(1) + C(1,1,1),\ J^*_2(2) + C(1,1,2)\} = 1 \text{ or } 2$

$J^*_1(2) = J^*(D) = \min\{J^*_2(1) + C(1,2,1),\ J^*_2(2) + C(1,2,2)\} = \min\{5+5,\ 3+2\} = 5$
$u^*_1(2) = u^*(D) = \arg\min_{u \in \{1,2\}}\{J^*_2(1) + C(1,2,1),\ J^*_2(2) + C(1,2,2)\} = 2$

Stage 0:
$J^*_0(0) = J^*(A) = \min\{J^*_1(0) + C(0,0,0),\ J^*_1(1) + C(0,0,1),\ J^*_1(2) + C(0,0,2)\} = \min\{10+2,\ 6+4,\ 5+3\} = 8$
$u^*_0(0) = u^*(A) = \arg\min_{u \in \{0,1,2\}}\{J^*_1(0) + C(0,0,0),\ J^*_1(1) + C(0,0,1),\ J^*_1(2) + C(0,0,2)\} = 2$
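The backward recursion above is mechanical, and a small script reproduces it; the sketch below is not part of the thesis, and the arc costs are simply read off the calculations above (node letters are kept only as comments):

```python
# Arc costs C[k][(i, j)] of the shortest path example, taken from the
# calculations above; i and j are the state indices at stages k and k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},             # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                         # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,              # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                        # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                         # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,              # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                        # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},             # H, I, J -> terminal
}

N = 4
J = {N: {0: 0.0}}                                     # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for i in sorted({s for (s, _) in C[k]}):
        # candidate decisions u are identified with the successor state j
        cand = {j: c + J[k + 1][j] for (s, j), c in C[k].items() if s == i}
        u_best = min(cand, key=cand.get)
        J[k][i], policy[k][i] = cand[u_best], u_best

print(J[0], policy[0])    # {0: 8.0} {0: 2} -> optimal cost 8, first decision u = 2
```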


Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979


[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006


[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

In Jayakumar et al. [21] a similar MDP is proposed. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go criterion, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solved with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance optimization/scheduling
  Method: Value Iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model
  Possible approaches and applications:
    - Average cost-to-go: continuous-time condition monitoring maintenance optimization
    - Discounted: short-term maintenance optimization
    - Shortest path
  Methods (classical methods for MDP):
    - Value Iteration (VI): can converge fast for a high discount factor
    - Policy Iteration (PI): faster in general
    - Linear Programming: possible additional constraints; state space more limited than with VI & PI

Approximate Dynamic Programming for MDP
  Characteristics: can handle large state spaces
  Possible application: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application: optimization for inspection-based maintenance
  Method: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices before doing maintenance.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydropower. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE    Number of electricity scenarios
NW    Number of working states for the component
NPM   Number of preventive maintenance states for one component
NCM   Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity price at stage k for electricity state s
CI        Cost per stage for interruption
CPM       Cost per stage of preventive maintenance
CCM       Cost per stage of corrective maintenance
CN(i)     Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1_k  Component state at stage k
x2_k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1     Component state space
Ωx2     Electricity state space
ΩU(i)   Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),  x1_k ∈ Ωx1, x2_k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached, where Tmax can for example correspond to the time when λ(t) > 50% for t > Tmax; this second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


Figure 9.1: Example of Markov decision process for one component with NCM = 3, NPM = 2, NW = 4 (state-transition diagram with states W0–W4, PM1, CM1, CM2 and per-stage failure probabilities Ts·λ(q)). Solid line: u = 0; dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally:

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}


Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3 (electricity price in SEK/MWh per stage for scenarios 1, 2 and 3).


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW},  ∅ else

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E or P3_E. i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                            u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW − 1}      0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW − 1}      0    CM1       λ(Wq)
WNW                           0    WNW       1 − λ(WNW)
WNW                           0    CM1       λ(WNW)
Wq, q ∈ {0, ..., NW}          1    PM1       1
PMq, q ∈ {1, ..., NPM − 2}    ∅    PMq+1     1
PM(NPM−1)                     ∅    W0        1
CMq, q ∈ {1, ..., NCM − 2}    ∅    CMq+1     1
CM(NCM−1)                     ∅    W0        1
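As an illustration (not part of the thesis), the transition probabilities of Table 9.1 can be encoded directly in code. In the sketch below, component states are represented as tuples such as ('W', q), ('PM', q) or ('CM', q), and p_fail(q) stands for the per-stage failure probability λ(Wq); these representational choices are assumptions made only for this example.

```python
def component_transitions(i1, u, N_W, N_PM, N_CM, p_fail):
    """Return {j1: P(j1, u, i1)} for the component state, following Table 9.1.

    i1 is a tuple ('W', q), ('PM', q) or ('CM', q); u is 0 (no PM) or 1 (PM);
    p_fail(q) is the per-stage failure probability in state Wq.
    """
    kind, q = i1
    if kind == 'W':
        if u == 1:                                     # preventive replacement decided
            return {('PM', 1) if N_PM > 1 else ('W', 0): 1.0}
        ageing = ('W', min(q + 1, N_W))                # the age saturates in state W_NW
        failure = ('CM', 1) if N_CM > 1 else ('W', 0)
        return {ageing: 1.0 - p_fail(q), failure: p_fail(q)}
    if kind == 'PM':                                   # preventive maintenance in progress
        return {('PM', q + 1) if q < N_PM - 1 else ('W', 0): 1.0}
    # corrective maintenance in progress
    return {('CM', q + 1) if q < N_CM - 1 else ('W', 0): 1.0}
```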

Table 9.2: Example of transition matrices for the electricity scenarios

P1_E =
1    0    0
0    1    0
0    0    1

P2_E =
1/3  1/3  1/3
1/3  1/3  1/3
1/3  1/3  1/3

P3_E =
0.6  0.2  0.2
0.2  0.6  0.2
0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
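For instance, the matrices of Table 9.2 and the stage schedule of Table 9.3 could be stored as follows (a sketch only; the names P1_E, P2_E, P3_E and P_elec are illustrative, not from the thesis):

```python
import numpy as np

# Transition matrices for the electricity scenario state (rows: i2, columns: j2), cf. Table 9.2
P1_E = np.eye(3)
P2_E = np.full((3, 3), 1.0 / 3.0)
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])

# Stage-dependent choice of matrix over a 12-stage horizon, cf. Table 9.3
P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def P_elec(k, j2, i2):
    """P_k(j2, i2): probability that the electricity scenario moves from i2 to j2 at stage k."""
    return P_E[k][i2, j2]
```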

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                            u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW − 1}      0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW − 1}      0    CM1       CI + CCM
WNW                           0    WNW       G · Ts · CE(i2, k)
WNW                           0    CM1       CI + CCM
Wq                            1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM − 2}    ∅    PMq+1     CI + CPM
PM(NPM−1)                     ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM − 2}    ∅    CMq+1     CI + CCM
CM(NCM−1)                     ∅    W0        CI + CCM
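Tables 9.1–9.4, together with the electricity scenario matrices, contain everything needed for the backward value iteration recursion J_N(i) = CN(i), J_k(i) = min_u Σ_j P(j, u, i) · [Ck(j, u, i) + J_{k+1}(j)]. The generic sketch below is not taken from the thesis; it assumes the user supplies the enumerated state space, the decision sets ΩU(i) and a function returning the triples (next state, probability, cost) assembled from the tables above.

```python
def finite_horizon_value_iteration(states, decisions, transitions, terminal_cost, N):
    """Backward value iteration for a finite horizon SDP.

    states:               list of states i (here pairs of component state and electricity state)
    decisions(i):         admissible decisions Omega_U(i); may be empty for forced transitions
    transitions(k, i, u): iterable of (j, probability, cost) triples at stage k
    terminal_cost(i):     terminal cost C_N(i)
    Returns the cost-to-go tables J[k][i] and an optimal policy mu[k][i].
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    J[N] = {i: terminal_cost(i) for i in states}

    for k in range(N - 1, -1, -1):                      # backward recursion
        for i in states:
            best_u, best_val = None, float('inf')
            for u in (decisions(i) or [None]):           # None stands for "no decision possible"
                val = sum(p * (c + J[k + 1][j]) for j, p, c in transitions(k, i, u))
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i], mu[k][i] = best_val, best_u
    return J, mu
```

With the component transition function and the electricity matrices sketched earlier, transitions(k, i, u) would combine P(j1, u, i1) · Pk(j2, i2) with the costs of Table 9.4.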

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC     Number of components
NWc    Number of working states for component c
NPMc   Number of preventive maintenance states for component c
NCMc   Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}   State of component c at the current stage
iNC+1                  State for the electricity at the current stage
jc, c ∈ {1, ..., NC}   State of component c for the next stage
jNC+1                  State for the electricity for the next stage
uc, c ∈ {1, ..., NC}   Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc                      A component state
xNC+1_k                 Electricity state at stage k
uc_k                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.


• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)ᵀ    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 9.1.
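To give a feeling for how quickly the joint state space grows (the curse of dimensionality discussed in Chapter 8), its size can be computed directly from the definition above. The sketch and the example figures below are illustrative assumptions, not values from the thesis.

```python
def state_space_size(N_W, N_PM, N_CM, N_E):
    """Number of states of the multi-component model.

    N_W, N_PM, N_CM are lists with the per-component numbers of W, PM and CM states.
    Each component c contributes (N_Wc + 1) working states W0..W_NWc, plus
    (N_PMc - 1) PM states and (N_CMc - 1) CM states; the electricity state adds a factor N_E.
    """
    size = N_E
    for nw, npm, ncm in zip(N_W, N_PM, N_CM):
        size *= (nw + 1) + (npm - 1) + (ncm - 1)
    return size

# Example: five identical components with N_Wc = 4, N_PMc = 2, N_CMc = 3 and N_E = 3 scenarios
print(state_space_size([4] * 5, [2] * 5, [3] * 5, 3))   # 3 * 8**5 = 98304 states
```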

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)ᵀ    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},  ∅ else

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c =
  P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
  1              if ic ∉ {W0, ..., WNWc−1} and ic = jc
  0              else

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xc_k ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} Cc

with

Cc =
  CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
  CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
  0      else

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.


• Include other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
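As a quick cross-check (not part of the thesis), the recursion can be replayed in a few lines of code, reading the arc costs C(k, i, u) from the calculations above and interpreting the control u as the index of the destination state at stage k + 1:

```python
# Arc costs C[(k, i, u)] taken from the value iteration calculations above
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}

N = 4
J = {(N, 0): 0}                       # terminal cost phi(0) = 0
for k in range(N - 1, -1, -1):        # backward value iteration
    for i in {i for (kk, i, _) in C if kk == k}:
        J[(k, i)] = min(C[(k, i, u)] + J[(k + 1, u)]
                        for (kk, ii, u) in C if kk == k and ii == i)

print(J[(0, 0)])                      # 8, the shortest path cost from node A
```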


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A.-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 40: Models

arbitrarly

Hk = minuisinΩU (X)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(X)]

hk+1(i) = minuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] minusHk foralli isin ΩX

microk+1(i) = argminuisinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + hk(j)] foralli isin ΩX

The sequence hk will converge if the Markov decision process is unichain Moreoverthe algorithm converge to the optimal policy The number of iterations needed isinfinite in theory

662 Policy Iteration

The problem can also be solved using the policy iteration algorithm

Initialisation X can be chosen arbitrarly

Step 1 Evaluation of the policyIf λq+1 = λq and and hq+1(i) = hq(i) foralli isin ΩX stop the algorithm

Else solve the system of equation

hq(X) = 0λq + hq(i) =

sumjisinΩXP (j micro(q)(i) i) middot [C(j u i) + hq(j)] foralli isin ΩX

Step 2 Policy improvement

microq+1 = argminuisinΩU (i)

sumjisinΩXP (j u i) middot [C(j u i) + hq] foralli isin ΩX

q = q + 1

67 Linear Programming

The three types of IHSDP models can be reformulated to be solved with linearprogramming (LP) methods The motivation for this apporach is that a linearprogramming model can include constraints that are not possible to include in aclassical MDP model However the model become less intuitive than with the othermethods Moreover LP can only be used for smaller state spaces than the valueiteration and policy iteration methods

34

For example in the discounted IHSDP

Jmicro(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + α middot Jmicro(j)] foralli isin ΩX

Jmicro(i) is solution of the following linear programming model

MinimizesumiisinΩXJmicro(i)

Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le

sumjisinΩX P (j u i) middot C(j u i)forallu i

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDP generalize MDP by 1) allowing or requiring the decision maker to chooseactions whenever the system state changes 2) modeling the system evolution in

35

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

8.1.2 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit that is not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlangian distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Processes

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to a SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage put forward is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model can not be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM, and SMDP for inspection based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not need a model of the system to exist. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  Characteristics: the model can be non-stationary
  Possible application in maintenance optimization: short-term maintenance scheduling
  Method: value iteration
  Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  Characteristics: stationary model; average cost-to-go, discounted and shortest path formulations
  Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
  Methods: classical MDP methods. Value Iteration (VI): can converge fast for a high discount factor. Policy Iteration (PI): faster in general. Linear Programming: possible additional constraints, but more limited state space than VI and PI.

Approximate Dynamic Programming
  Characteristics: can handle large state spaces in comparison with classical MDP methods
  Possible application in maintenance optimization: same as MDP, for larger systems
  Methods: TD-learning, Q-learning
  Advantages/disadvantages: can work without an explicit model

Semi-Markov Decision Processes
  Characteristics: can optimize the inspection interval
  Possible application in maintenance optimization: inspection based maintenance
  Methods: same as MDP (average cost-to-go approach)
  Advantages/disadvantages: more complex

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.
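The following Python sketch outlines the backward induction (finite horizon value iteration) that would solve such a model. The callables P, C, Omega_U and C_N are assumed to be supplied by the model described in the following sections; all names are illustrative, not taken from the thesis.

def value_iteration(N, states, Omega_U, P, C, C_N):
    """Backward induction: J[i] = J*_0(i), policy[k][i] = u*_k(i)."""
    J = {i: C_N(i) for i in states}            # stage N: terminal costs
    policy = []
    for k in range(N - 1, -1, -1):             # stages N-1, ..., 0
        J_new, mu_k = {}, {}
        for i in states:
            # states with an empty decision space have a single forced transition (u = None)
            actions = Omega_U(i) or [None]
            best_u, best_val = None, float("inf")
            for u in actions:
                val = sum(P(k, j, u, i) * (C(k, j, u, i) + J[j]) for j in states)
                if val < best_val:
                    best_u, best_val = u, val
            J_new[i], mu_k[i] = best_val, best_u
        J = J_new
        policy.insert(0, mu_k)
    return J, policy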

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and to avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a base for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE  Number of electricity scenarios
NW  Number of working states for the component
NPM  Number of preventive maintenance states for the component
NCM  Number of corrective maintenance states for the component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI  Cost per stage for interruption
CPM  Cost per stage of preventive maintenance
CCM  Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1  Component state space
Ωx2  Electricity state space
ΩU(i)  Decision space for state i

State notations

W  Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1,  x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then correspond, for example, to the time at which λ(t) exceeds 50%, with λ(t) = λ(Tmax) for t > Tmax. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.
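As a small illustrative sketch (with assumed, arbitrary parameter values), the discretization of the failure rate into the NW working states could be computed as follows; the Weibull-type form of λ(t) is only an example and not taken from the thesis.

T_max = 10.0   # assumed: age (years) above which lambda is capped / PM is always made
T_s = 0.5      # assumed: stage length in years

NW = round(T_max / T_s)                    # number of W states (closest integer)

def failure_rate(t, beta=2.5, eta=8.0):    # hypothetical lambda(t)
    return (beta / eta) * (t / eta) ** (beta - 1)

lam = [failure_rate(q * T_s) for q in range(NW + 1)]   # lambda(W_q) = lambda(q * Ts)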

[Figure omitted: state transition diagram over the states CM2, CM1, W0, W1, W2, W3, W4 and PM1, with failure transitions of probability Ts·λ(q) to CM1, ageing transitions of probability 1 − Ts·λ(q), and deterministic (probability 1) maintenance transitions.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure omitted: electricity price (SEK/MWh, roughly 200–500) as a function of the stage (k−1, k, k+1) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW},   ΩU(i) = ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
  = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E. i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1      λ(Wq)
WNW                         0   WNW      1 − λ(WNW)
WNW                         0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    1
PM(NPM−1)                   ∅   W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    1
CM(NCM−1)                   ∅   W0       1
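A possible encoding of Table 9.1 as a function, assuming string state labels ("W0".."WNW", "PM1".., "CM1"..) and the list lam of discretized failure rates from the sketch above; this is illustrative code, not taken from the thesis.

def component_transitions(i1, u, NW, NPM, NCM, lam):
    """Return {j1: probability} for component state i1 and decision u (Table 9.1)."""
    if i1.startswith("PM"):
        q = int(i1[2:])
        return {f"PM{q+1}" if q < NPM - 1 else "W0": 1.0}
    if i1.startswith("CM"):
        q = int(i1[2:])
        return {f"CM{q+1}" if q < NCM - 1 else "W0": 1.0}
    q = int(i1[1:])                              # working state Wq
    if u == 1:
        return {"PM1" if NPM > 1 else "W0": 1.0}
    j_ok = f"W{min(q + 1, NW)}"                  # age one step; WNW stays in WNW
    j_cm = "CM1" if NCM > 1 else "W0"
    return {j_ok: 1.0 - lam[q], j_cm: lam[q]}

For example, component_transitions("W2", 0, NW, 2, 3, lam) returns {"W3": 1 − λ(W2), "CM1": λ(W2)}.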

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = ( 1    0    0  ;  0    1    0  ;  0    0    1  )
P2E = ( 1/3  1/3  1/3;  1/3  1/3  1/3;  1/3  1/3  1/3 )
P3E = ( 0.6  0.2  0.2;  0.2  0.6  0.2;  0.2  0.2  0.6 )

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
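The example matrices of Tables 9.2 and 9.3 can be written down directly, for instance as in this short sketch (rows are the current scenario i2, columns the next scenario j2):

import numpy as np

P1E = np.eye(3)                                 # scenarios frozen
P2E = np.full((3, 3), 1.0 / 3.0)                # fully mixing (summer)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])

# Pk(j2, i2) over the 12-stage horizon of Table 9.3
P_elec = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

def electricity_transition(k, i2, j2):
    return P_elec[k][i2, j2]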

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, denoted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i1) for each possible terminal state i1 of the component.

Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1    CI + CPM
PM(NPM−1)                   ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1    CI + CCM
CM(NCM−1)                   ∅   W0       CI + CCM
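A possible encoding of Table 9.4 as a function is sketched below (names and signature are illustrative). Note that the production term is returned with the sign used in the table; in a cost minimisation it would typically enter as a negative cost (a reward).

def transition_cost(i1, i2, u, j1, k, G, Ts, CI, CPM, CCM, CE):
    """One-component stage cost C_k(j, u, i) following Table 9.4 (sketch)."""
    if i1.startswith("PM") or (i1.startswith("W") and u == 1) or j1 == "PM1":
        return CI + CPM                       # in, or entering, preventive maintenance
    if i1.startswith("CM") or j1 == "CM1":
        return CI + CCM                       # in, or entering, corrective maintenance
    return G * Ts * CE(i2, k)                 # working: value of the produced energy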

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC  Number of components
NWc  Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs

CPMc  Cost per stage of preventive maintenance for component c
CCMc  Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1  Electricity state at the current stage
jc, c ∈ {1, ..., NC}  State of component c at the next stage
jNC+1  Electricity state at the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc  A component state
xNC+1k  Electricity state at stage k
uck  Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc  State space for component c
ΩxNC+1  Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is denoted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},   Ωuc(ic) = ∅ otherwise

9.2.4.3 Transition Probabilities

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1..NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or a preventive maintenance decision is taken, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1..NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
P^c = 0              otherwise
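Under the reading above (working components that are neither failed nor maintained keep their state while the system is down), the joint component transition probability could be sketched as follows. per_comp_P is assumed to behave like the one-component transition function of Section 9.1; all names are illustrative.

def joint_component_transition(i, u, j, per_comp_P):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2.
    per_comp_P(ic, uc) returns {jc: prob} as in the one-component model."""
    working = [ic.startswith("W") for ic in i]
    case1 = all(working) and not any(u)
    p = 1.0
    for ic, uc, jc, w in zip(i, u, j, working):
        if case1 or uc == 1 or not w:
            p *= per_comp_P(ic, uc).get(jc, 0.0)   # component ages or is maintained
        else:
            p *= 1.0 if jc == ic else 0.0          # system down: component does not age
    return p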

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1..NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
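A corresponding sketch of the joint stage cost, under the same assumptions and naming conventions as the transition sketch above (CPM and CCM are per-component cost lists, CE the electricity price function):

def joint_cost(i, u, j, k, i_elec, G, Ts, CI, CE, CPM, CCM):
    """Joint stage cost following Cases 1 and 2 of Section 9.2.4.4 (sketch)."""
    all_working = all(ic.startswith("W") for ic in i)
    if all_working and not any(u) and all(jc.startswith("W") for jc in j):
        return G * Ts * CE(i_elec, k)              # Case 1: production value
    total = CI                                     # Case 2: interruption cost
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if ic.startswith("CM") or jc == "CM1":
            total += CCM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += CPM[c]
    return total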

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
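The same backward induction can be checked numerically with a short Python script; the arc costs C(k, i, j) below are simply read off from the calculation above.

C = {(0,0,0): 2, (0,0,1): 4, (0,0,2): 3,
     (1,0,0): 4, (1,0,1): 6,
     (1,1,0): 2, (1,1,1): 1, (1,1,2): 3,
     (1,2,1): 5, (1,2,2): 2,
     (2,0,0): 2, (2,0,1): 5,
     (2,1,0): 7, (2,1,1): 3, (2,1,2): 2,
     (2,2,1): 1, (2,2,2): 2,
     (3,0,0): 4, (3,1,0): 2, (3,2,0): 7}

J = {(4, 0): 0}                                  # terminal stage
for k in range(3, -1, -1):
    for i in {i for (kk, i, _) in C if kk == k}:
        options = {j: C[(k, i, j)] + J[(k + 1, j)]
                   for (kk, ii, j) in C if (kk, ii) == (k, i)}
        u_star, val = min(options.items(), key=lambda kv: kv[1])
        J[(k, i)] = val
        print(f"J*_{k}({i}) = {val}, u*_{k}({i}) = {u_star}")
print("Shortest path cost from A:", J[(0, 0)])   # 8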


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 41: Models

For example in the discounted IHSDP

Jmicro(i) = argminmicro(i)isinΩU (i)

sum

jisinΩX

P (j u i) middot [C(j u i) + α middot Jmicro(j)] foralli isin ΩX

Jmicro(i) is solution of the following linear programming model

MinimizesumiisinΩXJmicro(i)

Subject to Jmicro(i) +sumjisinΩX α middot Jmicro(j) middot C(j u i) le

sumjisinΩX P (j u i) middot C(j u i)forallu i

At present linear programming has not proven to be an efficient method for solvinglarge discounted MDPs however innovations in LP algorithms in the past decademight change this [36]

68 Efficiency of the Algorithms

For details about the complexity of the algorithms [28] and [29] are recommended

If n and m denote the number of states and actions this means that a DP methodtakes a number of computational operations that is less than some polynomialfunction of n and m A DP method is guaranteed to find an optimal policy inpolynomial time even though the total number of (deterministic) policies ismn [41]But linear programming methods become impractical at a much smaller number ofstates than do DP methods [41]

Since the policy iteration algorithm always improve the policy at each iteration thealgorithm will converge quite fast if the initial policy micro0 is already good There isstrong empirical evidence in favor of PI over VI and LP in solving Markov decisionprocesses [28]

69 Semi-Markov Decision Process

Until now the decision epochs were predetermined at discrete time points (periodicin the case of infinite horizon problems) However for some applications the de-cision time can be random For example the next decision time can be decided bythe decision maker depending on the actual state of the system Or the decisionepoch occurs each time the state of the system is changing This kind of problemsrefers to Semi-Markov Decision Processes (SMDP)

SMDP generalize MDP by 1) allowing or requiring the decision maker to chooseactions whenever the system state changes 2) modeling the system evolution in

35

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered: random failure and deterioration failure. Each one is modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given. It considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

44

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems by using SMDP. The interest of the model is that for each possible deterioration state the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given. The model consists of 5 deterioration states and 1 failure state; 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The approach proposed should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the actual state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is a consequence

45

of the curse of dimensionality. The complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance scheduling
- Method: value iteration
- Advantages / disadvantages: limited state space (number of components)

Markov Decision Processes
- Characteristics: stationary model; possible approaches are average cost-to-go, discounted cost and shortest path
- Possible application in maintenance optimization: continuous-time condition monitoring maintenance optimization (average cost-to-go); short-term maintenance optimization (discounted)
- Methods: classical methods for MDP - Value Iteration (VI), which can converge fast for a high discount factor; Policy Iteration (PI), faster in general; Linear Programming, which allows possible additional constraints
- Advantages / disadvantages: state space limited for VI and PI

Approximate Dynamic Programming for MDP
- Characteristics: can handle large state spaces compared with classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages / disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Methods: same as MDP (average cost-to-go approach)
- Advantages / disadvantages: more complex

46

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model. The electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the opposite, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE   Number of electricity scenarios
NW   Number of working states for the component
NPM  Number of preventive maintenance states for one component
NCM  Number of corrective maintenance states for one component

Costs

CE(s, k)  Electricity cost at stage k for the electricity state s
CI     Cost per stage for interruption
CPM    Cost per stage of preventive maintenance
CCM    Cost per stage of corrective maintenance
CN(i)  Terminal cost if the component is in state i

Variables

i1  Component state at the current stage
i2  Electricity state at the current stage
j1  Possible component state for the next stage
j2  Possible electricity state for the next stage

State and Control Space

x1k  Component state at stage k
x2k  Electricity state at stage k

Probability functions

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state Wi

Sets

Ωx1    Component state space
Ωx2    Electricity state space
ΩU(i)  Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The number of CM and PM states for the component corresponds respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

50

Figure 9.1: Example of a Markov Decision Process for one component, with NCM = 3, NPM = 2, NW = 4 (states W0–W4, PM1, CM1, CM2; transition probabilities Ts·λ(q) to CM1 and 1 − Ts·λ(q) to the next working state). Solid lines: u = 0; dashed lines: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. The electricity price (SEK/MWh, roughly between 200 and 500) is shown for Scenarios 1, 2 and 3 as a function of the stage k.

52

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅       otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently:

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.

53

Table 9.1: Transition probabilities

i1                           u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0   CM1     λ(Wq)
WNW                          0   WNW     1 − λ(WNW)
WNW                          0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}         1   PM1     1
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   1
PMNPM−1                      ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   1
CMNCM−1                      ∅   W0      1
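
To make Table 9.1 concrete, the following sketch assembles the stationary component transition matrices for u = 0 and u = 1. The failure rate function λ(t), the stage length Ts and the numbers of W, PM and CM states are hypothetical values chosen only for illustration.

```python
# Sketch (illustrative assumptions): component transition matrices of Table 9.1.
import numpy as np

Ts = 1.0                                 # stage length
lam = lambda t: 0.01 + 0.002 * t         # hypothetical failure rate lambda(t)
N_W, N_PM, N_CM = 4, 2, 3                # highest W index, number of PM and CM stages

states = ([f"W{q}" for q in range(N_W + 1)]
          + [f"PM{q}" for q in range(1, N_PM)]
          + [f"CM{q}" for q in range(1, N_CM)])
idx = {s: n for n, s in enumerate(states)}
n = len(states)

P0 = np.zeros((n, n))    # u = 0: no preventive maintenance
P1 = np.zeros((n, n))    # u = 1: preventive maintenance (only relevant in W states)

for q in range(N_W + 1):
    p_fail = Ts * lam(q * Ts)
    nxt = f"W{q + 1}" if q < N_W else f"W{N_W}"          # WNW stays in WNW
    P0[idx[f"W{q}"], idx[nxt]] = 1.0 - p_fail
    P0[idx[f"W{q}"], idx["CM1" if N_CM > 1 else "W0"]] = p_fail
    P1[idx[f"W{q}"], idx["PM1" if N_PM > 1 else "W0"]] = 1.0

# Maintenance states progress deterministically towards W0 (their decision space is empty).
for prefix, n_stages in [("PM", N_PM), ("CM", N_CM)]:
    for q in range(1, n_stages):
        nxt = f"{prefix}{q + 1}" if q < n_stages - 1 else "W0"
        P0[idx[f"{prefix}{q}"], idx[nxt]] = P1[idx[f"{prefix}{q}"], idx[nxt]] = 1.0

print(states)
print(P0.sum(axis=1))    # every row of P0 sums to one
```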

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = | 1    0    0  |     P2E = | 1/3  1/3  1/3 |     P3E = | 0.6  0.2  0.2 |
      | 0    1    0  |           | 1/3  1/3  1/3 |           | 0.2  0.6  0.2 |
      | 0    0    1  |           | 1/3  1/3  1/3 |           | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
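
The non-stationary electricity transitions can simply be stored as a list of matrices indexed by the stage. The sketch below uses the matrices of Table 9.2 and the stage assignment of Table 9.3 and samples one possible scenario trajectory.

```python
# Sketch: non-stationary electricity scenario transitions (Tables 9.2 and 9.3).
import numpy as np

P1E = np.eye(3)                          # stable scenarios
P2E = np.full((3, 3), 1.0 / 3.0)         # fully mixing (transient summer period)
P3E = np.array([[0.6, 0.2, 0.2],
                [0.2, 0.6, 0.2],
                [0.2, 0.2, 0.6]])        # moderately mixing

# Stage-dependent choice of matrix over the 12-stage horizon (Table 9.3).
P_k = [P1E, P1E, P1E, P3E, P3E, P2E, P2E, P2E, P3E, P1E, P1E, P1E]

rng = np.random.default_rng(1)
scenario = 0                             # start in scenario S1
for k, P in enumerate(P_k):
    scenario = rng.choice(3, p=P[scenario])
    print(f"after stage {k}: scenario S{scenario + 1}")
```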

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

54

Table 9.4: Transition costs

i1                           u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0   CM1     CI + CCM
WNW                          0   WNW     G · Ts · CE(i2, k)
WNW                          0   CM1     CI + CCM
Wq                           1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅   PMq+1   CI + CPM
PMNPM−1                      ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅   CMq+1   CI + CCM
CMNCM−1                      ∅   W0      CI + CCM
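
To make the solution procedure concrete, the sketch below runs the backward value iteration Jk(i) = min_u Σ_j P(j, u, i) · [Ck(j, u, i) + Jk+1(j)] on a small instance of the one-component model. All numerical values (costs, failure probabilities, prices, scenario transition matrix) are hypothetical, the generation reward of Table 9.4 is treated as a negative cost, and the terminal cost is set to zero.

```python
# Compact sketch (hypothetical numbers) of backward value iteration for the
# one-component model; states and stage costs follow Tables 9.1 and 9.4.
import itertools
import numpy as np

Ts, G = 168.0, 1000.0                   # stage length [h], average production [kW]
C_I, C_PM, C_CM = 2e4, 1e4, 3e4         # interruption / PM / CM cost per stage (assumed)
N = 12                                  # number of stages
comp_states = ["W0", "W1", "W2", "W3", "PM1", "CM1", "CM2"]   # NW=3, NPM=2, NCM=3
elec_states = [0, 1, 2]                 # three electricity scenarios
price = [0.50, 0.35, 0.20]              # SEK/kWh per scenario (flat over k, assumed)
lam = lambda q: 0.02 + 0.03 * q         # failure probability per stage in state Wq
P_E = np.array([[0.8, 0.1, 0.1],        # electricity scenario transitions (stationary here)
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8]])

def comp_transitions(i, u):
    """[(next component state, probability, stage cost or None if producing)]"""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:
            return [("PM1", 1.0, C_I + C_PM)]
        return [(f"W{min(q + 1, 3)}", 1.0 - lam(q), None), ("CM1", lam(q), C_I + C_CM)]
    if i == "PM1":
        return [("W0", 1.0, C_I + C_PM)]
    return [("CM2", 1.0, C_I + C_CM)] if i == "CM1" else [("W0", 1.0, C_I + C_CM)]

J = {(c, s): 0.0 for c in comp_states for s in elec_states}   # terminal cost C_N = 0
for k in reversed(range(N)):
    J_new, policy = {}, {}
    for c, s in itertools.product(comp_states, elec_states):
        decisions = [0, 1] if c.startswith("W") and c != "W0" else [0]
        best = None
        for u in decisions:
            total = 0.0
            for c2, p, cost in comp_transitions(c, u):
                step = -G * Ts * price[s] if cost is None else cost   # reward or cost
                future = sum(P_E[s, s2] * J[(c2, s2)] for s2 in elec_states)
                total += p * (step + future)
            if best is None or total < best[0]:
                best = (total, u)
        J_new[(c, s)], policy[(c, s)] = best
    J = J_new

print({f"S{s + 1}": round(J[("W0", s)]) for s in elec_states})        # cost-to-go, new unit
print({c: policy[(c, 0)] for c in comp_states if c.startswith("W")})  # stage-0 decisions, S1
```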

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC    Number of components
NWc   Number of working states for component c
NPMc  Number of preventive maintenance states for component c
NCMc  Number of corrective maintenance states for component c

Costs

CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}  State of component c at the current stage
iNC+1                 State for the electricity at the current stage
jc, c ∈ {1, ..., NC}  State of component c for the next stage
jNC+1                 State for the electricity for the next stage
uc, c ∈ {1, ..., NC}  Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}  State of component c at stage k
xc                     A component state
xNC+1k                 Electricity state at stage k
uck                    Maintenance decision for component c at stage k

Probability functions

λc(i)  Failure probability function for component c

Sets

Ωxc      State space for component c
ΩxNC+1   Electricity state space
Ωuc(ic)  Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space

The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:
Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
Ωuc(ic) = ∅       otherwise

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently:

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for one of the components, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with

Pc = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1              if ic ∉ {W0, ..., WNWc−1} and ic = jc
Pc = 0              otherwise
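
The two cases can be combined into a single product over the components. In the sketch below the per-component probabilities P(jc, u, ic) are assumed to be given by a function built, for example, from Table 9.1; Case 2 is interpreted here as working components keeping their state while the system is down, which is a simplification of the condition stated above.

```python
# Sketch: joint component-state transition probability for the multi-component model,
# assuming a per-component transition function P_c(c, j, u, i) is available.
def working_states(c):
    """Hypothetical helper: the working states W1..WNWc of component c (NWc = 3 assumed)."""
    return {"W1", "W2", "W3"}

def joint_transition_prob(j, u, i, P_c):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) as a product over the components."""
    # Case 1: every component is working and no maintenance is decided.
    system_up = (all(ic in working_states(c) for c, ic in enumerate(i))
                 and all(uc == 0 for uc in u))
    prob = 1.0
    for c, (jc, uc, ic) in enumerate(zip(j, u, i)):
        if system_up:
            prob *= P_c(c, jc, 0, ic)           # components age independently
        elif uc == 1 or ic not in working_states(c):
            prob *= P_c(c, jc, 1, ic)           # component maintained or in maintenance
        else:
            prob *= 1.0 if jc == ic else 0.0    # working component keeps its state
    return prob

# Tiny demo with a made-up per-component transition function:
def P_c(c, j, u, i):
    if u == 1 or not i.startswith("W"):
        return 1.0 if j == "PM1" else 0.0       # toy maintenance dynamics
    q = int(i[1:])
    if j == f"W{min(q + 1, 3)}":
        return 0.9
    return 0.1 if j == "CM1" else 0.0

print(joint_transition_prob(("W2", "W3"), (0, 0), ("W1", "W2"), P_c))   # Case 1: 0.81
print(joint_transition_prob(("PM1", "W2"), (1, 0), ("W1", "W2"), P_c))  # Case 2: 1.0
```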

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0     otherwise
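
A corresponding sketch of the stage cost is given below; the cost constants, the number of components and the state labels are assumptions made only for illustration, and the generation reward is counted here as a negative cost.

```python
# Sketch of the multi-component stage cost (Cases 1 and 2); all constants are assumed.
C_I = 2.0e4                                  # interruption cost per stage
C_PM = {0: 1.0e4, 1: 0.8e4}                  # preventive maintenance cost per component
C_CM = {0: 3.0e4, 1: 2.5e4}                  # corrective maintenance cost per component
WORKING = {"W1", "W2", "W3"}                 # W1..WNWc with NWc = 3 assumed

def stage_cost(j, u, i, elec_price, G=1000.0, Ts=168.0):
    # Case 1: all components working, no maintenance decided, no failure during the stage.
    if (all(ic in WORKING for ic in i) and all(uc == 0 for uc in u)
            and all(jc != "CM1" for jc in j)):
        return -G * Ts * elec_price          # generation reward counted as a negative cost
    # Case 2: interruption cost plus the sum of the individual maintenance costs.
    total = C_I
    for c, (jc, ic) in enumerate(zip(j, i)):
        if ic.startswith("CM") or jc == "CM1":
            total += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += C_PM[c]
    return total

print(stage_cost(("W2", "W3"), (0, 0), ("W1", "W2"), elec_price=0.35))   # -58800.0
print(stage_cost(("CM1", "W2"), (0, 0), ("W1", "W2"), elec_price=0.35))  # 50000.0
```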

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

59

• Other types of maintenance actions: In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge the fastest. However, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

61

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

62

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 2
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
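
The calculation can be verified mechanically. The sketch below stores the arc costs C(k, i, u) exactly as they appear above (the decision u is the index of the next-stage node) and reproduces J*(A) = 8 by backward induction.

```python
# Sketch: numerical check of the shortest path example via backward induction.
C = {
    (0, 0): {0: 2, 1: 4, 2: 3},                        # A -> B, C, D
    (1, 0): {0: 4, 1: 6},                              # B -> E, F
    (1, 1): {0: 2, 1: 1, 2: 3},                        # C -> E, F, G
    (1, 2): {1: 5, 2: 2},                              # D -> F, G
    (2, 0): {0: 2, 1: 5},                              # E -> H, I
    (2, 1): {0: 7, 1: 3, 2: 2},                        # F -> H, I, J
    (2, 2): {1: 1, 2: 2},                              # G -> I, J
    (3, 0): {0: 4}, (3, 1): {0: 2}, (3, 2): {0: 7},    # H, I, J -> terminal node
}

J = {0: 0.0}                                           # terminal stage: J*(4, 0) = 0
for k in (3, 2, 1, 0):
    J = {i: min(c + J[u] for u, c in arcs.items())
         for (kk, i), arcs in C.items() if kk == k}

print("J*(A) =", J[0])                                 # expected: 8.0
```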

63

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 42: Models

continuous time and 3) allowing the time spent in a particular state to follow anarbitrary probability distibution [36]

The time horizon is considered infinite and the action are not made continuously(this kind of problems refer to optimal control theory)

SMDP are more complicated than MDP and will not be part of this thesis Put-erman [36] explains how one can transform a SMDP model into a model solvablewith the methods presented previously in this chapter

SMDP could be interesting in maintenance optimization since they allows a choiceof inspection interval for each state of the system However due to the complexityof the models only small state space are tractable

36

Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL) or Approximate Dynamic Programming (ADP) isan approach of machine learning that combines infinite horizon dynamic program-ming with supervised learning techniques Supervised learning techniques give thepossibility to approximate the cost-to-go function on a large state space

The aim of this chapter is to give an overview to RL For further interest see thebooks Handbook of Learning and Approximate Dynamic Programming [40] Neuro-Dynamic Programming [13] and article [23]

71 Introduction

The problem of the methods presented in the previous chapter is that the modelsare untractable for large state space In this chapter methods to overcome thisproblem by approximation are presented They make use of supervised learningtechniques

Supervised learning is a field that investigates the creation of functions from trainingdata (pairs input-output) to be able to predict future output for any kind of possibleinput data Many approachs are possible such as artificial neural networks decisiontree learning bayesian statistics

One of the first reinforcement learning approaches was using artificial neural net-

37

works methods as supervised learning technique This approach was also calledneuro-dynamic programming (see [13])

Reinforcement learning methods refer to systems that learn how to make good de-cisions by observing their own behavior and use built-in mechanisms for improvingtheir actions trough a reinforcement mechanism [13]

The root of the algorithm proposed in RL are based on the methods of Chapter 6The system is assumed to be stationary and be a Markov decision process HoweverRL does not require that an explicite model of the system exist The methods caneven be applied in parallel of learning the environment (the MDP of the system)This can be a practical advantage since a fastidious model does not need to be builtfirst The state and decision space are assumed known The methods works onobserved trajectory samples that have the form (Xk Xk+1 Uk Ck)

The samples can be used to learn directly the cost-to-go function of a given policyor the Q-factor of a problem without estimating the probabilities transitions of themodel The first section deals with this type of learning Direct learning methodsThis approach is useful for large state space If a model of the system exist themethod can be used with samples from Monte Carlo simulations

In case of a real-time application it is possible to combine the learning of thetransition and cost functions with direct learning methods to take advantage of allthe experience obtained This approach is called Indirect learning (or model basedmethods) and will be discussed shortly

The RL methods are extension of the methods presented in Section 72 RL methodsmake use of supervised learning techniques to approximate the cost-to-go functionover the whole state space They are presented in Section 74

72 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples ofperformance of the system provided from simulation or real-life experience A sam-ple has the form (Xk Xk+1 Uk Ck) Xk+1 is the observed state after chosing thecontrol Uk in state Xk and Ck = C(Xk Xk+1 Uk) is the cost resulting from thistransition The samples can be generated by Monte Carlo simulation according tothe probabilities transitions P (j u i) and C(j u i) if a model of the system exists

38

721 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of apolicy micro using samples resulting from the use of this policy The method is usedin the first step of the policy method discussed in Chapter 6 It can be seen in asimilar way as the modified policy iteration

The cost-to-go function is estimated using the costs resulting of the simulationNote that from each state visited the remaining trajectory starting form this statecan be used as a sample for the cost-to-go function

TD will be presented in the context of Stochastic shortest path problems whichmeans that there is a terminal state and every simulation terminate over a finitetime The method can also be adapted to discounted problems or average-cost-to-goproblems

Policy evaluation by simulation Assume a trajectory (X0 XN ) has been gen-erated according to the policy micro and the sequence of transition cost C(Xk Xk+1) =C(Xk Xk+1 micro(Xk)) have been observed

The cost-to-go resulting from the trajectory starting from the state Xk is

V (Xk) =Nsum

n=k

C(Xn Xn+1)

V (Xk) Cost-to-go of a trajectory starting from state Xk

If a certain number of trajectories has been generated and the state i has beenvisited K times in these trajectoriesJ(i) can be estimated by

J(i) =1

K

Ksum

m=1

V (im)

V (im) Cost-to-go of a trajectory starting from state i after the mth visit

A recursive form of the method can be formulated

J(i) = J(i)+γ middot [V (im)minusJ(i)] with γ = 1m with m the number of the trajectory

From a trajectory point of view

J(Xk) = J(Xk) + γXk middot [V (Xk)minus J(Xk)]

γXk corresponding to 1m where m is the number of time Xk has already beenvisited by trajectories

39

With the precedent algorithm it is necessary that V (Xk) is calculated from thewhole trajectory and then can be used when the trajectory is finished How-ever the method can be reformulated exploiting the relation V (Xk) = V (Xk+1) +C(Xn Xn+1)

At each transition of the trajectory the cost-to-go function of a state of the tra-jectory J(Xk) is updated Assuming that the lth transition is being generatedThen J(Xk) is updated for all the state that have been visited previously duringthe trajectory

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

TD(λ)A generalization of the precedent algorithm is the TD(λ) where a constant λ lt 1 isintroduced

J(Xk) = J(Xk) + γXk middot λkminusl middot [C(Xl Xl+1) + J(Xl+1)minus J(Xl)] forallk = 0 l

Note that TD(1) this is the same that the Policy evaluation by simulation Anotherspecial case is when λ = 0 The TD(0) algorithm is

J(Xk) = J(Xk) + γXk middot [C(Xl Xl+1) + J(Xk+1)minus J(Xk)]

Q-factorsOnce Jmicrok(i) has been estimated using the TD algorithm it is possible to make apolicy improvement evaluating the Q-factors defined by

Qmicrok(i u) =sumjisinX P (j u i) middot [C(j u i) + Jmicro(j)] Note that C(j u i) must be known

The improved policy

microk+1(i) = argminuisinΩU (i)

Qmicrok(i u)

It is in fact an approximate version of the policy iteration algorithm since Jmicro andQmicrok have been estimated using the samples

722 Q-learning

Q-learning is similar to a value iteration methods based on simulation The methodestimates directly the Q-factors without the need of the multiple policy evaluationof the TD method

The optimal Q-factor are defined by

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + Jlowast(j)] (71)

40

The optimality equation can be rewritten in term of Q-factors

Jlowast(i) = minuisinU(Xk+1)

Qlowast(i u) (72)

By combining the 2 equations we obtain

Qlowast(i u) =sum

jisinΩX

P (j u i) middot [C(j u i) + minvisinU(j)

Qlowast(j v)] (73)

Qlowast(i u) is the unique solution of this equation The Q-learning algorithm is baseon (73)

Q(i u) can be initialized arbitrarly

For each sample (Xk Xk+1 Uk Ck) do

Uk = argminuisinU(Xk)

Q(Xk u))

Q(Xk Uk) = (1minus γ)Q(Xk Uk) + γ middot [C(Xk+1 Uk Xk) + minuisinU(Xk+1)

Q(Xk+1 u)]l

with γ defined as for TD

The trade-off explorationexploitation The convergence of the algorithms tothe optimal solution would imply that all the pair (xu) are tried infinitely oftenwhich is not realistic

In practice a trade-off must be made between phases of exploitation when a basepolicy (called also greedy policy) is evaluated (which is similar to the idea of TD(0))and phases of exploration during which new control are tried and a new greedy policyis determined

73 Indirect Learning

On-line application can take advantage of the experience gained from real time useby

-Using the direct learning approach presented in the precedent section for eachsample of experience

-Built on-line the model of the probabilities transitions and cost function and thenuse this model for off-line training of the system through simulation using directlearning

41

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-cost approach is used to evaluate the policy

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al [21] propose a similar MDP is proposed Major and minormaintenance are possible are possible For each possible maintenance action thedeterioration level after the maintenance is stochastic which is more realistic

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

83 Reinforcement Learning

Kalles et al [24] proposes the use of RL for preventive maintenance of power plantsThe article aims at giving reason of using RL for monitoring and maintenance ofpower plants The main advantages given are the automatic learning capabilitiesof RL The problem of time-lag (time between an action and its effect) is revealedPenalties are defined by deviations from normal operation of the system Theapproach proposed should first be used in parallel of the actual expert systems sothat the RL algorithm learns the environment then it could be applied in practiceOne important condition for a good learning of the environment is that the algorithmhas been trained in all situation and all the more in critical situation

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP has the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP models are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods (characteristics; possible application in maintenance optimization; methods; advantages and disadvantages)

• Finite Horizon Dynamic Programming: the model can be non-stationary; application: short-term maintenance optimization and scheduling; method: value iteration; limited state space (number of components).

• Markov Decision Processes: stationary model; classical methods, with several possible approaches:
  - Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI) can converge fast for a high discount factor.
  - Discounted: short-term maintenance optimization; Policy Iteration (PI) is faster in general.
  - Shortest path: Linear Programming allows possible additional constraints; state space limited for VI and PI.

• Approximate Dynamic Programming for MDP: can handle large state spaces; application: same as MDP, for larger systems; methods: TD-learning and Q-learning; can work without an explicit model.

• Semi-Markov Decision Processes: can optimize the inspection interval; application: optimization for inspection-based maintenance; methods: same as MDP (average cost-to-go approach); more complex.


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component, for an easier understanding of its principle.

The price of electricity was considered as an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

x1_k Component state at stage k
x2_k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ω_x1 Component state space
Ω_x2 Electricity state space
Ω_U(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N·Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ω_x1, x2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This second approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, or the closest integer.


[Figure omitted: state-transition diagram with states W0, W1, W2, W3, W4, PM1, CM1, CM2; from each Wq the component moves to CM1 with probability Ts·λ(q) and to the next working state with probability 1 − Ts·λ(q) (W4 loops on itself), while the PM and CM states lead deterministically back to W0.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{NPM−1}, CM1, ..., CM_{NCM−1}}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure omitted: electricity price (SEK/MWh, approximately 200 to 500) plotted against the stage around k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

Ω_U(i) = {0, 1} if i1 ∈ {W1, ..., W_NW}, and ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | Uk = u, Xk = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q·Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.
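To make the structure of Table 9.1 concrete, the following is a minimal sketch, in Python, of how these non-zero transition probabilities could be assembled from a given failure rate function. The function and variable names are illustrative assumptions, and the per-stage failure probability is taken as Ts·λ(q·Ts), as in Figure 9.1.

def component_transitions(n_w, n_pm, n_cm, failure_rate, Ts):
    """Non-zero one-component transition probabilities P(j | i, u), following Table 9.1.

    States are W0..W{n_w}, PM1..PM{n_pm-1} and CM1..CM{n_cm-1}; the per-stage
    failure probability is taken as Ts * failure_rate(q * Ts) (cf. Figure 9.1)."""
    cm1 = "CM1" if n_cm > 1 else "W0"            # if NCM = 1, CM1 coincides with W0
    pm1 = "PM1" if n_pm > 1 else "W0"            # if NPM = 1, PM1 coincides with W0
    P = {}
    for q in range(n_w + 1):
        w = f"W{q}"
        p_fail = min(1.0, Ts * failure_rate(q * Ts))
        nxt = f"W{q + 1}" if q < n_w else w      # the oldest working state loops on itself
        P[(w, 0)] = {nxt: 1.0 - p_fail, cm1: p_fail}
        P[(w, 1)] = {pm1: 1.0}                   # preventive replacement decided
    for prefix, n in (("PM", n_pm), ("CM", n_cm)):
        for q in range(1, n):
            nxt = f"{prefix}{q + 1}" if q + 1 < n else "W0"
            P[(f"{prefix}{q}", None)] = {nxt: 1.0}   # maintenance progresses deterministically
    return P

# Example matching Figure 9.1 (NW = 4, NPM = 2, NCM = 3), with an assumed linear failure rate:
P = component_transitions(4, 2, 3, failure_rate=lambda t: 0.01 * (1.0 + t), Ts=1.0)

The same structure is reused inline in the value iteration sketch given after Table 9.4.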

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                           u    j1        P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}       0    CM1       λ(Wq)
W_NW                         0    W_NW      1 − λ(W_NW)
W_NW                         0    CM1       λ(W_NW)
Wq, q ∈ {0, ..., NW}         1    PM1       1
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     1
PM_{NPM−1}                   ∅    W0        1
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     1
CM_{NCM−1}                   ∅    W0        1

Table 9.2: Example of transition matrices for electricity scenarios

P1_E =
  1    0    0
  0    1    0
  0    0    1

P2_E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3_E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G·Ts·CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is the electricity state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                           u    j1        Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}       0    Wq+1      G·Ts·CE(i2, k)
Wq, q ∈ {0, ..., NW−1}       0    CM1       CI + CCM
W_NW                         0    W_NW      G·Ts·CE(i2, k)
W_NW                         0    CM1       CI + CCM
Wq                           1    PM1       CI + CPM
PMq, q ∈ {1, ..., NPM−2}     ∅    PMq+1     CI + CPM
PM_{NPM−1}                   ∅    W0        CI + CPM
CMq, q ∈ {1, ..., NCM−2}     ∅    CMq+1     CI + CCM
CM_{NCM−1}                   ∅    W0        CI + CCM
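To illustrate how the one-component model can be solved, the following is a minimal, self-contained value iteration (backward induction) sketch in Python. All numerical values are assumptions chosen only so that the example runs; the electricity scenario matrices follow Tables 9.2 and 9.3, and the generation reward is entered as a negative cost so that the recursion is a pure minimization.

import itertools

N, Ts, G = 12, 1.0, 10.0                     # stages, stage length, average production (assumed)
N_W, N_PM, N_CM, N_E = 4, 2, 3, 3            # as in Figure 9.1, with NE = 3 scenarios
C_I, C_PM, C_CM = 5.0, 8.0, 20.0             # interruption and maintenance costs per stage (assumed)
C_E = lambda s, k: (3.0, 2.0, 1.0)[s]        # electricity price per kWh for scenario s (assumed)
p_fail = lambda q: min(1.0, 0.05 * (q + 1))  # per-stage failure probability in state Wq (assumed)

W = [f"W{q}" for q in range(N_W + 1)]
PM = [f"PM{q}" for q in range(1, N_PM)]      # PM_{NPM} is identified with W0
CM = [f"CM{q}" for q in range(1, N_CM)]      # CM_{NCM} is identified with W0
states = W + PM + CM

# Electricity scenario matrices of Table 9.2 and the stage schedule of Table 9.3.
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3]] * 3
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
P_E = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def component_step(i, u):
    """List of (next component state, probability, stage cost or the marker 'RUN')."""
    if i in W:
        q = int(i[1:])
        if u == 1:                                         # preventive replacement
            return [(PM[0] if PM else "W0", 1.0, C_I + C_PM)]
        nxt = W[q + 1] if q < N_W else i
        return [(nxt, 1.0 - p_fail(q), "RUN"),
                (CM[0] if CM else "W0", p_fail(q), C_I + C_CM)]
    fam = PM if i.startswith("PM") else CM                 # ongoing maintenance
    q = int(i[2:])
    nxt = f"{i[:2]}{q + 1}" if q < len(fam) else "W0"
    return [(nxt, 1.0, C_I + (C_PM if fam is PM else C_CM))]

J = {(i, s): 0.0 for i in states for s in range(N_E)}      # terminal cost CN = 0 (assumed)
for k in reversed(range(N)):
    J_new = {}
    for i, s in itertools.product(states, range(N_E)):
        decisions = (0, 1) if i in W[1:] else (0,)         # decision space of Section 9.1.4.2
        best = float("inf")
        for u in decisions:
            expected = 0.0
            for j, p, c in component_step(i, u):
                stage_cost = -G * Ts * C_E(s, k) if c == "RUN" else c
                expected += p * sum(P_E[k][s][s2] * (stage_cost + J[(j, s2)])
                                    for s2 in range(N_E))
            best = min(best, expected)
        J_new[(i, s)] = best
    J = J_new

print(J[("W0", 0)])   # optimal expected cost-to-go for a new component in scenario S1

The optimal decision at each stage and state can be recorded alongside the minimization to obtain the replacement policy itself.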

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NW_c Number of working states for component c
NPM_c Number of preventive maintenance states for component c
NCM_c Number of corrective maintenance states for component c

Costs

CPM_c Cost per stage of preventive maintenance for component c
CCM_c Cost per stage of corrective maintenance for component c
CN_c(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
jNC+1 State of the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xc_k, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1_k Electricity state at stage k
uc_k Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ω_xc State space for component c
Ω_xNC+1 Electricity state space
Ω_uc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCM_c stages with a cost of CCM_c per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPM_c stages, with a cost of CPM_c per stage.

• An interruption cost CI is considered whenever maintenance (of any kind) is carried out on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G·Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CN_c can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)    (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCM_c and NPM_c. The number of W states for each component c, NW_c, is decided in the same way as for one component.

The state space related to component c is noted Ω_xc:

xc_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{NPMc−1}, CM1, ..., CM_{NCMc−1}}

Electricity space
Same as for the one-component model in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}: Ω_uc(ic) = {0, 1} if ic ∈ {W0, ..., W_NWc}, and ∅ otherwise.
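As a small illustration, a sketch (with assumed helper names) of how the joint decision space of (9.3) can be enumerated as the Cartesian product of the per-component decision spaces:

import itertools

def component_decisions(ic, working_states):
    """Omega_uc(ic): {0, 1} when component c is in a working state, otherwise no decision."""
    return (0, 1) if ic in working_states else (None,)    # None stands for the empty decision

def joint_decisions(i, working_states_per_component):
    """All admissible decision vectors Uk = (u1, ..., uNC) for a system state i."""
    per_comp = [component_decisions(ic, w)
                for ic, w in zip(i, working_states_per_component)]
    return list(itertools.product(*per_comp))

# Example: two components, the first working (W2), the second in corrective maintenance.
W_1, W_2 = {"W0", "W1", "W2", "W3"}, {"W0", "W1"}
print(joint_decisions(("W2", "CM1"), [W_1, W_2]))         # [(0, None), (1, None)]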

9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(X_{k+1} = j | Uk = U, Xk = i)    (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The component state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc} and uc = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If at least one of the components is in maintenance, or the decision of preventive maintenance is made for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, uc, ic) if uc = 1 or ic ∉ {W0, ..., W_NWc}
P^c = 1 if ic ∈ {W0, ..., W_NWc}, uc = 0 and jc = ic
P^c = 0 otherwise

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., W_NWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G·Ts·CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ_{c=1}^{NC} C^c

with

C^c = CCM_c if ic is a CM state of component c or jc = CM1
C^c = CPM_c if ic is a PM state of component c or jc = PM1
C^c = 0 otherwise
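A minimal sketch of how the joint component transition probability of Section 9.2.4.3 could be computed, distinguishing the two cases above. The data layout is an assumption: P_comp[c] maps (state, decision) to a dictionary of successor probabilities for component c alone (as in the one-component sketch given earlier), and working[c] is the set of its W states.

def joint_transition_prob(j, u, i, P_comp, working):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) for the component part of the model."""
    # Case 1 applies when every component is working and no maintenance is decided.
    system_up = all(ic in working[c] and uc == 0
                    for c, (ic, uc) in enumerate(zip(i, u)))
    prob = 1.0
    for c, (ic, jc, uc) in enumerate(zip(i, j, u)):
        in_maintenance = ic not in working[c]
        if system_up:
            # Case 1: the system produces and every component follows its own ageing law.
            prob *= P_comp[c][(ic, 0)].get(jc, 0.0)
        elif uc == 1 or in_maintenance:
            # Case 2: components under (or entering) maintenance evolve as in Table 9.1.
            prob *= P_comp[c][(ic, None if in_maintenance else 1)].get(jc, 0.0)
        else:
            # Case 2: a working component that is not maintained does not age
            # while the system is down, so it keeps its current state.
            prob *= 1.0 if jc == ic else 0.0
    return prob

The electricity state transition and the transition cost are then combined with this product, exactly as in the one-component model.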

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas of issues that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be carried out at the same time. A solution would be to consider a global decision space, and not an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• The time to repair is non-deterministic. It is thus possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecast states. It could be interesting to add other forecast state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge faster; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP to maintenance was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
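The calculation above can be checked with a short backward induction script; the arc costs below are simply read off from the expressions in the solution (T denotes the terminal node of stage 4).

# (stage, from-node) -> {to-node: cost}, as read from the computations above.
arcs = {
    (0, "A"): {"B": 2, "C": 4, "D": 3},
    (1, "B"): {"E": 4, "F": 6},
    (1, "C"): {"E": 2, "F": 1, "G": 3},
    (1, "D"): {"F": 5, "G": 2},
    (2, "E"): {"H": 2, "I": 5},
    (2, "F"): {"H": 7, "I": 3, "J": 2},
    (2, "G"): {"I": 1, "J": 2},
    (3, "H"): {"T": 4},
    (3, "I"): {"T": 2},
    (3, "J"): {"T": 7},
}
J = {"T": 0}
for k in (3, 2, 1, 0):                      # backward over the stages
    for (stage, node), succ in arcs.items():
        if stage == k:
            J[node] = min(cost + J[nxt] for nxt, cost in succ.items())
print(J)   # expected: H=4, I=2, J=7, E=6, F=5, G=3, B=10, C=6, D=5, A=8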


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Thyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



Chapter 7

Approximate Methods for

Markov Decision Process -

Reinforcement Learning

Reinforcement Learning (RL), or Approximate Dynamic Programming (ADP), is an approach of machine learning that combines infinite horizon dynamic programming with supervised learning techniques. Supervised learning techniques give the possibility to approximate the cost-to-go function over a large state space.

The aim of this chapter is to give an overview of RL. For further interest, see the books Handbook of Learning and Approximate Dynamic Programming [40] and Neuro-Dynamic Programming [13], and the article [23].

7.1 Introduction

The problem with the methods presented in the previous chapter is that the models are intractable for large state spaces. In this chapter, methods to overcome this problem by approximation are presented. They make use of supervised learning techniques.

Supervised learning is a field that investigates the creation of functions from training data (input-output pairs) in order to predict the output for any kind of possible input data. Many approaches are possible, such as artificial neural networks, decision tree learning and Bayesian statistics.

One of the first reinforcement learning approaches was using artificial neural network methods as the supervised learning technique. This approach was also called neuro-dynamic programming (see [13]).

Reinforcement learning methods refer to systems that learn how to make good decisions by observing their own behavior, and use built-in mechanisms for improving their actions through a reinforcement mechanism [13].

The roots of the algorithms proposed in RL are the methods of Chapter 6. The system is assumed to be stationary and to be a Markov decision process. However, RL does not require that an explicit model of the system exists. The methods can even be applied in parallel with learning the environment (the MDP of the system). This can be a practical advantage, since a fastidious model does not need to be built first. The state and decision spaces are assumed known. The methods work on observed trajectory samples of the form (Xk, Xk+1, Uk, Ck).

The samples can be used to learn directly the cost-to-go function of a given policy, or the Q-factors of a problem, without estimating the transition probabilities of the model. The first section deals with this type of learning, called direct learning methods. This approach is useful for large state spaces. If a model of the system exists, the method can be used with samples from Monte Carlo simulations.

In the case of a real-time application, it is possible to combine the learning of the transition and cost functions with direct learning methods, in order to take advantage of all the experience obtained. This approach is called indirect learning (or model-based methods) and will be discussed briefly.

The RL methods are extensions of the methods presented in Section 7.2; they make use of supervised learning techniques to approximate the cost-to-go function over the whole state space. They are presented in Section 7.4.

7.2 Direct Learning

The aim of reinforcement learning is to infer good decisions based on samples of the performance of the system, provided by simulation or real-life experience. A sample has the form (Xk, Xk+1, Uk, Ck): Xk+1 is the observed state after choosing the control Uk in state Xk, and Ck = C(Xk, Xk+1, Uk) is the cost resulting from this transition. The samples can be generated by Monte Carlo simulation according to the transition probabilities P(j, u, i) and costs C(j, u, i), if a model of the system exists.

7.2.1 Policy Evaluation using Temporal Differences

Temporal differences (TD) is a method for estimating the cost-to-go function of a policy µ using samples resulting from the use of this policy. The method is used in the first step of the policy iteration method discussed in Chapter 6. It can be seen in a similar way as the modified policy iteration.

The cost-to-go function is estimated using the costs resulting from the simulation. Note that from each state visited, the remaining trajectory starting from this state can be used as a sample for the cost-to-go function.

TD will be presented in the context of stochastic shortest path problems, which means that there is a terminal state and every simulation terminates in finite time. The method can also be adapted to discounted problems or average cost-to-go problems.

Policy evaluation by simulation. Assume a trajectory (X0, ..., XN) has been generated according to the policy µ, and that the sequence of transition costs C(Xk, Xk+1) = C(Xk, Xk+1, µ(Xk)) has been observed.

The cost-to-go resulting from the trajectory, starting from the state Xk, is

V(Xk) = Σ_{n=k}^{N−1} C(Xn, Xn+1)

V(Xk): cost-to-go of a trajectory starting from state Xk.

If a certain number of trajectories has been generated and the state i has been visited K times in these trajectories, J(i) can be estimated by

J(i) = (1/K) Σ_{m=1}^{K} V(i_m)

V(i_m): cost-to-go of the trajectory starting from state i after the m-th visit.

A recursive form of the method can be formulated:

J(i) := J(i) + γ·[V(i_m) − J(i)], with γ = 1/m, where m is the number of the trajectory.

From a trajectory point of view:

J(Xk) := J(Xk) + γ_Xk·[V(Xk) − J(Xk)]

γ_Xk corresponding to 1/m, where m is the number of times Xk has already been visited by trajectories.

39

With the preceding algorithm, V(Xk) must be computed from the whole trajectory, and can therefore only be used once the trajectory is finished. However, the method can be reformulated by exploiting the relation V(Xk) = C(Xk, Xk+1) + V(Xk+1).

At each transition of the trajectory, the cost-to-go estimates of the states visited so far are updated. Assume that the l-th transition has just been generated; then J(Xk) is updated for all the states that have been visited previously during the trajectory:

J(Xk) := J(Xk) + γ_Xk·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)],   for all k = 0, ..., l

TD(λ). A generalization of the preceding algorithm is TD(λ), where a constant λ < 1 is introduced:

J(Xk) := J(Xk) + γ_Xk·λ^(l−k)·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)],   for all k = 0, ..., l

Note that TD(1) is the same as the policy evaluation by simulation above. Another special case is λ = 0; the TD(0) algorithm only updates the current state:

J(Xl) := J(Xl) + γ_Xl·[C(Xl, Xl+1) + J(Xl+1) − J(Xl)]
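A minimal sketch of the TD(0) update in Python; the trajectory simulator env_step, the policy and the step-size schedule are assumptions made only for the illustration.

from collections import defaultdict

def td0_policy_evaluation(start_states, env_step, policy, terminal):
    """TD(0) evaluation of a fixed policy from simulated trajectories.

    env_step(state, decision) -> (next_state, cost) is an assumed simulator of the
    system under the policy; `terminal` is the absorbing termination state."""
    J = defaultdict(float)
    visits = defaultdict(int)
    for x0 in start_states:                  # one trajectory per start state
        x = x0
        while x != terminal:
            x_next, cost = env_step(x, policy(x))
            visits[x] += 1
            gamma = 1.0 / visits[x]          # diminishing step size, as in the text
            J[x] += gamma * (cost + J[x_next] - J[x])    # temporal difference update
            x = x_next
    return dict(J)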

Q-factors. Once J^µk(i) has been estimated using the TD algorithm, it is possible to make a policy improvement by evaluating the Q-factors, defined by

Q^µk(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + J^µk(j)]

Note that C(j, u, i) must be known.

The improved policy is

µk+1(i) = argmin_{u∈Ω_U(i)} Q^µk(i, u)

This is in fact an approximate version of the policy iteration algorithm, since J^µk and Q^µk have been estimated using the samples.

7.2.2 Q-learning

Q-learning is similar to a value iteration method based on simulation. The method estimates the Q-factors directly, without the need for the repeated policy evaluations of the TD method.

The optimal Q-factors are defined by

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + J*(j)]    (7.1)

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u∈Ω_U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j∈Ω_X} P(j, u, i)·[C(j, u, i) + min_{v∈Ω_U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (Xk, Xk+1, Uk, Ck), do

Uk = argmin_{u∈Ω_U(Xk)} Q(Xk, u)

Q(Xk, Uk) := (1 − γ)·Q(Xk, Uk) + γ·[C(Xk+1, Uk, Xk) + min_{u∈Ω_U(Xk+1)} Q(Xk+1, u)]

with γ defined as for TD.

The trade-off between exploration and exploitation. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice, a trade-off must be made between phases of exploitation, when a base policy (also called greedy policy) is evaluated (which is similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
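A minimal tabular Q-learning sketch based on (7.3), with an epsilon-greedy rule for the exploration/exploitation trade-off described above; the simulator interface and the parameters are assumptions.

import random
from collections import defaultdict

def q_learning(env_step, decisions, start_states, terminal, episodes=1000, epsilon=0.1):
    """Tabular Q-learning for a stochastic shortest path problem.

    env_step(state, u) -> (next_state, cost) is an assumed simulator and
    decisions(state) returns the list of admissible controls in that state."""
    Q = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(episodes):
        x = random.choice(start_states)
        while x != terminal:
            us = decisions(x)
            if random.random() < epsilon:                 # exploration phase
                u = random.choice(us)
            else:                                         # exploitation of the greedy policy
                u = min(us, key=lambda a: Q[(x, a)])
            x_next, cost = env_step(x, u)
            visits[(x, u)] += 1
            gamma = 1.0 / visits[(x, u)]
            future = 0.0 if x_next == terminal else min(Q[(x_next, a)] for a in decisions(x_next))
            Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (cost + future)
            x = x_next
    return Q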

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by:

- using the direct learning approach presented in the preceding section on each sample of experience;

- building on-line the model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the preceding section, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function Jµ(i). It will be replaced by a suitable approximation J(i, r), where r is a vector that has to be optimized based on the available samples of Jµ. In the tabular representation investigated previously, Jµ(i) was stored for all values of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well, over the state space, the information gained from the samples. In other words, they should minimize the error between the true function and the approximated one, Jµ(i) − J(i, r).

There are many possible methods for function approximation. This field is related to supervised learning methods. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods, or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that a real training set does not exist: the training set is obtained either by simulation or from real-time samples. This is already an approximation of the real function.
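As an illustration of the supervised learning step, a small sketch fitting a linear cost-to-go approximation J(i, r) = r·φ(i) by least squares on sampled costs-to-go. The feature map φ and the sample values are modelling assumptions introduced only for the example.

import numpy as np

def fit_linear_cost_to_go(samples, features):
    """Least-squares fit of J(i, r) = r . phi(i) from samples (state, observed cost-to-go)."""
    Phi = np.array([features(i) for i, _ in samples])
    y = np.array([v for _, v in samples])
    r, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return r

def approx_J(i, r, features):
    return float(features(i) @ r)

# Hypothetical two-dimensional state (age, price level) with an affine feature map.
features = lambda i: np.array([1.0, i[0], i[1]])
samples = [((2, 1), 14.0), ((3, 1), 19.0), ((1, 0), 6.0), ((4, 2), 30.0)]
r = fit_linear_cost_to_go(samples, features)
print(approx_J((2, 1), r, features))

More elaborate approximators (neural networks, kernel or tree-based methods) follow the same pattern: a parameter vector is tuned so that the approximation reproduces the sampled costs-to-go as closely as possible.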


Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to achieve maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example, with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states, or in a failure state. Two kinds of failures are considered, random failures and deterioration failures, each one modelled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP model. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

83 Reinforcement Learning

Kalles et al [24] proposes the use of RL for preventive maintenance of power plantsThe article aims at giving reason of using RL for monitoring and maintenance ofpower plants The main advantages given are the automatic learning capabilitiesof RL The problem of time-lag (time between an action and its effect) is revealedPenalties are defined by deviations from normal operation of the system Theapproach proposed should first be used in parallel of the actual expert systems sothat the RL algorithm learns the environment then it could be applied in practiceOne important condition for a good learning of the environment is that the algorithmhas been trained in all situation and all the more in critical situation

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history

The finite horizon approach is adapted to short-term optimization From the lit-terature review this approach can be applied to maintenance scheduling I believethat the approach is interesting because it can integrate opportunistic maintenanceChapter 8 gives an example of this type of models A limitations is the consequence

45

of the curse of dimensionality The complexity of the model increases exponention-naly with the number of states In consequence the number of components of afinite horizon SDP model can not be too high for being tractable

Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods

Approximate dynamic programming (reinforcement learning) have many advan-tages The methods does not need that a model of the system exist They learnfrom samples and could be used to adapt to a system Moreover they can handlelarge state space in comparison with MDP In my opinion reinforcement learningcould be used for continuous time monitoring of system with multi-states moni-toring The article [24] was also proposing this approach for condition monitoringof power plants However no implementation of the idea have been found in thelitterature A practical disadvantage of this approach is that the process of learningis time consuming It can (and should) be done off-line or based on a model thatalready exist but is too large to be solvable with classical methods A technicaldifficulty is the choice for an adequate supervised learning structure

Table 81 shows a summary of the models and most important methods

Table 81 Summary of models and methods

Characteristics Possible Application Method Advantagesin Maintenance DisadvantagesOptimization

Finite Horizon Model can be Short-term maintenance Value Iteration Limitated state spaceDynamic Programming Non-Stationary Optimization Scheduling (number of components)Markov Decision -Stationary Model Classical MethodsProcesses - Possible approaches for MDP

Average cost-to-go Continuous-time condition Value Iteration (VI) Can converge fast formonitoring maintenance high discount factoroptimization

Discounted Short-term maintenance Policy Iteration (PI) Faster in generaloptimization

Shortest path Linear Programming - Possible additionalconstraints- State space limited VI amp PI

Approximate Dynamic Can handle large state space Same as MDP for larger - TD-learning Can work withoutProgramming for MDP classical MDP methods systems - Q-learning an explicit modelSemi-Markov Decision -Can optimize Optimization for inspection Same as MDPProcesses interval inspection based maintenance

-Complex (Average cost-to-go approach)

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50 % for t > Tmax. The latter approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.
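To make the construction of the component state space concrete, the following is a minimal sketch of how Ωx1 could be enumerated in code. The failure-rate function and the values of Tmax, Ts, NPM and NCM are illustrative assumptions, not data from the thesis.

```python
# A minimal sketch of enumerating the component state space of Section 9.1.4.1.
# The failure rate, Tmax, Ts, NPM and NCM below are made-up illustrative values.

def failure_rate(t):
    """Hypothetical increasing failure rate lambda(t) for a component of age t (hours)."""
    return 0.01 + 0.0001 * t

Ts = 24.0           # stage length in hours
Tmax = 96.0         # age at which lambda(t) is capped and ageing stops
NPM, NCM = 2, 3     # number of preventive / corrective maintenance stages

NW = round(Tmax / Ts)                       # number of ageing steps, here 4
states = [f"W{q}" for q in range(NW + 1)]   # W0 (new) ... WNW (age capped)
states += [f"PM{q}" for q in range(1, NPM)]
states += [f"CM{q}" for q in range(1, NCM)]
print(states)
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']  -- as in Figure 9.1

lam = [failure_rate(q * Ts) for q in range(NW + 1)]   # lambda(Wq) = lambda(q * Ts)
print([round(l, 4) for l in lam])
```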


[Diagram: Markov chain over the states W0, W1, W2, W3, W4, PM1, CM1 and CM2. Solid arrows (u = 0) go from Wq to Wq+1 with probability (1 − Ts·λ(q)) and from Wq to CM1 with probability Ts·λ(q); W4 loops on itself with probability (1 − Ts·λ(4)). Dashed arrows (u = 1) go from the working states to PM1 with probability 1. The PM and CM states progress with probability 1.]

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented with this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Plot: electricity price (SEK/MWh, roughly 200–500) as a function of the stage, shown around stages k−1, k and k+1 for Scenario 1, Scenario 2 and Scenario 3.]

Figure 9.2: Example of electricity scenarios, NE = 3.


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅ otherwise

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
 = P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
 = P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
 = P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E. i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                          u  j1     P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0  Wq+1   1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0  CM1    λ(Wq)
WNW                         0  WNW    1 − λ(WNW)
WNW                         0  CM1    λ(WNW)
Wq, q ∈ {0, ..., NW}        1  PM1    1
PMq, q ∈ {1, ..., NPM−2}    ∅  PMq+1  1
PMNPM−1                     ∅  W0     1
CMq, q ∈ {1, ..., NCM−2}    ∅  CMq+1  1
CMNCM−1                     ∅  W0     1
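As an illustration, here is a minimal sketch of the component transition kernel of Table 9.1, with states encoded as ("W", q), ("PM", q) and ("CM", q). The failure-rate values and the model sizes are assumptions chosen to match the example of Figure 9.1, not data from the thesis.

```python
# A minimal sketch of the component transition kernel of Table 9.1.

def next_states(i, u, lam, NW, NPM, NCM):
    """Return a dict {j: P(j, u, i)} of reachable component states, following Table 9.1."""
    kind, q = i
    if kind == "W":
        if u == 1:
            # Preventive replacement: go to PM1 (or directly to W0 if NPM = 1).
            return {(("PM", 1) if NPM > 1 else ("W", 0)): 1.0}
        # u = 0: survive and age by one stage (WNW is absorbing) or fail into CM1.
        survive = ("W", min(q + 1, NW))
        fail = ("CM", 1) if NCM > 1 else ("W", 0)
        return {survive: 1.0 - lam[q], fail: lam[q]}
    if kind == "PM":
        return {(("PM", q + 1) if q < NPM - 1 else ("W", 0)): 1.0}
    if kind == "CM":
        return {(("CM", q + 1) if q < NCM - 1 else ("W", 0)): 1.0}
    raise ValueError(f"unknown state kind: {kind}")

# Example with NW = 4, NPM = 2, NCM = 3 as in Figure 9.1 (lam values are made up):
lam = [0.05, 0.08, 0.12, 0.18, 0.25]        # lam[q] = lambda(Wq)
print(next_states(("W", 2), 0, lam, NW=4, NPM=2, NCM=3))
# -> {('W', 3): 0.88, ('CM', 1): 0.12}
```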

Table 9.2: Example of transition matrices for electricity scenarios

P1E = [ 1    0    0
        0    1    0
        0    0    1 ]

P2E = [ 1/3  1/3  1/3
        1/3  1/3  1/3
        1/3  1/3  1/3 ]

P3E = [ 0.6  0.2  0.2
        0.2  0.6  0.2
        0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
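The stage-dependent electricity transitions can be stored simply as a list of matrices indexed by the stage. The sketch below uses the example matrices of Tables 9.2 and 9.3; the simulated trajectory at the end is purely illustrative.

```python
import random

# The example matrices of Table 9.2 (rows: current scenario i2, columns: next scenario j2).
P1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2 = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3 = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Table 9.3: which matrix applies at each of the 12 stages.
P_by_stage = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]

def sample_next_scenario(k, i2):
    """Sample j2 according to Pk(j2, i2), the stage-k electricity transition."""
    row = P_by_stage[k][i2]
    return random.choices(range(len(row)), weights=row)[0]

# Illustration: simulate one scenario trajectory over the 12-stage horizon,
# starting in scenario 0 (interpreted as the dry-year / high-price scenario).
s, trajectory = 0, [0]
for k in range(12):
    s = sample_next_scenario(k, s)
    trajectory.append(s)
print(trajectory)
```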

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u  j1     Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0  Wq+1   G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0  CM1    CI + CCM
WNW                         0  WNW    G · Ts · CE(i2, k)
WNW                         0  CM1    CI + CCM
Wq                          1  PM1    CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅  PMq+1  CI + CPM
PMNPM−1                     ∅  W0     CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅  CMq+1  CI + CCM
CMNCM−1                     ∅  W0     CI + CCM
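To show how the one-component model defined by Tables 9.1–9.4 can be solved with the backward value iteration algorithm, here is a minimal, self-contained sketch. All numerical values (failure rates, costs, prices, horizon length) are illustrative assumptions, the terminal cost is set to zero, and the electricity reward is treated as a negative cost so that the recursion minimizes the expected total cost.

```python
# A minimal, self-contained sketch of backward value iteration for the
# one-component model of Section 9.1.  All numerical values are illustrative.

NW, NPM, NCM, NE, N = 4, 2, 3, 3, 12           # model sizes and number of stages
Ts, G = 24.0, 1000.0                           # stage length [h], average output [kW]
C_I, C_PM, C_CM = 500.0, 800.0, 2000.0         # per-stage costs
lam = [0.05, 0.08, 0.12, 0.18, 0.25]           # lambda(Wq), q = 0..NW (made up)
def C_E(s, k):                                 # electricity price per kWh per scenario
    return [0.50, 0.35, 0.20][s]

comp_states = ([("W", q) for q in range(NW + 1)]
               + [("PM", q) for q in range(1, NPM)]
               + [("CM", q) for q in range(1, NCM)])

P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2 = [[1/3] * 3 for _ in range(3)]
P3 = [[.6, .2, .2], [.2, .6, .2], [.2, .2, .6]]
P_elec = [P1, P1, P1, P3, P3, P2, P2, P2, P3, P1, P1, P1]   # Table 9.3

def comp_transitions(i1, u):
    """{j1: (probability, stage cost excluding the electricity reward)}."""
    kind, q = i1
    if kind == "W":
        if u == 1:
            return {("PM", 1): (1.0, C_I + C_PM)}
        return {("W", min(q + 1, NW)): (1 - lam[q], 0.0),
                ("CM", 1): (lam[q], C_I + C_CM)}
    if kind == "PM":
        j = ("PM", q + 1) if q < NPM - 1 else ("W", 0)
        return {j: (1.0, C_I + C_PM)}
    j = ("CM", q + 1) if q < NCM - 1 else ("W", 0)
    return {j: (1.0, C_I + C_CM)}

# Backward recursion with terminal cost CN = 0.
J = {(i1, s): 0.0 for i1 in comp_states for s in range(NE)}
policy = []
for k in reversed(range(N)):
    Jk, uk = {}, {}
    for i1 in comp_states:
        for s in range(NE):
            decisions = [0, 1] if (i1[0] == "W" and i1[1] >= 1) else [0]
            best_u, best = None, float("inf")
            for u in decisions:
                expected = 0.0
                for j1, (p, cost) in comp_transitions(i1, u).items():
                    # The production reward enters as a negative cost (Table 9.4).
                    reward = G * Ts * C_E(s, k) if (i1[0] == "W" and u == 0
                                                    and j1[0] == "W") else 0.0
                    for j2 in range(NE):
                        expected += p * P_elec[k][s][j2] * (cost - reward + J[(j1, j2)])
                if expected < best:
                    best, best_u = expected, u
            Jk[(i1, s)], uk[(i1, s)] = best, best_u
    J, policy = Jk, [uk] + policy

print("Cost-to-go for a new component in scenario 0:", round(J[(("W", 0), 0)], 1))
print("Optimal stage-0 decision in that state:", policy[0][(("W", 0), 0)])
```

Even this small example has (NW + NPM + NCM − 1) · NE = 24 states per stage; adding further components multiplies this number, which is the curse of dimensionality discussed in the conclusions.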

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rental can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
jNC+1 State of the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is carried out on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space

Same as in Section 9.1.4.1.
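One direct consequence of this construction is that the size of the joint state space grows multiplicatively with the number of components, which is the curse of dimensionality discussed in Chapter 10. A minimal sketch of the count, with made-up component sizes:

```python
# A minimal sketch counting the states of the multi-component model.
# Per component c the state space is {W0..WNWc, PM1..PM(NPMc-1), CM1..CM(NCMc-1)},
# i.e. (NWc + 1) + (NPMc - 1) + (NCMc - 1) = NWc + NPMc + NCMc - 1 states.
# The numbers below are illustrative assumptions, not data from the thesis.

def n_component_states(NWc, NPMc, NCMc):
    return NWc + NPMc + NCMc - 1

NE = 3                                            # electricity scenarios
components = [(4, 2, 3), (6, 2, 4), (8, 3, 3)]    # (NWc, NPMc, NCMc) per component

total = NE
for NWc, NPMc, NCMc in components:
    total *= n_component_states(NWc, NPMc, NCMc)

print("States per component:", [n_component_states(*c) for c in components])
print("Total number of states:", total)           # 3 * 8 * 11 * 13 = 3432
```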

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or do nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}
                    Ωuc(ic) = ∅ otherwise
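As an illustration, the sketch below enumerates the joint decision vectors Uk for a hypothetical three-component system. The state encoding and the example system state are assumptions; components that are in maintenance, for which the decision space is empty, are marked with None.

```python
from itertools import product

# A minimal sketch enumerating the joint decision space of Section 9.2.4.2.
# Component states are encoded as ("W", q), ("PM", q) or ("CM", q).

def decision_space(ic):
    """Per-component decision space: {0, 1} for a working component, empty otherwise."""
    return (0, 1) if ic[0] == "W" else ()

def joint_decisions(component_states):
    """All decision vectors Uk = (u1, ..., uNC); components in maintenance get None."""
    per_component = [decision_space(ic) or (None,) for ic in component_states]
    return list(product(*per_component))

# Example: component 1 is working (age 2), component 2 is in CM, component 3 is new.
state = [("W", 2), ("CM", 1), ("W", 0)]
for U in joint_decisions(state):
    print(U)
# (0, None, 0), (0, None, 1), (1, None, 0), (1, None, 1)
```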

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
 = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix as in the example of Section 9.1.4.3.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is made for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1..NC Pc

with Pc = P(jc, 1, ic) if uc = 1 or ic ∉ {W1, ..., WNWc}
     Pc = 1           if ic ∉ {W0, ..., WNWc−1} and jc = ic
     Pc = 0           otherwise
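The two cases above can be implemented as a product over per-component factors. The sketch below is only a rough illustration: the stand-in one-component kernel uses a constant failure rate, and the rule that components unaffected by maintenance simply keep their state is a simplifying assumption rather than the exact formulation above.

```python
# A rough sketch of the joint component-state transition probability of Section 9.2.4.3.
# States are encoded as ("W", q), ("PM", q), ("CM", q); all numbers are illustrative.

def p_comp(jc, u, ic, lam=0.1, NW=4, NPM=2, NCM=3):
    """Stand-in one-component kernel P(jc, u, ic) with a constant failure rate."""
    kind, q = ic
    if kind == "W":
        if u == 1:
            return 1.0 if jc == (("PM", 1) if NPM > 1 else ("W", 0)) else 0.0
        if jc == ("W", min(q + 1, NW)):
            return 1.0 - lam
        return lam if jc == ("CM", 1) else 0.0
    nxt = (kind, q + 1) if q < (NPM if kind == "PM" else NCM) - 1 else ("W", 0)
    return 1.0 if jc == nxt else 0.0

def working(ic):
    return ic[0] == "W" and ic[1] >= 1

def p_joint(j, u, i):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following the two cases above."""
    if all(working(ic) for ic in i) and not any(u):
        # Case 1: all components working, no maintenance -> independent ageing.
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= p_comp(jc, 0, ic)
        return prob
    # Case 2: system down or preventive maintenance decided somewhere.
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not working(ic):
            prob *= p_comp(jc, 1, ic)
        else:
            prob *= 1.0 if jc == ic else 0.0   # unaffected components keep their state
    return prob

# Example: component 2 has failed (CM1); we opportunistically replace component 1.
i = (("W", 3), ("CM", 1), ("W", 1))
u = (1, 0, 0)
j = (("PM", 1), ("CM", 2), ("W", 1))
print(p_joint(j, u, i))   # 1.0
```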

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1..NC Cc

with Cc = CCMc if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
     Cc = CPMc if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
     Cc = 0 otherwise
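A minimal sketch of the two-case transition cost follows. The numerical costs and prices are made up, the electricity reward is written as a negative cost, and the explicit check that no component fails during the stage in Case 1 is an assumption added for the sketch.

```python
# A minimal sketch of the two-case transition cost of Section 9.2.4.4.
# States encoded as ("W", q), ("PM", q), ("CM", q); all numerical values are illustrative.

C_I, G, Ts = 500.0, 1000.0, 24.0
C_CM = {1: 2000.0, 2: 1500.0, 3: 2500.0}      # CCMc per component c
C_PM = {1: 800.0, 2: 600.0, 3: 900.0}         # CPMc per component c
def C_E(s, k):                                # electricity price per kWh per scenario
    return [0.50, 0.35, 0.20][s]

def working(x):
    return x[0] == "W" and x[1] >= 1

def transition_cost(j, u, i, s, k):
    """C(j, u, i) for the multi-component model; s is the electricity state iNC+1."""
    if all(working(ic) for ic in i) and not any(u) and all(jc[0] == "W" for jc in j):
        # Case 1: the system produces during the whole stage -> electricity reward.
        return -G * Ts * C_E(s, k)            # reward written as a negative cost
    # Case 2: interruption cost plus the per-component maintenance costs.
    total = C_I
    for c, (jc, ic) in enumerate(zip(j, i), start=1):
        if ic[0] == "CM" or jc == ("CM", 1):
            total += C_CM[c]
        elif ic[0] == "PM" or jc == ("PM", 1):
            total += C_PM[c]
    return total

# Example: component 2 fails during the stage while the others keep working.
i = (("W", 2), ("W", 4), ("W", 1))
u = (0, 0, 0)
j = (("W", 3), ("CM", 1), ("W", 1))
print(transition_cost(j, u, i, s=0, k=5))     # 500 + 1500 = 2000.0
```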

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Time to repair is in reality not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm empirically proves to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximation of a finite horizon model and must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
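The backward recursion above can be reproduced with a short script. The arc costs are read off the computations in this appendix, and the decision u is interpreted as the index of the state reached at the next stage; this is only a sketch to check the numbers.

```python
# A minimal sketch re-running the value iteration of this appendix.
# C[k][(i, u)] is the arc cost from state i at stage k when choosing u,
# where u is the index of the state reached at stage k+1.
C = [
    {(0, 0): 2, (0, 1): 4, (0, 2): 3},                                  # stage 0: A
    {(0, 0): 4, (0, 1): 6, (1, 0): 2, (1, 1): 1, (1, 2): 3,
     (2, 1): 5, (2, 2): 2},                                             # stage 1: B, C, D
    {(0, 0): 2, (0, 1): 5, (1, 0): 7, (1, 1): 3, (1, 2): 2,
     (2, 1): 1, (2, 2): 2},                                             # stage 2: E, F, G
    {(0, 0): 4, (1, 0): 2, (2, 0): 7},                                  # stage 3: H, I, J
]

N = len(C)
J = {0: 0}                      # terminal cost phi = 0 at the single final state
for k in reversed(range(N)):
    Jk = {}
    for (i, u), cost in C[k].items():
        value = cost + J[u]
        if i not in Jk or value < Jk[i]:
            Jk[i] = value
    J = Jk

print(J[0])   # optimal cost-to-go from A: 8
```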


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A.-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS '06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004. International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004. International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA '99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transaction on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transaction on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PMNPM−1                     ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CMNCM−1                     ∅   W0      1

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0  ]    P2E = [ 1/3  1/3  1/3 ]    P3E = [ 0.6  0.2  0.2 ]
      [ 0    1    0  ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.6  0.2 ]
      [ 0    0    1  ]          [ 1/3  1/3  1/3 ]          [ 0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
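These two tables translate directly into a stage-dependent lookup; a minimal sketch (the variable names are ours):

```python
# Transition matrices of Table 9.2 (row = current scenario i2, column = next scenario j2)
P1_E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Stage schedule of Table 9.3 for k = 0..11
schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def P_electricity(k, j2, i2):
    """Pk(j2, i2): probability of moving from scenario i2 to scenario j2 at stage k."""
    return schedule[k][i2 - 1][j2 - 1]

print(P_electricity(5, 1, 3))   # 1/3: stage 5 uses the uniform matrix P2_E
```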

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
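With the state space, transition probabilities (Tables 9.1–9.3) and transition costs (Table 9.4) in place, the one-component model can be solved by backward induction (the value iteration method for the finite horizon case), as stated at the beginning of this chapter. The sketch below is generic: transition(k, i, u) and cost(k, i, u, j) are assumed to be assembled from the elements above (for example by combining the earlier sketches), decisions(i) should return [None] for states with a forced transition (PM and CM states), and the reward for generated electricity is assumed to enter the cost with a negative sign so that minimizing total cost maximizes revenue:

```python
def backward_induction(states, decisions, transition, cost, terminal_cost, N):
    """Optimal cost-to-go J[k][i] and policy mu[k][i] for a finite horizon model.

    states        : iterable of all states i = (component state, electricity scenario)
    decisions(i)  : allowed decisions in state i, e.g. [0, 1], or [None] if none is possible
    transition(k, i, u) -> {j: probability}
    cost(k, i, u, j)    -> stage cost Ck(j, u, i)
    terminal_cost(i)    -> CN(i)
    """
    states = list(states)
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for i in states:
        J[N][i] = terminal_cost(i)
    for k in range(N - 1, -1, -1):                    # backward over the stages
        for i in states:
            best_u, best_val = None, float("inf")
            for u in decisions(i):
                val = sum(p * (cost(k, i, u, j) + J[k + 1][j])
                          for j, p in transition(k, i, u).items())
                if val < best_val:
                    best_u, best_val = u, val
            J[k][i], mu[k][i] = best_val, best_u
    return J, mu
```

The computational effort grows with the number of stages and the size of the state space, which is why the number of W states has to be limited as discussed in Section 9.1.4.1.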

9.2 Multi-Component Model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c


Costs

CPMc      Cost per stage of preventive maintenance for component c
CCMc      Cost per stage of corrective maintenance for component c
CNc(i)    Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC}   State of component c at stage k
xc                      A component state
xNC+1k                  Electricity state at stage k
uck                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered whenever maintenance, of any kind, is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)T     (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space
The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 9.1.
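The joint state space grows multiplicatively with the number of components, which is the curse of dimensionality discussed in Chapter 10. A small sketch of its size (the three-component example values are hypothetical):

```python
def state_space_size(NW, NPM, NCM, NE):
    """Number of joint states: NE times the product over components of
    (NWc + 1) W states + (NPMc - 1) PM states + (NCMc - 1) CM states."""
    size = NE
    for nw, npm, ncm in zip(NW, NPM, NCM):
        size *= (nw + 1) + (npm - 1) + (ncm - 1)
    return size

print(state_space_size(NW=[4, 4, 4], NPM=[2, 2, 2], NCM=[3, 3, 3], NE=3))   # 3 * 8**3 = 1536
```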

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or to do nothing, depending on the state of the system.


uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)T     (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)     (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)     (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to those of the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏ c=1..NC P(jc, 0, ic)

Case 2


If at least one component is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏ c=1..NC Pc

with
Pc = P(jc, uc, ic)   if uc = 1 or ic ∉ {W0, ..., WNWc}
Pc = 1               if uc = 0, ic ∈ {W0, ..., WNWc} and jc = ic
Pc = 0               otherwise
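A sketch of the intent of these two cases, reusing the tuple states and a per-component transition function such as component_transition from the Section 9.1 sketch (with its parameters bound): working components whose maintenance is not decided are frozen whenever the system is down. The helper names are ours.

```python
def system_transition(i_states, u, P_components):
    """Joint transition {(j1, ..., jNC): probability} for the component part of the state.

    i_states     : tuple of current component states, e.g. (('W', 2), ('CM', 1))
    u            : tuple of decisions uc in {0, 1}
    P_components : list of per-component functions P_c(i, u) -> {j: probability}
    """
    working = [i[0] == "W" for i in i_states]
    system_up = all(working) and not any(u)         # Case 1: everything works, no maintenance
    per_component = []
    for P_c, i, uc, w in zip(P_components, i_states, u, working):
        if system_up or uc == 1 or not w:
            per_component.append(P_c(i, uc))        # normal ageing / maintenance progression
        else:
            per_component.append({i: 1.0})          # Case 2: frozen working component
    joint = {(): 1.0}                               # product over the components
    for dist in per_component:
        joint = {pre + (j,): p * q
                 for pre, p in joint.items() for j, q in dist.items()}
    return joint
```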

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and jc ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑ c=1..NC Cc

with
Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
Cc = 0      otherwise
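A corresponding sketch of the stage cost, again with tuple states; the sign of the generation reward is an assumption (the thesis lists it as a reward inside the cost table, so it is entered here as a negative cost so that minimizing cost maximizes revenue), and the function name is ours:

```python
def system_cost(i_states, u, j_states, i_elec, k, G, Ts, CE, CI, CPM, CCM):
    """Stage cost for the multi-component model (CPM and CCM are per-component lists)."""
    all_up = (all(i[0] == "W" for i in i_states) and not any(u)
              and all(j[0] == "W" for j in j_states))
    if all_up:                                   # Case 1: the unit produces during the stage
        return -G * Ts * CE(i_elec, k)           # reward entered as a negative cost (assumption)
    total = CI                                   # Case 2: interruption plus maintenance costs
    for c, (i, j) in enumerate(zip(i_states, j_states)):
        if i[0] == "CM" or j == ("CM", 1):
            total += CCM[c]
        elif i[0] == "PM" or j == ("PM", 1):
            total += CPM[c]
    return total
```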

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.


• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
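As a quick numerical check of the recursion above, the arc costs C(k, i, u) listed in the calculation (u being the state reached at stage k+1; the underlying graph with nodes A–J is given earlier in the thesis) can be fed to a small backward pass:

```python
C = {  # C[(k, i)][u] = cost of going from state i at stage k to state u at stage k+1
    (0, 0): {0: 2, 1: 4, 2: 3},                      # stage 0: node A
    (1, 0): {0: 4, 1: 6},                            # stage 1: nodes B, C, D
    (1, 1): {0: 2, 1: 1, 2: 3},
    (1, 2): {1: 5, 2: 2},
    (2, 0): {0: 2, 1: 5},                            # stage 2: nodes E, F, G
    (2, 1): {0: 7, 1: 3, 2: 2},
    (2, 2): {1: 1, 2: 2},
    (3, 0): {0: 4}, (3, 1): {0: 2}, (3, 2): {0: 7},  # stage 3: nodes H, I, J
}

J = {(4, 0): 0}                                      # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):
    for (stage, i) in [key for key in C if key[0] == k]:
        best_u = min(C[(k, i)], key=lambda u: C[(k, i)][u] + J[(k + 1, u)])
        J[(k, i)] = C[(k, i)][best_u] + J[(k + 1, best_u)]
        policy[(k, i)] = best_u

print(J[(0, 0)], policy[(0, 0)])   # 8 2, matching J*0(0) = 8 and u*0(0) = 2
```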


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: an opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j

2 i2) can take three different values defined by the transition matrices P 1E P 2

E

or P 3E i2 is represented by the rows of the matrices and j2 by the column

53

Table 91 Transition probabilities

i1 u j1 P (j1 u i1)

Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1

PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1

CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 93 Example of transition probabilities on a 12 stages horizon

Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11

Pk(j2 i2) P 1

E P 1E P 1

E P 3E P 3

E P 2E P 2

E P 2E P 3

E P 1E P 1

E P 1E

9144 Cost Function

The costs associated to the possible transitions can be of different kinds

bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)

bull Cost for maintenance CCM or CPM

bull Cost for interruption CI

Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable

A possible terminal cost is defined by CN (i) for each possible terminal state CN (i)for the component

54

Table 94 Transition costs

i1 u j1 Ck(j u i)

Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)

Wq q isin 0 NW minus 1 0 CM1 CI + CCM

WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM

Wq 1 PM1 CI + CPM

PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM

PMNPMminus1 empty W0 CI + CPM

CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM

CMNCMminus1 empty W0 CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time

922 Notations for the Proposed Model

Numbers

NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c

55

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.
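As a small illustration of the manpower extension, a global decision space could be obtained by filtering the Cartesian product of the individual decision spaces; the bound max_actions below is an assumed parameter, not part of the proposed model.

```python
from itertools import product

def global_decision_space(per_component_spaces, max_actions):
    """All joint decision vectors with at most `max_actions` simultaneous PM actions.

    per_component_spaces[c] is Omega_uc(ic) for component c ([0, 1] or []).
    Components with an empty decision space are fixed to 0 (no decision possible).
    """
    padded = [space if space else [0] for space in per_component_spaces]
    return [u for u in product(*padded) if sum(u) <= max_actions]

# Example: three working components but only one maintenance team available.
print(global_decision_space([[0, 1], [0, 1], [0, 1]], max_actions=1))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
```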


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid untractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
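The backward recursion can be checked with a short script. The stage costs below are read directly from the worked solution above, so the sketch is self-contained; it only reproduces the numbers already derived.

```python
# Stage costs C[(k, i, u)] read off from the worked solution: at stage k, in state i,
# decision u leads to state u of stage k+1 with this arc cost.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4                      # number of stages
J = {(N, 0): 0.0}          # terminal cost phi(0) = 0
policy = {}

for k in range(N - 1, -1, -1):                      # backward value iteration
    for i in {i for (kk, i, u) in C if kk == k}:
        options = {u: C[(k, i, u)] + J[(k + 1, u)]
                   for (kk, ii, u) in C if kk == k and ii == i}
        u_best = min(options, key=options.get)
        J[(k, i)] = options[u_best]
        policy[(k, i)] = u_best

print(J[(0, 0)])           # 8.0, the optimal cost-to-go J*(A) found above
print(policy[(0, 0)])      # 2, the optimal first decision
```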


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R. E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L. P. Kaelbling, M. L. Littman, and A. P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems – life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L. M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 47: Models

The optimality equation can be rewritten in terms of Q-factors:

J*(i) = min_{u ∈ U(i)} Q*(i, u)    (7.2)

By combining the two equations we obtain

Q*(i, u) = Σ_{j ∈ Ω_X} P(j | u, i) · [C(j, u, i) + min_{v ∈ U(j)} Q*(j, v)]    (7.3)

Q*(i, u) is the unique solution of this equation. The Q-learning algorithm is based on (7.3).

Q(i, u) can be initialized arbitrarily. For each sample (X_k, X_{k+1}, U_k, C_k), do

U_k = argmin_{u ∈ U(X_k)} Q(X_k, u)

Q(X_k, U_k) = (1 − γ) · Q(X_k, U_k) + γ · [C(X_{k+1}, U_k, X_k) + min_{u ∈ U(X_{k+1})} Q(X_{k+1}, u)]

with γ defined as for TD.

The exploration/exploitation trade-off. Convergence of the algorithm to the optimal solution would require that all pairs (x, u) are tried infinitely often, which is not realistic.

In practice a trade-off must be made between phases of exploitation, when a base policy (also called the greedy policy) is evaluated (similar to the idea of TD(0)), and phases of exploration, during which new controls are tried and a new greedy policy is determined.
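As a concrete illustration of the update above and of the exploration/exploitation trade-off, a minimal tabular Q-learning sketch in Python follows; the dictionary representation of Q, the `decisions` function and the ε-greedy rule are assumptions of this sketch, not prescriptions from the thesis.

```python
import random

def q_learning_update(Q, sample, gamma, decisions):
    """One tabular Q-learning step from a sample (x_k, u_k, c_k, x_k+1).

    Q         : dict mapping (state, decision) -> current Q-factor estimate
    sample    : (state, decision taken, observed stage cost, next state)
    gamma     : step size, defined as for TD-learning
    decisions : function returning the admissible decisions U(x) of a state
    """
    x, u, c, x_next = sample
    # Cost-to-go estimate of the next state: best Q-factor over admissible decisions
    best_next = min(Q[(x_next, v)] for v in decisions(x_next))
    # Stochastic approximation of the Q-factor optimality equation (7.3)
    Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * (c + best_next)

def epsilon_greedy(Q, x, decisions, epsilon=0.1):
    """Exploration/exploitation trade-off: mostly follow the current greedy policy."""
    if random.random() < epsilon:
        return random.choice(list(decisions(x)))       # explore a new control
    return min(decisions(x), key=lambda u: Q[(x, u)])  # exploit the greedy policy
```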

7.3 Indirect Learning

On-line applications can take advantage of the experience gained from real-time use by either:

- using the direct learning approach presented in the previous section on each sample of experience, or

- building on-line a model of the transition probabilities and cost function, and then using this model for off-line training of the system through simulation with direct learning.


7.4 Supervised Learning

With the methods presented in the previous sections, the cost-to-go or Q-functions were represented in tabular form. These approaches are suitable for moderate-size problems. However, for large state and control spaces this would be too computationally intensive. To overcome this problem, approximation methods can be used to approximate the cost-to-go or Q-functions over the whole state and control space.

As an example, consider a cost-to-go function J_µ(i). It is replaced by a suitable approximation J(i, r), where r is a parameter vector that is optimized based on the available samples of J_µ. In the tabular representation investigated previously, J_µ(i) was stored for every value of i; with an approximation structure, only the vector r is stored.

Function approximators must be able to generalize well over the state space the information gained from the samples. In other words, they should minimize the error between the true function and its approximation, J_µ(i) − J(i, r).

There are many possible methods for function approximation; this field is related to supervised learning. Possible methods are, for example, artificial neural networks, kernel-based methods, tree-based methods or Bayesian statistics.

A general approach to a supervised learning problem can be:

• Determine an adequate structure for the approximated function and a corresponding supervised learning method.

• Determine the input features of the function, that is, the important inputs that characterize the state of the system. The features are generally based on experience or insight about the problem.

• Decide on a training algorithm.

• Gather a training set.

• Train the function with the training set. The function can then be validated using a subset of the training set.

• Evaluate the performance of the approximated function using a test set.

An important difference between classical supervised learning and the learning performed in reinforcement learning is that no true training set exists. The training sets are obtained either by simulation or from real-time samples; this is already an approximation of the real function.
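To make the idea concrete, here is a minimal sketch of fitting a linear approximation architecture J(i, r) = r · φ(i) by least squares on sampled cost-to-go values; the feature map `phi` and the sample set are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def fit_cost_to_go(phi, samples):
    """Fit the parameter vector r of J(i, r) = r . phi(i) by least squares.

    phi     : feature map, state i -> feature vector (the chosen input features)
    samples : list of (state, sampled cost-to-go) pairs obtained from
              simulation or real-time experience
    """
    A = np.array([phi(i) for i, _ in samples])   # design matrix of features
    y = np.array([v for _, v in samples])        # sampled values of J_mu(i)
    r, *_ = np.linalg.lstsq(A, y, rcond=None)    # minimize ||A r - y||^2
    return r

def approx_cost_to_go(phi, r, i):
    """Evaluate the approximation J(i, r) for any state i; only r is stored."""
    return float(np.dot(r, phi(i)))
```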


Chapter 8

Review of Models for Maintenance Optimization

This chapter reviews several SDP maintenance models found in the literature. In conclusion, the approaches and methods are compared and their applicability to maintenance problems in power systems is discussed.

8.1 Finite Horizon Dynamic Programming

8.1.1 Deterministic Models

Dekker et al. [46] propose a rolling horizon approach for short-term scheduling and grouping of maintenance activities. Each individual maintenance activity is first planned based on an infinite horizon optimization. The short-term planning uses these maintenance activities as inputs. Penalties are defined for deviations from the original time of maintenance for each activity. The whole set of maintenance activities is then optimized using finite horizon dynamic programming.

8.1.2 Stochastic Models

In [37] an SDP model is proposed to solve a finite horizon generating unit maintenance scheduling problem. The system considered is composed of n generating units. The possible states for each unit are the number of remaining stages of maintenance, and the possible failure of a unit not in maintenance during the stage. The failure rates are assumed constant, but different before and after maintenance. Unserved energy and unserved reserve costs are considered in the cost function.

One interesting feature of the model is that the time to complete maintenance is considered stochastic. Another is that the maintenance crew is assumed limited, so maintenance can be done on only one generating unit at a time.

The model is illustrated with a 3-unit example with 4, 5 and 6 possible states for the different units. A 52-week horizon is considered, with stages of one week in length.

8.2 Infinite Horizon Stochastic Models

8.2.1 Discrete Time Infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenance of a single-component system. The system can be in different deterioration states, maintenance states or in a failure state. Two kinds of failures are considered, random failure and deterioration failure, each one modeled by a failure state with a different time to repair.

The time to deterioration failure is represented by an Erlang distribution. The preventive maintenance is considered imperfect. If the system fails, the component is replaced.

An average cost-to-go approach is used to evaluate the policy.

First, a Markov process of the system is investigated to determine the optimal mean time to preventive maintenance. A Markov decision process model is then built using the state probabilities and the calculated optimal mean time to preventive maintenance.

The MDP is solved using the policy iteration algorithm. The model is proved to be unichain before applying the algorithm. An illustrative example is given; it considers 3 deterioration states, one preventive maintenance state for each deterioration state, and one failure state.

Jayakumar et al. [21] propose a similar MDP. Major and minor maintenance are possible. For each possible maintenance action, the deterioration level after the maintenance is stochastic, which is more realistic.

The model is solved using the linear programming method.


8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are not only minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given; the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants; the main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is pointed out. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is a consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state, but SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of systems with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
  - Characteristics: the model can be non-stationary
  - Possible application in maintenance optimization: short-term maintenance optimization/scheduling
  - Method: Value Iteration
  - Advantages/disadvantages: limited state space (number of components)

Markov Decision Processes
  - Characteristics: stationary model
  - Methods: classical methods; possible approaches for MDP:
      Average cost-to-go - continuous-time condition monitoring maintenance optimization; Value Iteration (VI), can converge fast for a high discount factor
      Discounted - short-term maintenance optimization; Policy Iteration (PI), faster in general
      Shortest path - Linear Programming; possible additional constraints; state space more limited than with VI & PI

Approximate Dynamic Programming for MDP
  - Characteristics: can handle large state spaces in comparison with classical MDP methods
  - Possible application: same as MDP, for larger systems
  - Methods: TD-learning, Q-learning; can work without an explicit model

Semi-Markov Decision Processes
  - Characteristics: can optimize the inspection interval
  - Possible application: optimization for inspection-based maintenance
  - Methods: same as MDP (average cost-to-go approach); more complex


Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multiple components. Both models can be solved using the value iteration algorithm.

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity is considered an important factor that can influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, in order to be operational later and avoid maintenance during a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

N_E   Number of electricity scenarios
N_W   Number of working states for the component
N_PM  Number of preventive maintenance states for one component
N_CM  Number of corrective maintenance states for one component

Costs

C_E(s, k)  Electricity cost at stage k for the electricity state s
C_I        Cost per stage for interruption
C_PM       Cost per stage of preventive maintenance
C_CM       Cost per stage of corrective maintenance
C_N(i)     Terminal cost if the component is in state i

Variables

i^1  Component state at the current stage
i^2  Electricity state at the current stage
j^1  Possible component state for the next stage
j^2  Possible electricity state for the next stage

State and Control Space

x^1_k  Component state at stage k
x^2_k  Electricity state at stage k

Probability function

λ(t)  Failure rate of the component at age t
λ(i)  Failure rate of the component in state W_i

Sets

Ω_x1    Component state space
Ω_x2    Electricity state space
Ω_U(i)  Decision space for state i

State notations

W   Working state
PM  Preventive maintenance state
CM  Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts, such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for N_CM stages with a cost of C_CM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is N_PM stages, with a cost of C_PM per stage.

• If the system is not working, a cost for interruption C_I per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• N_E possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is denoted C_E(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector X_k is composed of two state variables: x^1_k for the state of the component (its age) and x^2_k for the electricity scenario, so N_X = 2.

The state of the system is thus represented by a vector as in (9.1):

X_k = (x^1_k, x^2_k),   x^1_k ∈ Ω_x1, x^2_k ∈ Ω_x2    (9.1)

Ω_x1 is the set of possible states for the component and Ω_x2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x^1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to N_CM and N_PM.

To limit the size of the state space, it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λ_max = λ(T_max), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age T_max is reached; in this case T_max can correspond, for example, to the time when λ(t) > 50% for t > T_max. The latter approach was implemented. The corresponding number of W states is N_W = T_max/Ts, or the closest integer, in both cases.
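As a small illustration of how N_W could be derived from the failure rate function, here is a sketch; the threshold rule and the example failure rate are assumptions for illustration only.

```python
def number_of_working_states(failure_rate, Ts, lam_max):
    """Number of W states: smallest number of stages q such that
    failure_rate(q * Ts) reaches the limit lam_max (preventive maintenance is
    then always made, or the rate is kept constant beyond that age).

    failure_rate : function lambda(t), with t and Ts in the same time unit
    """
    q = 0
    while failure_rate(q * Ts) < lam_max:
        q += 1
    return q

# Illustrative, assumed failure rate (per hour) increasing with the age in hours
NW = number_of_working_states(lambda t: 1e-4 * (1.0 + (t / 8760.0) ** 2),
                              Ts=168, lam_max=5e-4)
```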

[Figure 9.1: Example of the Markov decision process for one component with N_CM = 3, N_PM = 2 and N_W = 4. The states are W0-W4, PM1, CM1 and CM2. Solid arcs correspond to u = 0: from W_q the component moves to W_{q+1} with probability 1 − Ts·λ(q) and to CM1 with probability Ts·λ(q), while W4 keeps its age instead of ageing further. Dashed arcs correspond to u = 1. Each maintenance state leads to the next maintenance state, or back to W0, with probability 1.]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x^1_k ∈ Ω_x1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ω_x1 = {W0, ..., W_NW, PM1, ..., PM_{N_PM−1}, CM1, ..., CM_{N_CM−1}}


Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_x2 = {S1, ..., S_NE}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, N_E = 3. The plot shows the electricity price (roughly 200-500 SEK/MWh) as a function of the stage, around stages k−1, k and k+1, for Scenarios 1, 2 and 3.]


9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance
U_k = 1: preventive maintenance

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W1, ..., W_NW}, and Ω_U(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
  = P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
  = P(j^1 | u, i^1) · P_k(j^2 | i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity state

The transition probabilities of the electricity state, P_k(j^2 | i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example P_k(j^2 | i^2) can take three different values, defined by the transition matrices P^1_E, P^2_E and P^3_E; i^2 is represented by the rows of the matrices and j^2 by the columns.


Table 9.1: Transition probabilities

i^1                             u   j^1       P(j^1 | u, i^1)
W_q, q ∈ {0, ..., N_W − 1}      0   W_{q+1}   1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}      0   CM1       λ(W_q)
W_NW                            0   W_NW      1 − λ(W_NW)
W_NW                            0   CM1       λ(W_NW)
W_q, q ∈ {0, ..., N_W}          1   PM1       1
PM_q, q ∈ {1, ..., N_PM − 2}    ∅   PM_{q+1}  1
PM_{N_PM−1}                     ∅   W0        1
CM_q, q ∈ {1, ..., N_CM − 2}    ∅   CM_{q+1}  1
CM_{N_CM−1}                     ∅   W0        1
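As an illustration, the non-zero transition probabilities of Table 9.1 could be assembled programmatically as sketched below. This is only a sketch: the state labels, the encoding of the "no decision" case as None, and the use of Ts·λ(q·Ts) as the stage failure probability (as in Figure 9.1) are assumptions.

```python
def component_transitions(NW, NPM, NCM, lam, Ts):
    """Build {(state, decision): {next_state: probability}} for one component.

    lam : list of failure rates lambda(q * Ts) for q = 0..NW
    Ts  : stage length, so Ts * lam[q] is the failure probability during a stage
    """
    W = [f"W{q}" for q in range(NW + 1)]
    PM = [f"PM{q}" for q in range(1, NPM)]   # empty if NPM = 1 (PM1 is then W0)
    CM = [f"CM{q}" for q in range(1, NCM)]   # empty if NCM = 1 (CM1 is then W0)
    P = {}
    for q in range(NW + 1):
        ageing = W[q + 1] if q < NW else W[NW]      # W_NW keeps its age while working
        p_fail = Ts * lam[q]
        P[(W[q], 0)] = {ageing: 1 - p_fail, (CM[0] if CM else "W0"): p_fail}
        P[(W[q], 1)] = {(PM[0] if PM else "W0"): 1.0}
    for chain in (PM, CM):                          # deterministic maintenance chains
        for n, s in enumerate(chain):
            P[(s, None)] = {(chain[n + 1] if n + 1 < len(chain) else "W0"): 1.0}
    return P

# Example with NW = 4, NPM = 2, NCM = 3 as in Figure 9.1 (illustrative numbers)
P_example = component_transitions(4, 2, 3, lam=[1e-3 * (q + 1) for q in range(5)], Ts=168)
```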

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E =
  [ 1    0    0   ]
  [ 0    1    0   ]
  [ 0    0    1   ]

P^2_E =
  [ 1/3  1/3  1/3 ]
  [ 1/3  1/3  1/3 ]
  [ 1/3  1/3  1/3 ]

P^3_E =
  [ 0.6  0.2  0.2 ]
  [ 0.2  0.6  0.2 ]
  [ 0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)        0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2 | i^2)   P^1_E  P^1_E  P^1_E  P^3_E  P^3_E  P^2_E  P^2_E  P^2_E  P^3_E  P^1_E  P^1_E  P^1_E
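These stage-dependent matrices could be stored, for instance, as a list indexed by the stage; the sketch below simply mirrors Tables 9.2 and 9.3.

```python
import numpy as np

P1_E = np.eye(3)                                   # stable season: the scenario is kept
P2_E = np.full((3, 3), 1.0 / 3.0)                  # fully transient season
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])

# Stage-dependent transition matrices over the 12-stage horizon of Table 9.3
P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2 | i2): probability that scenario i2 switches to j2 during stage k."""
    return P_E[k][i2, j2]
```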

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• a reward for electricity generation, G · Ts · C_E(i^2, k) (which depends on the electricity scenario state i^2 and the stage k);

• a cost for maintenance, C_CM or C_PM;

• a cost for interruption, C_I.

Moreover, a terminal cost, denoted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4; notice that i^2 is a state variable.

A possible terminal cost is defined by C_N(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i^1                             u   j^1       C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}      0   W_{q+1}   G · Ts · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}      0   CM1       C_I + C_CM
W_NW                            0   W_NW      G · Ts · C_E(i^2, k)
W_NW                            0   CM1       C_I + C_CM
W_q                             1   PM1       C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}    ∅   PM_{q+1}  C_I + C_PM
PM_{N_PM−1}                     ∅   W0        C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}    ∅   CM_{q+1}  C_I + C_CM
CM_{N_CM−1}                     ∅   W0        C_I + C_CM
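Putting the pieces together, the one-component model can be solved by the backward value iteration recursion; a condensed sketch follows, in which `transitions(k, i, u)` and `cost(k, i, u, j)` stand for the quantities of Tables 9.1 to 9.4 and are assumed to be provided by the user (the generation reward entering with a negative sign if the objective is a pure cost minimization).

```python
def solve_finite_horizon(states, decisions, transitions, cost, terminal_cost, N):
    """Backward value iteration: J_k(i) = min_u sum_j P_k(j|u,i) [C_k(j,u,i) + J_{k+1}(j)].

    states        : iterable of states (component state, electricity state)
    decisions     : function U(i); it should return [None] for maintenance states
    transitions   : function (k, i, u) -> {j: probability}
    cost          : function (k, i, u, j) -> stage cost
    terminal_cost : function C_N(i) for the terminal stage
    """
    J = {i: terminal_cost(i) for i in states}          # stage N
    policy = []
    for k in reversed(range(N)):                       # stages N-1, ..., 0
        Jk, mu = {}, {}
        for i in states:
            best_u, best = None, float("inf")
            for u in decisions(i):
                val = sum(p * (cost(k, i, u, j) + J[j])
                          for j, p in transitions(k, i, u).items())
                if val < best:
                    best_u, best = u, val
            Jk[i], mu[i] = best, best_u
        J, policy = Jk, [mu] + policy
    return J, policy                                   # J_0 and the optimal decision per stage
```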

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C     Number of components
N_Wc    Number of working states for component c
N_PMc   Number of preventive maintenance states for component c
N_CMc   Number of corrective maintenance states for component c

Costs

C_PMc    Cost per stage of preventive maintenance for component c
C_CMc    Cost per stage of corrective maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}   State of component c at the current stage
i^{N_C+1}                Electricity state at the current stage
j^c, c ∈ {1, ..., N_C}   State of component c at the next stage
j^{N_C+1}                Electricity state at the next stage
u^c, c ∈ {1, ..., N_C}   Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}   State of component c at stage k
x^c                        A component state
x^{N_C+1}_k                Electricity state at stage k
u^c_k                      Maintenance decision for component c at stage k

Probability functions

λ_c(i)  Failure probability function for component c

Sets

Ω_xc           State space for component c
Ω_x{N_C+1}     Electricity state space
Ω_uc(i^c)      Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series; if one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is denoted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is N_PMc stages, with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance of any kind is performed on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)    (9.2)

where x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to N_CMc and N_PMc. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is denoted Ω_xc:

x^c_k ∈ Ω_xc = {W0, ..., W_NWc, PM1, ..., PM_{N_PMc−1}, CM1, ..., CM_{N_CMc−1}}

Electricity space

The same as in the one-component model (Section 9.1.4.1).

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c
u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., N_C}:  Ω_uc(i^c) = {0, 1} if i^c ∈ {W0, ..., W_NWc}, and Ω_uc(i^c) = ∅ otherwise.

9.2.4.3 Transition Probability

The component state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
  = P((j^1, ..., j^{N_C}) | (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P(j^{N_C+1} | i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P(j^{N_C+1} | i^{N_C+1}), are similar to those of the one-component model. They can be defined at each stage k by transition matrices as in the example of Section 9.1.4.3.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_NWc} and u = 0, then

P((j^1, ..., j^{N_C}) | 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c | 0, i^c)

Case 2

If one of the components is in maintenance, or if preventive maintenance is decided for at least one component, then

P((j^1, ..., j^{N_C}) | (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P_c

with

P_c = P(j^c | 1, i^c)   if u^c = 1 or i^c ∉ {W1, ..., W_NWc}
P_c = 1                 if u^c = 0, i^c ∈ {W1, ..., W_NWc} and j^c = i^c
P_c = 0                 otherwise
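A sketch of the two cases, assuming the per-component probabilities of the one-component model are available as dictionaries; the state encoding and the reading of Case 2 (an idle working component keeps its state while the system is down) are assumptions of this sketch.

```python
def joint_transition_prob(i, j, u, P_comp, working):
    """Joint component transition P((j1..jNC) | (u1..uNC), (i1..iNC)) for the series system.

    P_comp  : per-component dicts {(state, decision): {next_state: probability}}
    working : per-component sets of working states {W0, ..., W_NWc}
    """
    all_up = all(ic in working[c] for c, ic in enumerate(i))
    no_pm = all(uc == 0 for uc in u)
    prob = 1.0
    for c, (ic, jc, uc) in enumerate(zip(i, j, u)):
        if all_up and no_pm:
            # Case 1: the system runs, every component ages independently
            prob *= P_comp[c].get((ic, 0), {}).get(jc, 0.0)
        elif uc == 1 or ic not in working[c]:
            # Case 2: this component follows its maintenance dynamics
            prob *= P_comp[c].get((ic, 1 if uc == 1 else None), {}).get(jc, 0.0)
        else:
            # Case 2: the system is down, an idle working component keeps its state
            prob *= 1.0 if jc == ic else 0.0
    return prob
```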

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., N_C}: i^c ∈ {W1, ..., W_NWc} and u = 0, then

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · Ts · C_E(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + ∑_{c=1}^{N_C} C_c

with

C_c = C_CMc   if i^c ∈ {CM1, ..., CM_NCMc} or j^c = CM1
C_c = C_PMc   if i^c ∈ {PM1, ..., PM_NPMc} or j^c = PM1
C_c = 0       otherwise
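For completeness, here is a sketch of the corresponding cost evaluation; the parameter names are illustrative, and the production reward is returned with a negative sign so that the whole objective can be minimized (a convention of this sketch, not of the thesis).

```python
def joint_transition_cost(k, i_comp, j_comp, u, i_elec, working, params):
    """Transition cost of the multi-component model (sketch).

    params : dict with G, Ts, C_I, per-component lists C_PM and C_CM,
             and the electricity price function C_E(scenario, stage).
    """
    system_up = (all(ic in working[c] for c, ic in enumerate(i_comp))
                 and all(uc == 0 for uc in u)
                 and all(not jc.startswith(("PM", "CM")) for jc in j_comp))
    if system_up:
        # Case 1: the unit produces during the whole stage (reward counted negative)
        return -params["G"] * params["Ts"] * params["C_E"](i_elec, k)
    # Case 2: interruption cost plus the cost of every ongoing maintenance
    cost = params["C_I"]
    for c, (ic, jc) in enumerate(zip(i_comp, j_comp)):
        if ic.startswith("CM") or jc == "CM1":
            cost += params["C_CM"][c]
        elif ic.startswith("PM") or jc == "PM1":
            cost += params["C_PM"][c]
    return cost
```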

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas and issues that could have an impact on the model:

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of individual decision spaces for each component state variable.

• Other types of maintenance actions: in the model, replacement is the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is shown empirically to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. From this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (possibly with monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*(4, 0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {...} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {...} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {...} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {...} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {...} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {...} = 2
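The same backward recursion can be written compactly in code; the arc costs below are transcribed from the calculations above (with the terminal node denoted T), so the printed optimal cost-to-go from A should be 8.

```python
# Arc costs transcribed from the calculations above: costs[node] = {successor: arc cost}
costs = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 4, "F": 6},
    "C": {"E": 2, "F": 1, "G": 3},
    "D": {"F": 5, "G": 2},
    "E": {"H": 2, "I": 5},
    "F": {"H": 7, "I": 3, "J": 2},
    "G": {"I": 1, "J": 2},
    "H": {"T": 4}, "I": {"T": 2}, "J": {"T": 7},
}

stages = [["H", "I", "J"], ["E", "F", "G"], ["B", "C", "D"], ["A"]]  # backward order
J = {"T": 0}                      # terminal cost phi = 0
for layer in stages:              # value iteration, one stage at a time
    for node in layer:
        J[node] = min(c + J[succ] for succ, c in costs[node].items())

print(J["A"])  # expected optimal cost-to-go of 8, as computed above
```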


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 48: Models

74 Supervised Learning

With the methods presented in the precedent section the cost-to-go or Q-functionswas represented on a tabular form These approaches are suitable for moderate sizeproblems However for large state and control space this would be too computa-tionnal intensive To overcome this problem approximation methods can be usedto approximate the cost-to-go or Q-functions and the whole state and control space

As an example consider a cost-to-go function Jmicro(i) It will be replaced by a suitableapproximation J(i r) where r is a vector that has to be optimized based on thesamples available of Jmicro In the table representation precedently investigated Jmicro(i)was stored for all the value of i With an approximation structure only the vectorr is stored

Functions approximators must be able to well generalize over the state space theinformation gained from the samples In other words it should minimize the errorbetween the true function and the approximated one Jmicro(i)minus J(i r)

There are a lot of possibles methods for function approximators This field is relatedto supervised learning methods Possibles methods are artificial neural networkskernel-based methods or tree-based methods bayesian statistics for example

A general approach to a supervised learning problem can be

bull Determine an adequate structure for the approximated function and corre-sponding supervised learning method

bull Determine the input features of the function that is the important inputsthat characterize the state of the system The features are generally based onexperience or insight about the problem

bull Decide of a training algorithm

bull Gathering a training set

bull Train the function with the training set The function can then be validatedusing a subset of the training set

bull Evaluate the performance of the approximated function using a test set

An important difference between classical supervised learning and the one performedin reinforcement learning is that a real training set is not existing The trainingset are obtained either by simulation or from real-time samples This is already anapproximation of the real function

42

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-cost approach is used to evaluate the policy

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al [21] propose a similar MDP is proposed Major and minormaintenance are possible are possible For each possible maintenance action thedeterioration level after the maintenance is stochastic which is more realistic

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

83 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at giving reasons for using RL for monitoring and maintenance of power plants. The main advantage given is the automatic learning capability of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; then it could be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and all the more in critical situations.

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance. Chapter 9 gives an example of this type of model. A limitation is a consequence


of the curse of dimensionality. The complexity of the model increases exponentially with the number of states. In consequence, the number of components in a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous time monitoring it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require that a model of the system exists. They learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous time monitoring of systems with multi-state monitoring. The article [24] was also proposing this approach for condition monitoring of power plants. However, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the process of learning is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 81 shows a summary of the models and the most important methods.

Table 81 Summary of models and methods

Model / approach | Characteristics | Possible application in maintenance optimization | Method | Advantages / disadvantages
Finite Horizon Dynamic Programming | Model can be non-stationary | Short-term maintenance optimization and scheduling | Value Iteration | Limited state space (number of components)
Markov Decision Processes | Stationary model | | Classical methods for MDP (three possible approaches below) |
MDP, average cost-to-go | | Continuous-time condition monitoring maintenance optimization | Value Iteration (VI) | Can converge fast for a high discount factor
MDP, discounted | | Short-term maintenance optimization | Policy Iteration (PI) | Faster in general
MDP, shortest path | | | Linear Programming | Possible additional constraints; state space more limited than with VI & PI
Semi-Markov Decision Processes | Can optimize the inspection interval | Optimization for inspection based maintenance | Same as MDP | Complex (average cost-to-go approach)
Approximate Dynamic Programming for MDP | Can handle large state spaces compared with classical MDP methods | Same as MDP, for larger systems | TD-learning, Q-learning | Can work without an explicit model


Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system under consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both these models can be solved using the value iteration algorithm.
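As a rough sketch (not part of the thesis), the finite horizon value iteration algorithm amounts to the backward induction below. The Python function and its callables (decisions, prob, cost, terminal_cost) are illustrative assumptions; rewards, such as the electricity generation reward used later, are assumed to enter as negative costs so that the whole problem is a minimization.

    def value_iteration_finite_horizon(states, decisions, prob, cost, terminal_cost, N):
        """Backward induction for a finite horizon SDP (minimization).

        decisions(i)     -> iterable of feasible decisions in state i
                            (states with no real decision can expose a single
                            placeholder decision, e.g. None)
        prob(k, i, u)    -> dict {j: P(X_{k+1} = j | X_k = i, U_k = u)}
        cost(k, i, u, j) -> stage cost C_k(j, u, i); rewards enter as negative costs
        terminal_cost(i) -> terminal cost C_N(i)
        """
        J = [dict() for _ in range(N + 1)]
        mu = [dict() for _ in range(N)]
        J[N] = {i: terminal_cost(i) for i in states}      # stage N: terminal cost
        for k in range(N - 1, -1, -1):                    # stages N-1, ..., 0
            for i in states:
                best_u, best_val = None, float("inf")
                for u in decisions(i):
                    val = sum(p * (cost(k, i, u, j) + J[k + 1][j])
                              for j, p in prob(k, i, u).items())
                    if val < best_val:
                        best_u, best_val = u, val
                J[k][i], mu[k][i] = best_val, best_u
        return J, mu

The same skeleton applies to both the one-component and the multi-component model; only the state space and the supplied callables change.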

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to


do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro-power. The electricity price is in consequence highly influenced by the weather. If the weather is warm and dry, the hydro-storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space


x1k Component state at stage k
x2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

913 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. It means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another one during the time span. The probability of transition at each stage is assumed known.


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

914 Model Description

9141 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (91)

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for the variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant when age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50% for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.


[Figure: Markov chain over the states W0, W1, W2, W3, W4, PM1, CM1, CM2, with transitions Wq → Wq+1 with probability 1 − Ts·λ(q), Wq → CM1 with probability Ts·λ(q), deterministic progression through the PM and CM states back to W0, and a deterministic transition to PM1 under preventive replacement.]

Figure 91 Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid line: u = 0. Dashed line: u = 1.

Figure 91 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
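As a small illustration, the state space of the example in Figure 91 (NW = 4, NPM = 2, NCM = 3) could be enumerated as follows; the string labels are only a convenient, assumed encoding.

    # Hypothetical enumeration of the one-component state space of Figure 91.
    N_W, N_PM, N_CM = 4, 2, 3                         # example values from Figure 91

    working = [f"W{q}" for q in range(N_W + 1)]       # W0 ... W4
    pm = [f"PM{q}" for q in range(1, N_PM)]           # PM1 (PM2 is merged into W0)
    cm = [f"CM{q}" for q in range(1, N_CM)]           # CM1, CM2 (CM3 merged into W0)
    omega_x1 = working + pm + cm
    print(omega_x1)   # ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']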


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 92 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. In consequence, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure: electricity price (SEK/MWh, roughly 200–500) as a function of the stage (k−1, k, k+1) for Scenario 1, Scenario 2 and Scenario 3.]

Figure 92 Example of electricity scenarios, NE = 3


9142 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU(i) = {0, 1}  if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅       else

9143 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 91.

Table 91 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state Pk(j2, i2) are not stationary: they can change from stage to stage. Tables 92 and 93 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 91 Transition probabilities

i1 | u | j1 | P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | 1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | λ(Wq)
WNW | 0 | WNW | 1 − λ(WNW)
WNW | 0 | CM1 | λ(WNW)
Wq, q ∈ {0, ..., NW} | 1 | PM1 | 1
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | 1
PMNPM−1 | ∅ | W0 | 1
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | 1
CMNCM−1 | ∅ | W0 | 1
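A possible encoding of the stationary component transition probabilities of Table 91 is sketched below. The helper is hypothetical: p_fail(q) stands for the per-stage failure probability written λ(Wq) in the table, and the string state labels follow the assumed encoding used above.

    def component_transition(i1, u, n_w, n_pm, n_cm, p_fail):
        """Return a dict {j1: P(j1, u, i1)} for one component (Table 91).

        p_fail(q) is the probability of failing during a stage spent in Wq.
        u = 1 means preventive replacement, u = 0 means no maintenance;
        u is ignored for PM/CM states (the '∅' rows of Table 91).
        """
        kind = i1.rstrip("0123456789")          # 'W', 'PM' or 'CM'
        q = int(i1[len(kind):])                 # index of the state
        if kind == "W":
            if u == 1:                          # preventive replacement starts
                return {"PM1" if n_pm > 1 else "W0": 1.0}
            j_ok = f"W{q + 1}" if q < n_w else f"W{n_w}"   # age, or stay in W_NW
            j_cm = "CM1" if n_cm > 1 else "W0"
            return {j_ok: 1.0 - p_fail(q), j_cm: p_fail(q)}
        if kind == "PM":                        # ongoing preventive maintenance
            return {f"PM{q + 1}" if q + 1 <= n_pm - 1 else "W0": 1.0}
        if kind == "CM":                        # ongoing corrective maintenance
            return {f"CM{q + 1}" if q + 1 <= n_cm - 1 else "W0": 1.0}
        raise ValueError(f"unknown state {i1}")

For instance, component_transition('W2', 0, 4, 2, 3, lambda q: 0.1) returns {'W3': 0.9, 'CM1': 0.1}, matching the first two rows of Table 91 with λ(W2) = 0.1.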

Table 92 Example of transition matrix for electricity scenarios

P1E =
1   0   0
0   1   0
0   0   1

P2E =
1/3 1/3 1/3
1/3 1/3 1/3
1/3 1/3 1/3

P3E =
0.6 0.2 0.2
0.2 0.6 0.2
0.2 0.2 0.6

Table 93 Example of transition probabilities on a 12 stages horizon

Stage (k)  | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11
Pk(j2, i2) | P1E | P1E | P1E | P3E | P3E | P2E | P2E | P2E | P3E | P1E | P1E | P1E
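The stage-dependent electricity transitions of Tables 92 and 93 could, for instance, be stored as a list of matrices indexed by the stage. This is only a sketch of the example data, with scenarios numbered 0, 1, 2.

    import numpy as np

    # Transition matrices of Table 92 (rows: current scenario i2, columns: next scenario j2).
    P1_E = np.eye(3)
    P2_E = np.full((3, 3), 1.0 / 3.0)
    P3_E = np.array([[0.6, 0.2, 0.2],
                     [0.2, 0.6, 0.2],
                     [0.2, 0.2, 0.6]])

    # Stage-to-matrix assignment of Table 93 (12-stage horizon).
    P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

    def electricity_transition(k, i2, j2):
        """P_k(j2, i2): probability of moving from scenario i2 to j2 at stage k."""
        return P_E[k][i2, j2]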

9144 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost noted CN could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 94. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 94 Transition costs

i1 | u | j1 | Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1} | 0 | Wq+1 | G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1} | 0 | CM1 | CI + CCM
WNW | 0 | WNW | G · Ts · CE(i2, k)
WNW | 0 | CM1 | CI + CCM
Wq | 1 | PM1 | CI + CPM
PMq, q ∈ {1, ..., NPM−2} | ∅ | PMq+1 | CI + CPM
PMNPM−1 | ∅ | W0 | CI + CPM
CMq, q ∈ {1, ..., NCM−2} | ∅ | CMq+1 | CI + CCM
CMNCM−1 | ∅ | W0 | CI + CCM
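Combining Tables 91 and 94, the one-component stage cost could be sketched as below. The sign convention (the generation reward returned as a negative cost, so the whole problem is a minimization) and the function name are assumptions, not taken from the thesis.

    # Sketch of C_k(j, u, i) from Table 94 for one component.
    # c_e(i2, k) is the electricity price CE(i2, k); g is the average production G (kW),
    # t_s the stage length in hours; state labels follow the encoding used above.

    def transition_cost(i1, u, j1, i2, k, g, t_s, c_e, c_i, c_pm, c_cm):
        if i1.startswith("W") and u in (0, None) and j1.startswith("W"):
            return -g * t_s * c_e(i2, k)          # unit produces: reward G·Ts·CE(i2, k)
        if j1 == "CM1" or i1.startswith("CM"):    # failure during the stage or ongoing CM
            return c_i + c_cm
        if j1 == "PM1" or i1.startswith("PM"):    # replacement started or ongoing PM
            return c_i + c_pm
        return 0.0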

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-component systems.

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

922 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c


Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c at the next stage
jNC+1 State of the electricity at the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

923 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.


• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk = (x1k, ..., xNCk, xNC+1k)   (92)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component space

The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity space

Same as for the one-component model in Section 91.

9242 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:


uck = 0 no preventive maintenance on component c

uck = 1 preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1k, u2k, ..., uNCk)   (93)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:  Ωuc(ic) = {0, 1}  if ic ∈ {W0, ..., WNWc}
                    Ωuc(ic) = ∅       else

9243 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (94)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (95)

The transition probabilities of the electricity state P(jNC+1, iNC+1) are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 91.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏c=1,...,NC P(jc, 0, ic)

Case 2


If one of the components is in maintenance, or a decision of preventive maintenance is made:

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏c=1,...,NC Pc

with Pc =
  P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
  1             if ic ∉ {W0, ..., WNWc−1} and jc = ic
  0             else
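The two cases above could be combined into one routine as sketched below. The "freezing" of working, unmaintained components while the system is down follows the natural reading of Case 2 and is an assumption here, as are the function names; trans[c] is a per-component callable (i, u) -> dict {j: prob}, e.g. the Table 91 sketch with its sizing parameters fixed.

    # Sketch of the joint component transition probability of Section 9243.
    # i_comp, j_comp, u_comp are tuples over the components; working[c] is the set
    # of working-state labels of component c.

    def joint_component_prob(j_comp, u_comp, i_comp, trans, working):
        all_working = all(i in working[c] for c, i in enumerate(i_comp))
        if all_working and not any(u_comp):
            # Case 1: every component ages independently.
            prob = 1.0
            for c, (i, j) in enumerate(zip(i_comp, j_comp)):
                prob *= trans[c](i, 0).get(j, 0.0)
            return prob
        # Case 2: the system is down (failure, ongoing maintenance or a PM decision).
        prob = 1.0
        for c, (i, j, u) in enumerate(zip(i_comp, j_comp, u_comp)):
            if u == 1 or i not in working[c]:
                prob *= trans[c](i, 1).get(j, 0.0)   # component follows its own chain
            elif j == i:
                prob *= 1.0                          # working, unmaintained: frozen
            else:
                return 0.0
        return prob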

9244 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}, xck ∈ {W1, ..., WNWc}:

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σc=1,...,NC Cc

with Cc =
  CCMc  if ic ∈ {CM1, ..., CMNCMc−1} or jc = CM1
  CPMc  if ic ∈ {PM1, ..., PMNPMc−1} or jc = PM1
  0     else
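Similarly, a sketch of the multi-component stage cost, again with the generation reward written as a negative cost (an assumed sign convention) and with illustrative names:

    # Sketch of the multi-component cost of Section 9244.  c_cm[c] and c_pm[c] are
    # the per-stage maintenance costs of component c; working[c] is its set of W labels.

    def joint_cost(j_comp, u_comp, i_comp, i_elec, k,
                   g, t_s, c_e, c_i, c_pm, c_cm, working):
        ok_before = all(i in working[c] for c, i in enumerate(i_comp))
        ok_after = all(j in working[c] for c, j in enumerate(j_comp))
        if ok_before and ok_after and not any(u_comp):
            # Case 1: the unit produces during the whole stage.
            return -g * t_s * c_e(i_elec, k)
        # Case 2: interruption cost plus every ongoing or newly started maintenance.
        total = c_i
        for c, (i, j) in enumerate(zip(i_comp, j_comp)):
            if i.startswith("CM") or j == "CM1":
                total += c_cm[c]
            elif i.startswith("PM") or j == "PM1":
                total += c_pm[c]
        return total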

93 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space and not an individual decision space for each component state variable.


• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. It is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is proved empirically to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have been mainly applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods this limitation could be overcome. No application of ADP was found in the literature. The methods have been mainly applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, in the literature few finite horizon models are proposed. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4    u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2    u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7    u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
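As a cross-check, the same backward recursion can be reproduced in a few lines of Python, using only the arc costs C(k, i, j) that appear in the computation above (collected in the dictionary below).

    # Arc costs C(k, i, j) read off from the computation above: (stage, state, next state).
    C = {
        (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
        (2, 0, 0): 2, (2, 0, 1): 5,
        (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
        (2, 2, 1): 1, (2, 2, 2): 2,
        (1, 0, 0): 4, (1, 0, 1): 6,
        (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
        (1, 2, 1): 5, (1, 2, 2): 2,
        (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    }

    J = {(4, 0): 0}                                  # terminal cost phi(0) = 0
    for k in range(3, -1, -1):                       # stages 3, 2, 1, 0
        for i in {i for (kk, i, _) in C if kk == k}:
            J[(k, i)] = min(c + J[(k + 1, j)]
                            for (kk, ii, j), c in C.items() if kk == k and ii == i)
    print(J[(0, 0)])                                 # prints 8, as obtained above

Running the script prints 8, the optimal cost J*0(0) = J*(A) found above.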


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput Oper Res, 22(4):435–441, 1995.

[3] SV Amari and LH Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers / Göteborg University, 2004. Licentiate Thesis.

[5] YW Archibald and R Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I Bagai and K Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R E Barlow and F Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C Berenguer, C Chu and A Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M Berg and B Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M Berg and B Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L Bertling, R Allan and R Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D P Bertsekas and J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] GK Chan and S Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] DI Cho and M Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R Dekker, RE Wildeman and FA van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C Fu, L Ye, Y Liu, R Yu, B Iung, Y Cheng and Y Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A Haurie and P L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P Hilber and L Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A Jayakumar and S Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y Jiang, Z Zhong, J McCalley and TV Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L P Kaelbling, M L Littman and A P Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D Kalles, A Stathaki and RE King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D Kumar and U Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P L'Ecuyer and A Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] ML Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y Mansour and S Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] MKC Marwali and SM Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] RP Nicolai and R Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J Nilsson and L Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] KS Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] KS Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J Ribrant and L M Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] CL Tomasevicz and S Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L Wang, J Chu, W Mao and Y Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R Wildeman, R Dekker and A Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] RE Wildeman, R Dekker and A Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.


Page 49: Models

Chapter 8

Review of Models for

Maintenance Optimization

This chapter reviews several SDP maintenance models found in the litterature Inconclusion the approachesmethods are compared and their applicability to main-tenance problem in power system is discussed

81 Finite Horizon Dynamic Programming

811 Deterministic Models

Dekker amp al [46] proposes a rolling horizon approach for short-term schedulingand grouping of maintenance activities Each individual maintenance activity isfirst based on an infinite horizon optimization The short-term planning use thesemaintenance activities as inputs Penalties are defined for deviations from theoriginal time of maintenance for each activity The whole maintenance activitiesare optimized using finite horizon dynamic programming

812 Stochastic Models

In [37] a SDP model is proposed to solve a finite horizon generating units mainte-nance scheduling The system considered is composed of n generating units Thepossible state for each unit is the number of remaining stages of maintenance andpossible failure of an unit not in maintenance during the stage The failure rates

43

are assumed constant but different before and after maintenance Unserved energyand unserved reserve costs are considered for the cost function

One interesting feature of the model is that the time to achieve maintenance isconsidered stochastic Another is that the maintenance crew is assumed limited somaintenance can be done only on one generating unit at the time

The model is illustrated with a 3 unit example with 4 5 and 6 possible states forthe different units A 52 weeks horizon is considered with stages of one week length

82 Infinite Horizon Stochastic Models

821 Discrete Time infinite Horizon Models

In [14] an infinite horizon SDP model is considered for optimizing the maintenanceof a single component system The system can be in different deterioration statesmaintenance states or in a failure state Two kinds of failures are considered randomfailure and deterioration failure Each one modeled by a failure state with differenttime to repair

The time to deterioration failure is represented by an erlangian distribution Thepreventive maintenance is considered imperfect If the system fails the componentis replaced

An average cost-to-cost approach is used to evaluate the policy

First a Markov process of the system is investigated to determine the optimal meantime to preventive maintenance A Markov decision process model is built usingthe states probabilities and the optimal mean time to preventive maintenance cal-culated

The MDP is solved using the policy iteration algorithm The model is proved to beunichain before applying the algorithm An illustrative example is given It consid-ers 3 deterioration states one preventive maintenance state for each deteriorationstate and one failure state

Jayakumar et al [21] propose a similar MDP is proposed Major and minormaintenance are possible are possible For each possible maintenance action thedeterioration level after the maintenance is stochastic which is more realistic

The model is solved using the linear programming method

44

822 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposedthese last years

Amari et al [3] present a general framework for solving condition-based mainte-nance problems by using SMDP The interest of the model is that for each possibledeterioration state possible maintenance decisions are minor maintenance majormaintenance (replacement) but also the choice for the next inspection time Anhypothetical example is given The model consists of 5 deterioration states and 1failure state 20 possible values for the inspection time are considered

The model of [14] is extended to a SMDP in [42] The inspection time is calculatedprior to the optimization using a semi-Markov process The SMDP model is said tosuperior because it includes the state sojourn time The model is illustrated withan example based on a 230kV air blast circuit beaker

83 Reinforcement Learning

Kalles et al [24] proposes the use of RL for preventive maintenance of power plantsThe article aims at giving reason of using RL for monitoring and maintenance ofpower plants The main advantages given are the automatic learning capabilitiesof RL The problem of time-lag (time between an action and its effect) is revealedPenalties are defined by deviations from normal operation of the system Theapproach proposed should first be used in parallel of the actual expert systems sothat the RL algorithm learns the environment then it could be applied in practiceOne important condition for a good learning of the environment is that the algorithmhas been trained in all situation and all the more in critical situation

84 Conclusions

An important assumption of all the models is the loss of memory (Markovian mod-els) The assumption is related to the principle of optimality It means that thetransition probability of the models can depend only on the actual state of thesystem independantly of its history

The finite horizon approach is adapted to short-term optimization From the lit-terature review this approach can be applied to maintenance scheduling I believethat the approach is interesting because it can integrate opportunistic maintenanceChapter 8 gives an example of this type of models A limitations is the consequence

45

of the curse of dimensionality The complexity of the model increases exponention-naly with the number of states In consequence the number of components of afinite horizon SDP model can not be too high for being tractable

Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods

Approximate dynamic programming (reinforcement learning) have many advan-tages The methods does not need that a model of the system exist They learnfrom samples and could be used to adapt to a system Moreover they can handlelarge state space in comparison with MDP In my opinion reinforcement learningcould be used for continuous time monitoring of system with multi-states moni-toring The article [24] was also proposing this approach for condition monitoringof power plants However no implementation of the idea have been found in thelitterature A practical disadvantage of this approach is that the process of learningis time consuming It can (and should) be done off-line or based on a model thatalready exist but is too large to be solvable with classical methods A technicaldifficulty is the choice for an adequate supervised learning structure

Table 81 shows a summary of the models and most important methods

Table 81 Summary of models and methods

Characteristics Possible Application Method Advantagesin Maintenance DisadvantagesOptimization

Finite Horizon Model can be Short-term maintenance Value Iteration Limitated state spaceDynamic Programming Non-Stationary Optimization Scheduling (number of components)Markov Decision -Stationary Model Classical MethodsProcesses - Possible approaches for MDP

Average cost-to-go Continuous-time condition Value Iteration (VI) Can converge fast formonitoring maintenance high discount factoroptimization

Discounted Short-term maintenance Policy Iteration (PI) Faster in generaloptimization

Shortest path Linear Programming - Possible additionalconstraints- State space limited VI amp PI

Approximate Dynamic Can handle large state space Same as MDP for larger - TD-learning Can work withoutProgramming for MDP classical MDP methods systems - Q-learning an explicit modelSemi-Markov Decision -Can optimize Optimization for inspection Same as MDPProcesses interval inspection based maintenance

-Complex (Average cost-to-go approach)

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j

2 i2) can take three different values defined by the transition matrices P 1E P 2

E

or P 3E i2 is represented by the rows of the matrices and j2 by the column

53

Table 91 Transition probabilities

i1 u j1 P (j1 u i1)

Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1

PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1

CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 93 Example of transition probabilities on a 12 stages horizon

Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11

Pk(j2 i2) P 1

E P 1E P 1

E P 3E P 3

E P 2E P 2

E P 2E P 3

E P 1E P 1

E P 1E

9144 Cost Function

The costs associated to the possible transitions can be of different kinds

bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)

bull Cost for maintenance CCM or CPM

bull Cost for interruption CI

Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                            u    j1        Ck(j, u, i)
Wq, q ∈ {0, …, N_W − 1}       0    Wq+1      G · Ts · CE(i2, k)
Wq, q ∈ {0, …, N_W − 1}       0    CM1       CI + CCM
W_{N_W}                       0    W_{N_W}   G · Ts · CE(i2, k)
W_{N_W}                       0    CM1       CI + CCM
Wq                            1    PM1       CI + CPM
PMq, q ∈ {1, …, N_PM − 2}     ∅    PMq+1     CI + CPM
PM_{N_PM−1}                   ∅    W0        CI + CPM
CMq, q ∈ {1, …, N_CM − 2}     ∅    CMq+1     CI + CCM
CM_{N_CM−1}                   ∅    W0        CI + CCM
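Since the model has a finite horizon, it can be solved with the value iteration method for finite horizon problems, i.e. backward induction of J_k(i) = min over u ∈ Ω_U(i) of Σ_j P(j, u, i) · [C_k(j, u, i) + J_{k+1}(j)]. The outline below is a generic sketch under stated assumptions: the callables are supposed to implement Tables 9.1-9.4 for the joint state i = (i1, i2), forced transitions (maintenance states) are represented by the single decision None, and the production reward is assumed to enter the stage cost with a negative sign so that minimization applies.

# Generic backward-induction sketch for the proposed finite horizon model (illustrative).
# Assumed callables:
#   decision_space(i)       -> feasible decisions in joint state i ([None] if forced)
#   trans_prob(j, u, i, k)  -> P(j1, u, i1) * Pk(j2, i2)                 (Tables 9.1-9.3)
#   trans_cost(j, u, i, k)  -> Ck(j, u, i), with the reward counted negatively (Table 9.4)
#   terminal_cost(i)        -> CN(i)
def backward_induction(states, N, decision_space, trans_prob, trans_cost, terminal_cost):
    J = {i: terminal_cost(i) for i in states}          # cost-to-go at stage N
    policy = [None] * N
    for k in range(N - 1, -1, -1):                     # stages N-1, ..., 0
        Jk, mu_k = {}, {}
        for i in states:
            best_u, best_val = None, float("inf")
            for u in decision_space(i):
                val = sum(trans_prob(j, u, i, k) * (trans_cost(j, u, i, k) + J[j])
                          for j in states)
                if val < best_val:
                    best_u, best_val = u, val
            Jk[i], mu_k[i] = best_val, best_u
        J = Jk
        policy[k] = mu_k                               # optimal decision rule at stage k
    return J, policy                                   # J[i] is the optimal expected cost from stage 0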

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to also do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC       Number of components
NWc      Number of working states for component c
NPMc     Number of preventive maintenance states for component c
NCMc     Number of corrective maintenance states for component c


Costs

CPMc     Cost per stage of preventive maintenance for component c
CCMc     Cost per stage of corrective maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, …, NC}    State of component c at the current stage
iNC+1                 Electricity state at the current stage
jc, c ∈ {1, …, NC}    State of component c at the next stage
jNC+1                 Electricity state at the next stage
uc, c ∈ {1, …, NC}    Decision variable for component c

State and Control Space

xck, c ∈ {1, …, NC}   State of component c at stage k
xc                    A component state
xNC+1,k               Electricity state at stage k
uck                   Maintenance decision for component c at stage k

Probability functions

λc(i)    Failure probability function for component c

Sets

Ω_xc         State space for component c
Ω_xNC+1      Electricity state space
Ω_uc(ic)     Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.


• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

X_k = (x^1_k, …, x^{N_C}_k, x^{N_C+1}_k)^T    (9.2)

x^c_k, c ∈ {1, …, N_C}, represents the state of component c.

x^{N_C+1}_k represents the electricity state.

Component Space

The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to the component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W_0, …, W_{N_Wc}, PM_1, …, PM_{N_PMc − 1}, CM_1, …, CM_{N_CMc − 1}}

Electricity Space

Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or to do nothing, depending on the state of the system:


u^c_k = 0: no preventive maintenance on component c

u^c_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector

U_k = (u^1_k, u^2_k, …, u^{N_C}_k)^T    (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, …, N_C}:   Ω_{u^c}(i^c) = {0, 1}   if i^c ∈ {W_0, …, W_{N_Wc}}
                    Ω_{u^c}(i^c) = ∅        otherwise
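The joint decision space is thus the Cartesian product of the individual decision spaces. As an illustration (a sketch with assumed names), it can be enumerated as follows, with a placeholder decision for components that cannot be maintained:

# Sketch: enumerate the feasible decision vectors (u1, ..., uNC) for given component states.
from itertools import product

def decision_space_c(ic, NWc):
    """Omega_uc(ic): {0, 1} if component c is in a working state, otherwise no decision."""
    working = ["W%d" % q for q in range(NWc + 1)]
    return (0, 1) if ic in working else (None,)    # None stands for the empty decision

def joint_decision_space(i, NW):
    """All decision vectors U_k for component states i = (i1, ..., iNC)."""
    return list(product(*(decision_space_c(ic, NWc) for ic, NWc in zip(i, NW))))

# Example: the first component is working (W2), the second is in corrective maintenance.
print(joint_decision_space(("W2", "CM1"), NW=(4, 3)))   # -> [(0, None), (1, None)]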

9.2.4.3 Transition Probabilities

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)    (9.4)
   = P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) · P(j^{N_C+1}, i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P(j^{N_C+1}, i^{N_C+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, N_C}: x^c_k ∈ {W_1, …, W_{N_Wc}},

P((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2


If one of the components is in maintenance, or preventive maintenance is decided for at least one component:

P((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W_1, …, W_{N_Wc}}
P^c = 1                if i^c ∈ {W_1, …, W_{N_Wc}}, u^c = 0 and j^c = i^c
P^c = 0                otherwise
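The two cases can be combined into a single function. The sketch below is illustrative only: comp_trans_prob(c, jc, uc, ic) is assumed to give the one-component probability P(jc, uc, ic) of Table 9.1 for component c, and working(c) is assumed to return the working states of component c.

# Sketch of the joint component-state transition probability (Cases 1 and 2 above).
def joint_component_trans_prob(j, u, i, comp_trans_prob, working):
    NC = len(i)
    if all(i[c] in working(c) and u[c] == 0 for c in range(NC)):
        p = 1.0                                        # Case 1: all components age independently
        for c in range(NC):
            p *= comp_trans_prob(c, j[c], 0, i[c])
        return p
    p = 1.0                                            # Case 2: the system is down this stage
    for c in range(NC):
        if u[c] == 1 or i[c] not in working(c):        # maintained components keep evolving
            p *= comp_trans_prob(c, j[c], 1, i[c])
        elif j[c] != i[c]:                             # idle working components do not age,
            return 0.0                                 # so any other move has probability zero
    return p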

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, N_C}: x^c_k ∈ {W_1, …, W_{N_Wc}},

C((j^1, …, j^{N_C}), 0, (i^1, …, i^{N_C})) = G · Ts · CE(i^{N_C+1}, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions.

C((j^1, …, j^{N_C}), (u^1, …, u^{N_C}), (i^1, …, i^{N_C})) = CI + Σ_{c=1}^{N_C} C^c

with

C^c = CCMc   if i^c ∈ {CM_1, …, CM_{N_CMc − 1}} or j^c = CM_1
C^c = CPMc   if i^c ∈ {PM_1, …, PM_{N_PMc − 1}} or j^c = PM_1
C^c = 0      otherwise
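The cost function can be sketched in the same spirit (again illustrative, with assumed names: i_elec stands for the electricity state i^{N_C+1}, C_CM and C_PM are per-component cost lists, and the production reward keeps the sign used in Table 9.4, so in a pure minimization it would normally be counted negatively):

# Sketch of the joint transition cost, mirroring Cases 1 and 2 above.
def joint_trans_cost(j, u, i, i_elec, k, C_E, G, Ts, C_I, C_CM, C_PM, working):
    NC = len(i)
    all_working = all(i[c] in working(c) and u[c] == 0 for c in range(NC))
    no_failure = all(j[c] in working(c) for c in range(NC))
    if all_working and no_failure:
        return G * Ts * C_E(i_elec, k)           # Case 1: reward for the energy produced
    cost = C_I                                   # Case 2: interruption cost ...
    for c in range(NC):                          # ... plus every ongoing maintenance action
        if i[c].startswith("CM") or j[c] == "CM1":
            cost += C_CM[c]
        elif i[c].startswith("PM") or j[c] == "PM1":
            cost += C_PM[c]
    return cost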

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.


• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions in the model.

• Non-deterministic time to repair: it is possible to model a stochastic repair time by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising for avoiding intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the current state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0, 1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0
J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0, 1, 2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1
J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1, 2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0, 1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0
J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0, 1, 2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2
J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1, 2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0, 1, 2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
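The computation above can be reproduced in a few lines. The following sketch (an illustration, not part of the original solution) stores the arc costs C(k, i, u) used in the calculation, with the decision u read as the node selected at stage k + 1, and runs the same backward recursion; it prints the shortest path cost J*(A) = 8 and the optimal path A -> D -> G -> I -> end.

# Sketch reproducing the appendix with the value iteration (backward) recursion.
# C[k][i][u] is the arc cost C(k, i, u); the decision u is the state chosen at stage k+1.
C = [
    {0: {0: 2, 1: 4, 2: 3}},                                     # stage 0: A
    {0: {0: 4, 1: 6}, 1: {0: 2, 1: 1, 2: 3}, 2: {1: 5, 2: 2}},   # stage 1: B, C, D
    {0: {0: 2, 1: 5}, 1: {0: 7, 1: 3, 2: 2}, 2: {1: 1, 2: 2}},   # stage 2: E, F, G
    {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},                           # stage 3: H, I, J
]
names = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I", "J"]]

J = {0: 0}                         # terminal cost phi(0) = 0 at stage 4
policy = []
for k in range(3, -1, -1):         # stages 3, 2, 1, 0
    Jk, uk = {}, {}
    for i, arcs in C[k].items():
        u_best = min(arcs, key=lambda u: arcs[u] + J[u])
        Jk[i], uk[i] = arcs[u_best] + J[u_best], u_best
    J, policy = Jk, [uk] + policy

print(J[0])                        # -> 8, the shortest path cost J*(A)
i, path = 0, ["A"]
for k in range(4):
    i = policy[k][i]
    path.append(names[k + 1][i] if k < 3 else "end")
print(" -> ".join(path))           # -> A -> D -> G -> I -> end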


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.


[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.



[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 51: Models

8.2.2 Semi-Markov Decision Process

Many condition-based maintenance models based on SMDP have been proposed in recent years.

Amari et al. [3] present a general framework for solving condition-based maintenance problems using SMDP. The interest of the model is that, for each possible deterioration state, the possible maintenance decisions are minor maintenance and major maintenance (replacement), but also the choice of the next inspection time. A hypothetical example is given: the model consists of 5 deterioration states and 1 failure state, and 20 possible values for the inspection time are considered.

The model of [14] is extended to an SMDP in [42]. The inspection time is calculated prior to the optimization using a semi-Markov process. The SMDP model is said to be superior because it includes the state sojourn time. The model is illustrated with an example based on a 230 kV air blast circuit breaker.

8.3 Reinforcement Learning

Kalles et al. [24] propose the use of RL for preventive maintenance of power plants. The article aims at motivating the use of RL for monitoring and maintenance of power plants. The main advantages given are the automatic learning capabilities of RL. The problem of time-lag (the time between an action and its effect) is highlighted. Penalties are defined by deviations from normal operation of the system. The proposed approach should first be used in parallel with the existing expert systems, so that the RL algorithm learns the environment; it could then be applied in practice. One important condition for a good learning of the environment is that the algorithm has been trained in all situations, and especially in critical situations.

8.4 Conclusions

An important assumption of all the models is the loss of memory (Markovian models). The assumption is related to the principle of optimality. It means that the transition probabilities of the models can depend only on the current state of the system, independently of its history.

The finite horizon approach is adapted to short-term optimization. From the literature review, this approach can be applied to maintenance scheduling. I believe that the approach is interesting because it can integrate opportunistic maintenance; Chapter 9 gives an example of this type of model. A limitation is the consequence of the curse of dimensionality: the complexity of the model increases exponentially with the number of states. In consequence, the number of components of a finite horizon SDP model cannot be too high if the model is to remain tractable.

Several Markov Decision Process and Semi-Markov Decision Process models have been proposed for solving condition-based maintenance problems. The models consider an average cost-to-go, which is realistic. SMDP have the advantage of being able to optimize the time to the next inspection depending on the state; SMDP are also more complex. The models found in the literature consider only single components with only one state variable. MDP could be very useful for scheduled CBM and SMDP for inspection-based CBM. However, for continuous-time monitoring, it would be recommended to use approximate methods.

Approximate dynamic programming (reinforcement learning) has many advantages. The methods do not require an explicit model of the system; they learn from samples and could be used to adapt to a system. Moreover, they can handle large state spaces in comparison with MDP. In my opinion, reinforcement learning could be used for continuous-time monitoring of a system with multi-state monitoring. The article [24] also proposed this approach for condition monitoring of power plants; however, no implementation of the idea has been found in the literature. A practical disadvantage of this approach is that the learning process is time consuming. It can (and should) be done off-line, or based on a model that already exists but is too large to be solvable with classical methods. A technical difficulty is the choice of an adequate supervised learning structure.

Table 8.1 shows a summary of the models and the most important methods.

Table 8.1: Summary of models and methods

Finite Horizon Dynamic Programming
- Characteristics: the model can be non-stationary
- Possible application in maintenance optimization: short-term maintenance optimization, scheduling
- Method: Value Iteration
- Advantages/Disadvantages: limited state space (number of components)

Markov Decision Processes (stationary model); classical methods, with possible approaches:
- Average cost-to-go: continuous-time condition monitoring maintenance optimization; Value Iteration (VI); can converge fast for a high discount factor
- Discounted: short-term maintenance optimization; Policy Iteration (PI); faster in general
- Shortest path: Linear Programming; possible additional constraints; state space limited (as with VI and PI)

Approximate Dynamic Programming for MDP
- Characteristics: can handle larger state spaces than classical MDP methods
- Possible application in maintenance optimization: same as MDP, for larger systems
- Methods: TD-learning, Q-learning
- Advantages/Disadvantages: can work without an explicit model

Semi-Markov Decision Processes
- Characteristics: can optimize the inspection interval
- Possible application in maintenance optimization: optimization for inspection-based maintenance
- Method: same as MDP (average cost-to-go approach)
- Advantages/Disadvantages: more complex

Chapter 9

A Proposed Finite Horizon Replacement Model

A finite horizon SDP replacement model is proposed in this chapter. The model assumes a finite time horizon and discrete decision epochs. The system in consideration is a power generating unit. An interesting feature of the model is the integration of the electricity price as a state variable. Another is the possibility of opportunistic maintenance, i.e. if one component fails it is possible to do preventive maintenance on another component that is still working.

The proposed model is first presented for one component and is then generalized to multi-component systems. Both models can be solved using the value iteration algorithm.
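To make the algorithm concrete, a minimal sketch of the backward recursion of finite horizon value iteration is given below, in Python. It is not the implementation used in this thesis: the state space, the decision sets and the functions transition_prob, stage_cost and terminal_cost are placeholders that a model such as the one proposed in this chapter has to provide.

# Minimal sketch of backward value iteration for a finite horizon SDP model.
# states: iterable of states; decisions(i): admissible decisions in state i
# (at least one, possibly a dummy "no decision"); transition_prob(j, u, i, k),
# stage_cost(j, u, i, k) and terminal_cost(i) describe the model.

def finite_horizon_value_iteration(N, states, decisions,
                                   transition_prob, stage_cost, terminal_cost):
    J = [dict() for _ in range(N + 1)]      # cost-to-go tables J_k(i)
    policy = [dict() for _ in range(N)]     # optimal decision tables u_k(i)
    for i in states:
        J[N][i] = terminal_cost(i)          # terminal condition J_N(i)
    for k in range(N - 1, -1, -1):          # backward recursion
        for i in states:
            best_u, best_value = None, float("inf")
            for u in decisions(i):
                # Expected cost of taking decision u in state i at stage k.
                value = sum(transition_prob(j, u, i, k)
                            * (stage_cost(j, u, i, k) + J[k + 1][j])
                            for j in states)
                if value < best_value:
                    best_u, best_value = u, value
            J[k][i] = best_value
            policy[k][i] = best_u
    return J, policy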

9.1 One-Component Model

9.1.1 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic programming is proposed. The model is first described for one component for an easier understanding of its principle.

The price of electricity was considered an important factor that could influence the maintenance decision. Indeed, if the electricity price is high, it can be profitable to operate the system and wait for lower prices.

If a high electricity price is expected in the near future, it could be interesting to do maintenance immediately, to be operational later and avoid maintenance in a profitable period. This idea was considered in the model: the electricity price was included as a state variable. The variable considers different electricity scenarios, for example high, medium and low prices. For each scenario the electricity price varies with a period of one year.

There can be transitions from one scenario to another depending on the period of the year.

In the Scandinavian countries a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. On the contrary, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume the electricity scenario to be transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions of the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity cost at stage k for the electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

x1k Component state at stage k
x2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N−1.

• The failure rate of the component over time is assumed perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component to prevent corrective maintenance. The time of preventive replacement is NPM stages with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are supposed fixed during a stage (equal to the price at the beginning of the scenario). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N−1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2). The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),   x1k ∈ Ωx1, x2k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undergone preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached, i.e. λ(t) = λ(Tmax) if t > Tmax; Tmax can then for example correspond to the time when λ(t) > 50%. This second approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0; dashed lines: u = 1. The states W0-W4, PM1, CM1 and CM2 are connected by the ageing transitions (1 − Ts λ(q)), the failure transitions Ts λ(q) into CM1, and deterministic transitions (probability 1) through the maintenance states.

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM−1, CM1, ..., CMNCM−1}
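As an illustration only, this state space can be enumerated programmatically; the sketch below assumes NW = Tmax/Ts rounded to the closest integer, as stated above, and uses illustrative state labels.

# Sketch: enumerate Omega_x1 = {W0, ..., WNW, PM1, ..., PM(NPM-1), CM1, ..., CM(NCM-1)}.

def component_state_space(T_max, T_s, N_PM, N_CM):
    N_W = round(T_max / T_s)                              # number of working states
    working = ["W%d" % q for q in range(N_W + 1)]         # W0 ... WNW
    preventive = ["PM%d" % q for q in range(1, N_PM)]     # PM1 ... PM(NPM-1)
    corrective = ["CM%d" % q for q in range(1, N_CM)]     # CM1 ... CM(NCM-1)
    return working + preventive + corrective

# Example matching Figure 9.1 (NW = 4, NPM = 2, NCM = 3):
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']
print(component_state_space(T_max=4.0, T_s=1.0, N_PM=2, N_CM=3))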

Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden. Hydro power is a large part of the electricity generation in Sweden; moreover, it is a cheap source of energy. In consequence, if there is a low water reserve, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. The figure plots the electricity price (SEK/MWh, between about 200 and 500) against the stage for Scenarios 1, 2 and 3 around stages k−1, k and k+1.
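One possible way to encode such scenarios in an implementation is a periodic price function CE(s, k), as in the following sketch. The number of stages per year and the price levels are illustrative values only, not taken from the thesis.

import math

# Sketch: periodic electricity price C_E(s, k) for NE = 3 scenarios
# (1 = high / dry year, 2 = medium / normal year, 3 = low / wet year).
STAGES_PER_YEAR = 12                                 # assumed monthly stages
BASE_PRICE = {1: 450.0, 2: 350.0, 3: 250.0}          # SEK/MWh, illustrative levels
SEASONAL_AMPLITUDE = 50.0                            # SEK/MWh, illustrative variation

def C_E(scenario, k):
    """Electricity price for scenario s at stage k, periodic over one year."""
    season = math.cos(2.0 * math.pi * k / STAGES_PER_YEAR)
    return BASE_PRICE[scenario] + SEASONAL_AMPLITUDE * season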

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW}, and ΩU(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u   j1      P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0   CM1     λ(Wq)
WNW                         0   WNW     1 − λ(WNW)
WNW                         0   CM1     λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1     1
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   1
PMNPM−1                     ∅   W0      1
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   1
CMNCM−1                     ∅   W0      1
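For illustration, the probabilities of Table 9.1 can be generated from the per-stage failure probability, as sketched below. The failure probability function and the state labels are placeholders (the per-stage failure probability can for example be taken as Ts · λ(q · Ts)); the sketch assumes NPM ≥ 2 and NCM ≥ 2, see the note below Table 9.1 for the degenerate cases.

# Sketch of the component transition probabilities P(j1, u, i1) of Table 9.1.
# failure_prob(q): probability of failure during one stage in state Wq.

def component_transition_probabilities(N_W, N_PM, N_CM, failure_prob):
    P = {}                                   # P[(i1, u)] = {j1: probability}
    for q in range(N_W + 1):
        p_f = failure_prob(q)
        ageing = "W%d" % min(q + 1, N_W)     # WNW stays in WNW while working
        P[("W%d" % q, 0)] = {ageing: 1.0 - p_f, "CM1": p_f}
        P[("W%d" % q, 1)] = {"PM1": 1.0}     # preventive replacement decided
    for q in range(1, N_PM - 1):             # forced preventive maintenance states
        P[("PM%d" % q, None)] = {"PM%d" % (q + 1): 1.0}
    P[("PM%d" % (N_PM - 1), None)] = {"W0": 1.0}
    for q in range(1, N_CM - 1):             # forced corrective maintenance states
        P[("CM%d" % q, None)] = {"CM%d" % (q + 1): 1.0}
    P[("CM%d" % (N_CM - 1), None)] = {"W0": 1.0}
    return P

# Illustrative use, matching Figure 9.1 (NW = 4, NPM = 2, NCM = 3):
P = component_transition_probabilities(4, 2, 3, lambda q: 0.05 * (q + 1))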

Table 9.2: Example of transition matrices for the electricity scenarios

P1E = [ 1    0    0
        0    1    0
        0    0    1 ]

P2E = [ 1/3  1/3  1/3
        1/3  1/3  1/3
        1/3  1/3  1/3 ]

P3E = [ 0.6  0.2  0.2
        0.2  0.6  0.2
        0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
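In an implementation, the stage-dependent electricity transitions of Tables 9.2 and 9.3 can simply be stored as a list of matrices indexed by stage, as in this sketch:

# Sketch: the three transition matrices of Table 9.2 and the stage schedule of Table 9.3.
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
P2_E = [[1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3],
        [1/3, 1/3, 1/3]]
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]

# One matrix per stage k = 0, ..., 11 (Table 9.3).
SCHEDULE = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def P_k(j2, i2, k):
    """Transition probability of the electricity state at stage k (scenarios numbered 1..3)."""
    return SCHEDULE[k][i2 - 1][j2 - 1]      # rows: current scenario i2, columns: next scenario j2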

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                          u   j1      Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0   Wq+1    G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0   CM1     CI + CCM
WNW                         0   WNW     G · Ts · CE(i2, k)
WNW                         0   CM1     CI + CCM
Wq                          1   PM1     CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅   PMq+1   CI + CPM
PMNPM−1                     ∅   W0      CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅   CMq+1   CI + CCM
CMNCM−1                     ∅   W0      CI + CCM
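A sketch of how Table 9.4 translates into a stage cost function is given below. The sign convention is an assumption of the sketch: the production reward is entered as a negative cost so that the whole problem can be written as a minimization; G, Ts, CI, CPM, CCM and CE are the model parameters defined above.

# Sketch of the one-component transition cost C_k(j, u, i) of Table 9.4.
# States are labelled as before ("W0", ..., "PM1", ..., "CM1", ...).

def one_component_cost(j1, u, i1, i2, k, G, T_s, C_I, C_PM, C_CM, C_E):
    if i1.startswith("PM"):                 # ongoing preventive maintenance
        return C_I + C_PM
    if i1.startswith("CM"):                 # ongoing corrective maintenance
        return C_I + C_CM
    # i1 is a working state Wq
    if u == 1:                              # preventive replacement starts (j1 = PM1)
        return C_I + C_PM
    if j1 == "CM1":                         # failure during the stage
        return C_I + C_CM
    return -G * T_s * C_E(i2, k)            # production reward as a negative cost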

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high, or if the cost of the structure needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The price of their rent can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, ..., NC} State of component c for the next stage
jNC+1 State of the electricity for the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1k)   (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc−1, CM1, ..., CMNCMc−1}

Electricity Space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0, no preventive maintenance on component c
uck = 1, preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)   (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc}, and Ωuc(ic) = ∅ otherwise.
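The joint decision space is then the Cartesian product of the individual decision spaces, which can be enumerated as in the sketch below. Representing the empty decision set of a component in maintenance by a single forced "no decision" value is a modelling convenience of this sketch, not something prescribed by the model.

from itertools import product

# Sketch: enumerate all admissible decision vectors U_k = (u1, ..., uNC)
# for a given vector of component states (i1, ..., iNC).

def joint_decision_space(component_states):
    per_component = []
    for ic in component_states:
        if ic.startswith("W"):
            per_component.append((0, 1))    # preventive maintenance possible
        else:
            per_component.append((None,))   # component in maintenance: no decision
    return list(product(*per_component))

# Example: one working component and one component under corrective maintenance.
print(joint_decision_space(["W2", "CM1"]))  # [(0, None), (1, None)]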

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
= P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · Pk(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, Pk(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for some component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} P^c

with

P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
P^c = 1              if ic ∉ {W0, ..., WNWc−1} and jc = ic
P^c = 0              otherwise
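The two cases can be combined into a single routine. The sketch below is one possible reading of Case 2, in which a component that is maintained, or selected for preventive maintenance, follows its forced transition while the remaining components do not age and keep their state; P_component(j, u, i) stands for the one-component probability P(j1, u, i1) of Table 9.1 and is assumed given.

# Sketch: joint component-state transition probability of Section 9.2.4.3.
# i, j: tuples of component states; u: tuple of decisions (None when no decision);
# P_component(jc, uc, ic): one-component transition probability (Table 9.1).

def joint_transition_probability(j, u, i, P_component):
    all_working = all(ic.startswith("W") for ic in i)
    if all_working and all(uc == 0 for uc in u):
        # Case 1: the system operates and every component ages independently.
        prob = 1.0
        for jc, ic in zip(j, i):
            prob *= P_component(jc, 0, ic)
        return prob
    # Case 2: the system is down during the stage.
    prob = 1.0
    for jc, uc, ic in zip(j, u, i):
        if uc == 1 or not ic.startswith("W"):
            prob *= P_component(jc, uc, ic)     # forced maintenance transition
        else:
            prob *= 1.0 if jc == ic else 0.0    # working component does not age
    return prob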

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with

Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0      otherwise
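Correspondingly, the two cost cases can be sketched as one function. CI is the interruption cost and C_CM, C_PM are per-component maintenance costs (here lists indexed by component); the negative-cost convention for the production reward is the same assumption as in the one-component sketch.

# Sketch: multi-component stage cost of Section 9.2.4.4.
# i, j: tuples of component states; u: tuple of decisions; i_elec: electricity state.

def joint_stage_cost(j, u, i, i_elec, k, G, T_s, C_I, C_CM, C_PM, C_E):
    no_failure = all(jc.startswith("W") for jc in j)
    if all(ic.startswith("W") for ic in i) and all(uc == 0 for uc in u) and no_failure:
        # Case 1: the whole system works during the stage and energy is sold.
        return -G * T_s * C_E(i_elec, k)
    # Case 2: interruption cost plus the cost of every ongoing maintenance action.
    total = C_I
    for c, (ic, jc) in enumerate(zip(i, j)):
        if ic.startswith("CM") or jc == "CM1":
            total += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += C_PM[c]
    return total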

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair. The repair time is in reality not deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single state variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition-based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u∈{0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u∈{1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u∈{0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u∈{1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
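The backward recursion above is easy to check numerically. The short script below reproduces the computation, with the arc costs C(k, i, j) taken from the expressions in the solution; it prints the optimal cost 8 and the optimal first decision 2 found above.

# Numerical check of the shortest path solution (arc costs as used above).
# C[k][(i, j)]: cost of going from state i at stage k to state j at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},
    1: {(0, 0): 4, (0, 1): 6,
        (1, 0): 2, (1, 1): 1, (1, 2): 3,
        (2, 1): 5, (2, 2): 2},
    2: {(0, 0): 2, (0, 1): 5,
        (1, 0): 7, (1, 1): 3, (1, 2): 2,
        (2, 1): 1, (2, 2): 2},
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},
}

J = {4: {0: 0}}                                  # terminal condition
policy = {}
for k in range(3, -1, -1):                       # backward recursion
    J[k], policy[k] = {}, {}
    for i in sorted({i for (i, _) in C[k]}):
        options = {j: cost + J[k + 1][j]
                   for (ii, j), cost in C[k].items() if ii == i}
        best_j = min(options, key=options.get)
        J[k][i], policy[k][i] = options[best_j], best_j

print(J[0][0], policy[0][0])                     # 8 2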

Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

• Contents
• Introduction
  • Background
  • Objective
  • Approach
  • Outline
• Maintenance
  • Types of Maintenance
  • Maintenance Optimization Models
• Introduction to the Power System
  • Power System Presentation
  • Costs
  • Main Constraints
• Introduction to Dynamic Programming
  • Introduction
  • Deterministic Dynamic Programming
• Finite Horizon Models
  • Problem Formulation
  • Optimality Equation
  • Value Iteration Method
  • The Curse of Dimensionality
  • Ideas for a Maintenance Optimization Model
• Infinite Horizon Models - Markov Decision Processes
  • Problem Formulation
  • Optimality Equations
  • Value Iteration
  • The Policy Iteration Algorithm
  • Modified Policy Iteration
  • Average Cost-to-go Problems
  • Linear Programming
  • Efficiency of the Algorithms
  • Semi-Markov Decision Process
• Approximate Methods for Markov Decision Process - Reinforcement Learning
  • Introduction
  • Direct Learning
  • Indirect Learning
  • Supervised Learning
• Review of Models for Maintenance Optimization
  • Finite Horizon Dynamic Programming
  • Infinite Horizon Stochastic Models
  • Reinforcement Learning
  • Conclusions
• A Proposed Finite Horizon Replacement Model
  • One-Component Model
  • Multi-Component Model
  • Possible Extensions
• Conclusions and Future Work
• Solution of the Shortest Path Example
• Reference List
Page 52: Models

of the curse of dimensionality The complexity of the model increases exponention-naly with the number of states In consequence the number of components of afinite horizon SDP model can not be too high for being tractable

Several Markov Decision Process and Semi-Markov Decision Processes models havebeen proposed for solving condition based maintenance problems The models con-siders an average cost-to-go which is realistic SMDP have the advantages of beingable to optimize the time to next inspection depending on the states SMDP arealso more complex The models found in the litterature was considering only singlecomponents with only one state variable SMDP could be very useful for schedulledCBM and SMDP for inspection based CBM However for continuous time moni-toring it would be recommanded to use approximate methods

Approximate dynamic programming (reinforcement learning) have many advan-tages The methods does not need that a model of the system exist They learnfrom samples and could be used to adapt to a system Moreover they can handlelarge state space in comparison with MDP In my opinion reinforcement learningcould be used for continuous time monitoring of system with multi-states moni-toring The article [24] was also proposing this approach for condition monitoringof power plants However no implementation of the idea have been found in thelitterature A practical disadvantage of this approach is that the process of learningis time consuming It can (and should) be done off-line or based on a model thatalready exist but is too large to be solvable with classical methods A technicaldifficulty is the choice for an adequate supervised learning structure

Table 81 shows a summary of the models and most important methods

Table 81 Summary of models and methods

Characteristics Possible Application Method Advantagesin Maintenance DisadvantagesOptimization

Finite Horizon Model can be Short-term maintenance Value Iteration Limitated state spaceDynamic Programming Non-Stationary Optimization Scheduling (number of components)Markov Decision -Stationary Model Classical MethodsProcesses - Possible approaches for MDP

Average cost-to-go Continuous-time condition Value Iteration (VI) Can converge fast formonitoring maintenance high discount factoroptimization

Discounted Short-term maintenance Policy Iteration (PI) Faster in generaloptimization

Shortest path Linear Programming - Possible additionalconstraints- State space limited VI amp PI

Approximate Dynamic Can handle large state space Same as MDP for larger - TD-learning Can work withoutProgramming for MDP classical MDP methods systems - Q-learning an explicit modelSemi-Markov Decision -Can optimize Optimization for inspection Same as MDPProcesses interval inspection based maintenance

-Complex (Average cost-to-go approach)

46

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1, then PM1, respectively CM1, corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E or P3E; i2 is represented by the rows of the matrices and j2 by the columns.


Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, …, NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, …, NW−1}        0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, …, NW}          1    PM1      1
PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1
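
The stationary component transitions of Table 9.1 can be sketched as follows (an illustration only; the function name and the per-stage failure probability p_fail are assumptions, e.g. p_fail(q) = Ts · λ(q · Ts) as suggested by Figure 9.1):

```python
def component_transition(i1, u, p_fail, n_w, n_pm, n_cm):
    """Return {j1: P(j1, u, i1)} following Table 9.1."""
    if i1.startswith("W"):
        q = int(i1[1:])
        if u == 1:
            # preventive replacement decided (if NPM = 1, read PM1 as W0)
            return {"PM1": 1.0}
        nxt = "W%d" % min(q + 1, n_w)   # W_NW is kept while no failure occurs
        return {nxt: 1.0 - p_fail(q), "CM1": p_fail(q)}
    if i1.startswith("PM"):
        q = int(i1[2:])
        return {"PM%d" % (q + 1): 1.0} if q < n_pm - 1 else {"W0": 1.0}
    if i1.startswith("CM"):
        q = int(i1[2:])
        return {"CM%d" % (q + 1): 1.0} if q < n_cm - 1 else {"W0": 1.0}
    raise ValueError("unknown state %s" % i1)
```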

Table 9.2: Example of transition matrix for electricity scenarios

P1E = [1 0 0; 0 1 0; 0 0 1]

P2E = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3]

P3E = [0.6 0.2 0.2; 0.2 0.6 0.2; 0.2 0.2 0.6]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k):    0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2):   P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
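
The example of Tables 9.2 and 9.3 could be encoded as follows (a sketch; the variable names are assumptions):

```python
import numpy as np

P1_E = np.eye(3)                                  # scenarios stay unchanged
P2_E = np.full((3, 3), 1.0 / 3.0)                 # scenarios completely reshuffled
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])                # scenarios mostly persistent

# Table 9.3: matrix used at each stage k = 0..11
STAGE_MATRIX = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
                P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): rows index the current scenario i2, columns the next j2."""
    return STAGE_MATRIX[k][i2, j2]
```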

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation = G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i) is defined for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, …, NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, …, NW−1}        0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, …, NPM−2}      ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, …, NCM−2}      ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
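
A hedged sketch of the transition costs of Table 9.4 (the function signature is an assumption; the thesis only tabulates the values):

```python
def transition_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_PM, C_CM):
    """C_k(j, u, i) for the one-component model, following Table 9.4."""
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        # reward for electricity generation (it enters a pure cost
        # minimization with the opposite sign to the maintenance costs)
        return G * Ts * C_E(i2, k)
    if j1 == "CM1" or i1.startswith("CM"):
        return C_I + C_CM       # failure during the stage or ongoing CM
    if j1 == "PM1" or i1.startswith("PM"):
        return C_I + C_PM       # preventive replacement decided or ongoing PM
    raise ValueError((i1, u, j1))
```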

9.2 Multi-Component model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This could be very interesting if the interruption cost is high or if the cost of the equipment needed for the maintenance is very high. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c


Costs

CPMc Cost per stage of Preventive Maintenance for component c
CCMc Cost per stage of Corrective Maintenance for component c
CNc(i) Terminal cost if the component c is in state i

Variables

ic, c ∈ {1, …, NC} State of component c at the current stage
iNC+1 State of the electricity at the current stage
jc, c ∈ {1, …, NC} State of component c for the next stage
jNC+1 State of the electricity for the next stage
uc, c ∈ {1, …, NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, …, NC} State of component c at stage k
xc A component state
xNC+1,k Electricity state at stage k
uck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, …, NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, …, xNCk, xNC+1,k)ᵀ    (9.2)

xck, c ∈ {1, …, NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component Space
The number of CM and PM states for component c corresponds respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, …, WNWc, PM1, …, PM(NPMc−1), CM1, …, CM(NCMc−1)}

Electricity Space
Same as in Section 9.1.4.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:


uck = 0: no preventive maintenance on component c
uck = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1k, u2k, …, uNCk)ᵀ    (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, …, NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, …, WNWc}
                          = ∅      else

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) · P(jNC+1, iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by transition matrices, as in the example of Section 9.1.4.3.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the components are not ageing, since the system is not working. Consequently, different cases must be considered.

Case 1

If all the components are working and no maintenance is decided, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, …, NC}, xck ∈ {W1, …, WNWc}:

P((j1, …, jNC), 0, (i1, …, iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or if the decision of preventive maintenance is taken for at least one component:

P((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = ∏_{c=1}^{NC} P^c

with P^c = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, …, WNWc}
         = 1              if ic ∉ {W0, …, W(NWc−1)} and ic = jc
         = 0              else
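
For Case 1, the product form can be sketched directly (an illustration; p_component(jc, u, ic) is an assumed single-component transition function returning P(jc, u, ic), such as the one sketched in Section 9.1):

```python
def joint_transition_case1(i_states, j_states, p_component):
    """P((j1,...,jNC), 0, (i1,...,iNC)) = prod_c P(jc, 0, ic)."""
    prob = 1.0
    for ic, jc in zip(i_states, j_states):
        prob *= p_component(jc, 0, ic)
    return prob
```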

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, …, NC}, xck ∈ {W1, …, WNWc}:

C((j1, …, jNC), 0, (i1, …, iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, …, jNC), (u1, …, uNC), (i1, …, iNC)) = CI + Σ_{c=1}^{NC} Cc

with Cc = CCMc   if ic ∈ {CM1, …, CM(NCMc−1)} or jc = CM1
        = CPMc   if ic ∈ {PM1, …, PM(NPMc−1)} or jc = PM1
        = 0      else
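
The two-case cost function can be sketched as follows (an illustration only; the argument names and the treatment of W0 as an ordinary working state are assumptions of this sketch):

```python
def system_cost(i_states, u, j_states, i_elec, k, G, Ts, C_E, C_I, C_PM, C_CM):
    """Stage cost for the multi-component model (Cases 1 and 2 above)."""
    all_working = all(s.startswith("W") for s in i_states)
    no_maintenance = all(uc == 0 for uc in u)
    no_failure = all(not s.startswith("CM") for s in j_states)
    if all_working and no_maintenance and no_failure:
        return G * Ts * C_E(i_elec, k)        # Case 1: reward for production
    cost = C_I                                # Case 2: interruption cost
    for c, (ic, jc) in enumerate(zip(i_states, j_states)):
        if ic.startswith("CM") or jc == "CM1":
            cost += C_CM[c]                   # corrective maintenance on c
        elif ic.startswith("PM") or jc == "PM1":
            cost += C_PM[c]                   # preventive maintenance on c
    return cost
```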

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions: in the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically found to converge the fastest. However, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
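
The backward recursion above can be reproduced in a few lines (a sketch, not part of the thesis; the costs C(k, i, u) are transcribed from the computations above, with the decision u indexing the successor node):

```python
# Arc costs C(k, i, u) of the shortest path example (nodes A..J plus a terminal node).
COSTS = {
    0: {0: {0: 2, 1: 4, 2: 3}},                 # A -> B, C, D
    1: {0: {0: 4, 1: 6},                        # B -> E, F
        1: {0: 2, 1: 1, 2: 3},                  # C -> E, F, G
        2: {1: 5, 2: 2}},                       # D -> F, G
    2: {0: {0: 2, 1: 5},                        # E -> H, I
        1: {0: 7, 1: 3, 2: 2},                  # F -> H, I, J
        2: {1: 1, 2: 2}},                       # G -> I, J
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},       # H, I, J -> terminal node
}
TERMINAL = {0: 0.0}                              # phi(0) = 0

def backward_induction(costs, terminal, n_stages=4):
    """Value iteration (backward induction) for the finite horizon problem."""
    J = {n_stages: terminal}
    policy = {}
    for k in range(n_stages - 1, -1, -1):
        J[k], policy[k] = {}, {}
        for i, actions in costs[k].items():
            # at stage 3 every arc leads to the single terminal node 0,
            # otherwise decision u leads to node u of the next stage
            candidates = [(u, c + J[k + 1][u if k < n_stages - 1 else 0])
                          for u, c in actions.items()]
            best_u, best_value = min(candidates, key=lambda t: t[1])
            J[k][i], policy[k][i] = best_value, best_u
    return J, policy

J, policy = backward_induction(COSTS, TERMINAL)
print(J[0][0])       # 8.0, i.e. J*_0(0) = J*(A) = 8
print(policy[0][0])  # 2, i.e. the first move is A -> D
```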


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transaction on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transaction on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 53: Models

Chapter 9

A Proposed Finite Horizon

Replacement Model

A finite horizon SDP replacement model is proposed in this chapter The modelassumes a finite time horizon and discrete decision epochs The system in con-sideration is a power generating unit An interesting feature of the model is theintegration of the electricity price as a state variable Another is the possibility ofopportunistic maintenance ie if one component fails it is possible to do preventivemaintenance on another component that is still working

The proposed model is first presented for one component and is then generalizedto multi-components Both these models can be solved using the value iterationalgorithm

91 One-Component Model

911 Idea of the Model

In this chapter an age replacement model based on finite horizon dynamic pro-gramming is proposed The model is first described for one component for an easierunderstanding of its principle

The price of electricity was considered as an important factor that could influencethe maintenance decision Indeed if the electricity price is high it can be profitableto operate the system and wait for lower prices

If a high electricity price is expected in a close future it could be interesting to

47

do maintenance immediately to be operational later and avoid maintenance in aprofitable period The idea was considered for the model The electricity price wasincluded as a state variable The variable consider different electricity scenario forexample high medium and low prices For each scenario the electricity price varywith a period of a year

There can be transitions from one scenario to another depending on the period ofthe year

In the scandinavian countries a large part of the electricity is based on hydro-power The electricity price is in consequence highly influenced by the weather Ifthe weather is warm and dry the hydro-storage will be low and the electricity pricefor the rest of the year may be high On the opposite a cold and rainy seasonmay result in low electricity price for the rest of the year This observation couldbe used to assume the electricity scenario to be transiant during the summer andstable during the rest of the year typically interpreted as dry year or wet year Thisassumption could be used as a base for modelling the transition for the electricitystate

912 Notations for the Proposed Model

Numbers

NE Number of electricity scenarioNW Number of working state for the componentNPM Number of preventive maintenance state for one componentNCM Number of corrective maintenance state for one component

Costs

CE(s k) Electricity cost at stage k for the electricity state sCI Cost per stage for interruptionCPM Cost per stage of Preventive maintenanceCCM Cost per stage of Corrective maintenanceCN (i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stagei2 Electricity state at the current stagej1 Possible component state for the next stagej2 Possible electricity state for the next stage

State and Control Space

48

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j

2 i2) can take three different values defined by the transition matrices P 1E P 2

E

or P 3E i2 is represented by the rows of the matrices and j2 by the column

53

Table 91 Transition probabilities

i1 u j1 P (j1 u i1)

Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1

PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1

CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 93 Example of transition probabilities on a 12 stages horizon

Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11

Pk(j2 i2) P 1

E P 1E P 1

E P 3E P 3

E P 2E P 2

E P 2E P 3

E P 1E P 1

E P 1E

9144 Cost Function

The costs associated to the possible transitions can be of different kinds

bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)

bull Cost for maintenance CCM or CPM

bull Cost for interruption CI

Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable

A possible terminal cost is defined by CN (i) for each possible terminal state CN (i)for the component

54

Table 94 Transition costs

i1 u j1 Ck(j u i)

Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)

Wq q isin 0 NW minus 1 0 CM1 CI + CCM

WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM

Wq 1 PM1 CI + CPM

PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM

PMNPMminus1 empty W0 CI + CPM

CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM

CMNCMminus1 empty W0 CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time

922 Notations for the Proposed Model

Numbers

NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c

55

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007


do maintenance immediately in order to be operational later, and to avoid maintenance in a profitable period. This idea was considered for the model: the electricity price was included as a state variable. The variable represents different electricity scenarios, for example high, medium and low prices. For each scenario, the electricity price varies with a period of one year.

There can be transitions from one scenario to another, depending on the period of the year.

In the Scandinavian countries, a large part of the electricity is based on hydro power. The electricity price is consequently highly influenced by the weather. If the weather is warm and dry, the hydro storage will be low and the electricity price for the rest of the year may be high. Conversely, a cold and rainy season may result in a low electricity price for the rest of the year. This observation could be used to assume that the electricity scenario is transient during the summer and stable during the rest of the year, typically interpreted as a dry year or a wet year. This assumption could be used as a basis for modelling the transitions for the electricity state.

9.1.2 Notations for the Proposed Model

Numbers

NE Number of electricity scenarios
NW Number of working states for the component
NPM Number of preventive maintenance states for one component
NCM Number of corrective maintenance states for one component

Costs

CE(s, k) Electricity price at stage k in electricity state s
CI Cost per stage for interruption
CPM Cost per stage of preventive maintenance
CCM Cost per stage of corrective maintenance
CN(i) Terminal cost if the component is in state i

Variables

i1 Component state at the current stage
i2 Electricity state at the current stage
j1 Possible component state for the next stage
j2 Possible electricity state for the next stage

State and Control Space

x1k Component state at stage k
x2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age t
λ(i) Failure rate of the component in state Wi

Sets

Ωx1 Component state space
Ωx2 Electricity state space
ΩU(i) Decision space for state i

States notations

W Working state
PM Preventive maintenance state
CM Corrective maintenance state

9.1.3 Assumptions

• The time span of the problem is T. It is divided into N stages of length Ts such that T = N · Ts. The maintenance decisions are made sequentially at each stage k = 0, 1, ..., N-1.

• The failure rate of the component over time is assumed to be perfectly known. This function is denoted λ(t).

• If the component fails during stage k, corrective maintenance is undertaken for NCM stages, with a cost of CCM per stage.

• It is possible at each stage to decide to replace the component in order to prevent corrective maintenance. The time of preventive replacement is NPM stages, with a cost of CPM per stage.

• If the system is not working, a cost for interruption CI per stage is considered.

• The average production of the generating unit is G kW. This means that if the unit is not in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• NE possible electricity price scenarios are considered. The prices are assumed fixed during a stage (equal to the price at the beginning of the stage). For scenario s, the electricity price per kWh is noted CE(s, k), k = 0, 1, ..., N-1. It is possible that the electricity price switches from one scenario to another during the time span. The probability of transition at each stage is assumed known.

• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1k for the state of the component (its age) and x2k for the electricity scenario (NX = 2).

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1k, x2k),  x1k ∈ Ωx1, x2k ∈ Ωx2    (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that during the last stage it has undertaken preventive maintenance. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of states W. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; in this case Tmax can correspond, for example, to the time when λ(t) > 50%, with λ(t) kept constant for t > Tmax. This approach was implemented. The corresponding number of W states is NW = Tmax/Ts, or the closest integer, in both cases.

Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. Solid lines: u = 0. Dashed lines: u = 1. (The figure shows the states W0-W4, PM1, CM1 and CM2, with failure transitions of probability Ts·λ(q·Ts) from each Wq to CM1 and the complementary probabilities along the ageing chain.)

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PMNPM-1, CM1, ..., CMNCM-1}
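As an illustration, the following minimal Python sketch (with illustrative names that are not part of the thesis) enumerates this state space for the example of Figure 9.1:

# Sketch: enumeration of the component state space (illustrative names).
def component_state_space(n_w, n_pm, n_cm):
    """W0..WNW, PM1..PM(NPM-1), CM1..CM(NCM-1); W0 also plays the role of PM_NPM and CM_NCM."""
    states = [f"W{q}" for q in range(n_w + 1)]      # W0 (new component) ... WNW
    states += [f"PM{q}" for q in range(1, n_pm)]    # PM1 ... PM(NPM-1)
    states += [f"CM{q}" for q in range(1, n_cm)]    # CM1 ... CM(NCM-1)
    return states

print(component_state_space(4, 2, 3))
# ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2'], as in Figure 9.1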


Electricity scenario state

Electricity scenarios are associated with one state variable x2k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively dry, normal and wet years). The weather during the season influences the water reserve in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

Figure 9.2: Example of electricity scenarios, NE = 3. (The figure plots the electricity price in SEK/MWh, between roughly 200 and 500, against the stage, for Scenario 1, Scenario 2 and Scenario 3 around stages k-1, k and k+1.)

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance,
Uk = 1: preventive maintenance.

The decision space depends only on the component state i1:

ΩU(i) = {0, 1} if i1 ∈ {W1, ..., WNW},  ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
= P(x1k+1 = j1, x2k+1 = j2 | uk = u, x1k = i1, x2k = i2)
= P(x1k+1 = j1 | uk = u, x1k = i1) · P(x2k+1 = j2 | x2k = i2)
= P(j1, u, i1) · Pk(j2, i2)

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios over a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1E, P2E and P3E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u   j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW-1}      0   Wq+1     1 - λ(Wq)
Wq, q ∈ {0, ..., NW-1}      0   CM1      λ(Wq)
WNW                         0   WNW      1 - λ(WNW)
WNW                         0   CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1   PM1      1
PMq, q ∈ {1, ..., NPM-2}    ∅   PMq+1    1
PMNPM-1                     ∅   W0       1
CMq, q ∈ {1, ..., NCM-2}    ∅   CMq+1    1
CMNCM-1                     ∅   W0       1
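The non-zero entries of Table 9.1 can also be generated programmatically. The sketch below is one possible reading of the table (illustrative names; lam(q) stands for the per-stage failure probability of state Wq, written λ(Wq) in the table and Ts·λ(q·Ts) in Figure 9.1):

# Sketch of the component transition probabilities of Table 9.1 (illustrative names).
def component_transition(i, u, n_w, n_pm, n_cm, lam):
    """Return P(j | u, i) as a dict {j: probability} for component state i."""
    if i.startswith("W"):
        q = int(i[1:])
        if u == 1:                                  # preventive replacement decided
            return {"PM1" if n_pm > 1 else "W0": 1.0}
        j_ok = f"W{min(q + 1, n_w)}"                # ageing; W_NW stays in W_NW
        j_cm = "CM1" if n_cm > 1 else "W0"          # failure during the stage
        return {j_ok: 1.0 - lam(q), j_cm: lam(q)}
    kind, q = i[:2], int(i[2:])                     # PM/CM states evolve deterministically
    last = (n_pm if kind == "PM" else n_cm) - 1
    return {"W0": 1.0} if q >= last else {f"{kind}{q + 1}": 1.0}

# Example: component_transition("W2", 0, 4, 2, 3, lam) -> {"W3": 1 - lam(2), "CM1": lam(2)}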

Table 9.2: Example of transition matrices for the electricity scenarios

P1E =
  1    0    0
  0    1    0
  0    0    1

P2E =
  1/3  1/3  1/3
  1/3  1/3  1/3
  1/3  1/3  1/3

P3E =
  0.6  0.2  0.2
  0.2  0.6  0.2
  0.2  0.2  0.6

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0    1    2    3    4    5    6    7    8    9    10   11
Pk(j2, i2)    P1E  P1E  P1E  P3E  P3E  P2E  P2E  P2E  P3E  P1E  P1E  P1E
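The example of Tables 9.2 and 9.3 can be encoded as a stage-indexed list of matrices. A minimal sketch (illustrative names) returning Pk(j2, i2) could be:

# Sketch: the electricity scenario transition model of Tables 9.2 and 9.3.
import numpy as np

P1_E = np.eye(3)                                   # scenarios are locked
P2_E = np.full((3, 3), 1.0 / 3.0)                  # scenarios are completely mixed
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])                 # scenarios persist but may switch

# One matrix per stage over the 12-stage horizon of Table 9.3.
P_k = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): row i2, column j2 of the matrix assigned to stage k."""
    return P_k[k][i2, j2]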

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance: CCM or CPM

• Cost for interruption: CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.

Table 9.4: Transition costs

i1                          u   j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW-1}      0   Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW-1}      0   CM1      CI + CCM
WNW                         0   WNW      G · Ts · CE(i2, k)
WNW                         0   CM1      CI + CCM
Wq                          1   PM1      CI + CPM
PMq, q ∈ {1, ..., NPM-2}    ∅   PMq+1    CI + CPM
PMNPM-1                     ∅   W0       CI + CPM
CMq, q ∈ {1, ..., NCM-2}    ∅   CMq+1    CI + CCM
CMNCM-1                     ∅   W0       CI + CCM
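With the state space, transition probabilities and costs defined, the model can be solved by the finite horizon backward recursion Jk(i) = min over u of Σj P(j | u, i) [Ck(j, u, i) + Jk+1(j)], with JN given by the terminal cost. The sketch below is a minimal, illustrative implementation (not the thesis implementation): the transition models are passed in as callables, for instance the component_transition and electricity_transition sketches above; decisions(i1) is assumed to return {0, 1} for working states and the single dummy decision 0 otherwise; and the production reward is counted as a negative cost so that the recursion is a pure minimization.

# Sketch: stage cost of Table 9.4 and backward recursion over the joint state (i1, i2).
def stage_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_PM, C_CM):
    """Transition cost C_k(j, u, i); the production reward enters with a minus sign."""
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        return -G * Ts * C_E(i2, k)              # electricity produced during the stage
    if i1.startswith("CM") or j1 == "CM1":
        return C_I + C_CM                        # (ongoing) corrective maintenance
    return C_I + C_PM                            # (ongoing) preventive maintenance

def backward_dp(N, comp_states, n_elec, comp_trans, elec_trans, decisions, cost, terminal):
    """Return cost-to-go J[k][(i1, i2)] and optimal decisions U[k][(i1, i2)]."""
    J = [{} for _ in range(N + 1)]
    U = [{} for _ in range(N)]
    for i1 in comp_states:
        for i2 in range(n_elec):
            J[N][(i1, i2)] = terminal(i1)        # terminal cost C_N(i1)
    for k in reversed(range(N)):
        for i1 in comp_states:
            for i2 in range(n_elec):
                values = {}
                for u in decisions(i1):
                    v = 0.0
                    for j1, p1 in comp_trans(i1, u).items():
                        for j2 in range(n_elec):
                            p = p1 * elec_trans(k, i2, j2)
                            v += p * (cost(i1, u, j1, i2, k) + J[k + 1][(j1, j2)])
                    values[u] = v
                u_star = min(values, key=values.get)
                J[k][(i1, i2)], U[k][(i1, i2)] = values[u_star], u_star
    return J, U

Here cost would typically be stage_cost with the cost parameters bound (for example with functools.partial), and terminal the terminal cost function CN(i).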

9.2 Multi-Component model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC Number of components
NWc Number of working states for component c
NPMc Number of preventive maintenance states for component c
NCMc Number of corrective maintenance states for component c

Costs

CPMc Cost per stage of preventive maintenance for component c
CCMc Cost per stage of corrective maintenance for component c
CNc(i) Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC} State of component c at the current stage
iNC+1 Electricity state at the current stage
jc, c ∈ {1, ..., NC} State of component c at the next stage
jNC+1 Electricity state at the next stage
uc, c ∈ {1, ..., NC} Decision variable for component c

State and Control Space

xck, c ∈ {1, ..., NC} State of component c at stage k
xc A component state
xNC+1,k Electricity state at stage k
uck Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc State space for component c
ΩxNC+1 Electricity state space
Ωuc(ic) Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed to be perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages, with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component in order to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages, with a cost of CPMc per stage.

• An interruption cost CI is considered whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh are produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1k, ..., xNCk, xNC+1,k)    (9.2)

xck, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xck ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PMNPMc-1, CM1, ..., CMNCMc-1}

Electricity space
Same as in Section 9.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or nothing, depending on the state of the system:

uck = 0: no preventive maintenance on component c,
uck = 1: preventive maintenance on component c.

The decision variables constitute a decision vector:

Uk = (u1k, u2k, ..., uNCk)    (9.3)

The decision space for each decision variable is defined by

∀c ∈ {1, ..., NC}: Ωuc(ic) = {0, 1} if ic ∈ {W0, ..., WNWc},  ∅ otherwise.

9.2.4.3 Transition Probability

The component state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)    (9.4)
= P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)    (9.5)

The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏ c=1..NC P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or preventive maintenance is decided for some component, then

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏ c=1..NC Pc

with

Pc = P(jc, 1, ic)  if uc = 1 or ic ∉ {W1, ..., WNWc}
Pc = 1             if uc = 0, ic ∈ {W1, ..., WNWc} and jc = ic
Pc = 0             otherwise
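The two cases can be combined into a single routine. The sketch below is an illustrative reading of the formulas above, where comp_trans(c, i, u) stands for the per-component transition model (for instance a per-component wrapper around the component_transition sketch of Section 9.1):

# Sketch: joint component-state transition probability for the series system.
def working(state):
    return state.startswith("W")

def joint_transition(i, u, j, comp_trans):
    """P((j1..jNC) | (u1..uNC), (i1..iNC)); i, u and j are tuples of length NC."""
    system_up = all(working(ic) for ic in i) and not any(u)
    p = 1.0
    for c, (ic, uc, jc) in enumerate(zip(i, u, j)):
        if system_up:                               # Case 1: every component ages
            p *= comp_trans(c, ic, 0).get(jc, 0.0)
        elif uc == 1 or not working(ic):            # Case 2: component under maintenance
            p *= comp_trans(c, ic, uc).get(jc, 0.0)
        else:                                       # Case 2: working components are frozen
            p *= 1.0 if jc == ic else 0.0
    return p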

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: xck ∈ {W1, ..., WNWc}, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + Σ c=1..NC Cc

with

Cc = CCMc  if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
Cc = CPMc  if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
Cc = 0     otherwise
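A corresponding sketch of the cost function (illustrative; C_PM[c] and C_CM[c] stand for the per-component costs CPMc and CCMc, and the production reward is again counted as a negative cost) could be:

# Sketch: joint transition cost for the multi-component model.
def joint_cost(i, u, j, i_elec, k, G, Ts, C_E, C_I, C_PM, C_CM):
    """Case 1: production reward; Case 2: interruption cost plus maintenance costs."""
    def working(s):
        return s.startswith("W")
    if all(working(ic) for ic in i) and not any(u) and all(working(jc) for jc in j):
        return -G * Ts * C_E(i_elec, k)            # Case 1 (reward, negative cost)
    total = C_I                                    # Case 2: one interruption cost
    for c, (ic, jc) in enumerate(zip(i, j)):
        if ic.startswith("CM") or jc == "CM1":
            total += C_CM[c]
        elif ic.startswith("PM") or jc == "PM1":
            total += C_PM[c]
    return total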

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could impact the model.

• Manpower: it would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions: in the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair: a stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states: if monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states: it could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting feature of the model is that it enables opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposal for such an application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to other fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of several parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration or age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,  u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,  u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,  u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin u∈{0,1} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0
J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin u∈{0,1,2} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1
J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin u∈{1,2} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin u∈{0,1} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0
J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin u∈{0,1,2} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2
J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin u∈{1,2} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin u∈{0,1,2} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
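The same computation can be reproduced in a few lines of code. The sketch below (illustrative) encodes the arc costs C(k, i, u) used above; as can be read off the pairing of terms in the recursion, the decision u is also the index of the successor state at stage k+1.

# Sketch: the shortest path example solved by backward value iteration.
# C[k][(i, u)] = arc cost; the decision u is also the successor state at stage k+1.
C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},                 # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                             # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,                  # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                            # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                             # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,                  # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                            # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},                 # H, I, J -> terminal node
}

J = {(4, 0): 0}                                           # terminal cost phi(0) = 0
policy = {}
for k in (3, 2, 1, 0):
    for i in {s for (s, _) in C[k]}:
        best = min((c + J[(k + 1, u)], u) for (s, u), c in C[k].items() if s == i)
        J[(k, i)], policy[(k, i)] = best

print(J[(0, 0)], policy[(0, 0)])    # 8 2: total cost 8, first move towards node D

The recursion reproduces J*0(0) = 8 and the first optimal decision u*0(0) = 2, as in the calculation above.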


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age Replacement with Discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley and T.V. Voorhis. Risk-based Maintenance Optimization for Transmission Equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 55: Models

x1k Component state at stage kx2k Electricity state at stage k

Probability function

λ(t) Failure rate of the component at age tλ(i) Failure rate of the component in state Wi

Sets

Ωx1

Component state spaceΩ2 Electricity state spaceΩU (i) Decision space for state i

States notations

W Working statePM Preventive maintenance stateCM Corrective maintenance state

913 Assumptions

bull The time span of the problem is T It is divided into N stages of length Tssuch that T = N middotTs The maintenance decision are made sequentially at eachstage k=01N-1

bull The failure rate of the component over the time is assumed perfectly knownThis function is denoted λ(t)

bull If the component fails during stage k corrective maintenance is undertakenfor NCM stages with a cost of CCM per stage

bull It is possible at each stage to decide to replace the component to preventcorrective maintenance The time of preventive replacement is NPM stageswith a cost of CPM per stage

bull If the system is not working a cost for interruption CI per stage is considered

bull The average production of the generating unit is G kW It means that if theunit is not in preventive maintenance or failure G middot Ts kWh are producedduring the stage (Ts in hours)

bull NE possible electricity price scenarios are considered The prices are supposedfixed during a stage (equal to the price at the beginning of scenario) Forscenario s the electricity price per kWh is noted CE(s k) k=01N-1 It ispossible that the electricity price switch from one scenario to another oneduring the time span The probability of transition at each stage is assumedknown

49

bull A terminal cost (for stage N) can be used to penalize the terminal stagecondition

bull The manpower is assumed unlimited Spare parts are not considered

914 Model Description

9141 State Space

The state vector Xk is composed of two states variables x1k for the state of the

component (its age) and x2k for the electricity scenario NX = 2

The state of the system is thus represented by a vector as in (91)

Xk =

(x1k

x2k

)x1k isin Ωx1 x2

k isin Ωx2 (91)

Ωx1 is the set of possible states for the component and Ωx2 the set of possibleelectricity scenarios

Component state

The status of the component (its age) at each stage is represented by one statevariable x1

k There are three types of possible states for the variable Normalstate (W) when the component is working corrective maintenance (CM) states ifthe component is in maintenance due to failure and preventive maintenance (PM)states The meaning of a state is that the component has been in the corresponingcondition during the last stage For example if the component is in a state PMit means that during the last stage it has undertaken preventive maintenance Thenumber of CM and PM states for the component corresponds respectively to NCM

and NPM

To limit the size of the state space it is necessary to limit the number of states WIt can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax) preventivemaintenance is always made Another possibility is to assume that λi(t) staysconstant when age Tmax is reached In this case Tmax can correspond for exampleat the time when λ(t) gt 50 if tgtTmax This approach was implemented Thecorresponding number of W states is NW = TmaxTs or the closest integer in bothcases

50

CM2 CM1

W0 W1 W2 W3 W4

PM1

(1minus Tsλ(0)) (1minus Tsλ(1)) (1minus Tsλ(2)) (1minus Tsλ(3))

Tsλ(0) Tsλ(1) Tsλ(2) Tsλ(3) Tsλ(4)

(1minus Tsλ(4))

1

1

1

1 1 1 1 1

Figure 91 Example of Markov Decision Process for one component withNCM = 3NPM = 2 NW = 4 Solid line u=0 Dashed Line u=1

Figure 91 shows an example of graphical representation of the MDP model for onecomponent In this example x1

k isin Ωx1

= W0 W4 PM1 CM1 CM2 The StateW0 is used to represent a new component PM2 and CM3 are both representedwith this state

More generally

Ωx1

= W0 WNW PM1 PMNPMminus1 CM1 CMNCMminus1

51

Electricity scenario state

Electricity scenarios are associated with one state variable x2k There areNE possible

states for this variable each state corresponding to one possible electricity scenariox2k isin Ωx

2

= S1 SNe The electricity price of the scenario S at stage k is givenby the electricity price function CE(S k) Figure 92 shows an example for threepossibles scenarios

The example considers three electricity scenarios correspond to high medium andlow electricity prices (respectively dry normal and wet year) The weather duringthe season influence the water reserve in a country as Sweden Hydropower is alarge part of the electricity generation in Sweden Moreover this is a cheap sourceof energy In consequence if there is a low water reserve more expensive source ofenergy are needed and the electricity price is higher

13

13

13

Stage

Electricity Prices SEKMWh

Scenario 1

Scenario 2

Scenario 3

k-1 k k+1

200

250

300

350

400

450

500

Figure 92 Example of electricity scenarios NE = 3

52

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j

2 i2) can take three different values defined by the transition matrices P 1E P 2

E

or P 3E i2 is represented by the rows of the matrices and j2 by the column

53

Table 91 Transition probabilities

i1 u j1 P (j1 u i1)

Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1

PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1

CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 93 Example of transition probabilities on a 12 stages horizon

Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11

Pk(j2 i2) P 1

E P 1E P 1

E P 3E P 3

E P 2E P 2

E P 2E P 3

E P 1E P 1

E P 1E

9144 Cost Function

The costs associated to the possible transitions can be of different kinds

bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)

bull Cost for maintenance CCM or CPM

bull Cost for interruption CI

Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable

A possible terminal cost is defined by CN (i) for each possible terminal state CN (i)for the component

54

Table 94 Transition costs

i1 u j1 Ck(j u i)

Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)

Wq q isin 0 NW minus 1 0 CM1 CI + CCM

WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM

Wq 1 PM1 CI + CPM

PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM

PMNPMminus1 empty W0 CI + CPM

CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM

CMNCMminus1 empty W0 CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time

922 Notations for the Proposed Model

Numbers

NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c

55

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2


• A terminal cost (for stage N) can be used to penalize the terminal stage condition.

• The manpower is assumed unlimited. Spare parts are not considered.

9.1.4 Model Description

9.1.4.1 State Space

The state vector Xk is composed of two state variables: x1_k for the state of the component (its age) and x2_k for the electricity scenario; NX = 2.

The state of the system is thus represented by a vector as in (9.1):

Xk = (x1_k, x2_k),   x1_k ∈ Ωx1, x2_k ∈ Ωx2   (9.1)

Ωx1 is the set of possible states for the component and Ωx2 the set of possible electricity scenarios.

Component state

The status of the component (its age) at each stage is represented by one state variable x1_k. There are three types of possible states for this variable: normal states (W) when the component is working, corrective maintenance (CM) states if the component is in maintenance due to a failure, and preventive maintenance (PM) states. The meaning of a state is that the component has been in the corresponding condition during the last stage. For example, if the component is in a state PM, it means that it has undergone preventive maintenance during the last stage. The numbers of CM and PM states for the component correspond respectively to NCM and NPM.

To limit the size of the state space it is necessary to limit the number of W states. It can be assumed that when λ(t) reaches a fixed limit λmax = λ(Tmax), preventive maintenance is always made. Another possibility is to assume that λ(t) stays constant once the age Tmax is reached; Tmax can then for example correspond to the time after which λ(t) > 50%. The latter approach was implemented. In both cases the corresponding number of W states is NW = Tmax/Ts, rounded to the closest integer.
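As a rough illustration of this discretization, the sketch below, with purely illustrative failure rate and time constants, shows how NW and the per-stage failure probabilities Ts · λ(q · Ts) could be computed:

# Minimal sketch of the age discretization described above; the failure rate
# function and the constants Ts and Tmax are illustrative assumptions.

def failure_rate(t):
    # assumed increasing failure rate (per hour), purely illustrative
    return 1e-5 + 2e-14 * t**2

Ts = 24 * 30.0           # stage length in hours (about one month), assumption
Tmax = 5 * 365 * 24.0    # age after which lambda(t) is kept constant, assumption

N_W = round(Tmax / Ts)   # number of working states W0, ..., W_NW

# per-stage failure probability of state Wq: Ts * lambda(q * Ts),
# with lambda frozen at lambda(Tmax) for the last working state W_NW
p_fail = [min(1.0, Ts * failure_rate(min(q * Ts, Tmax))) for q in range(N_W + 1)]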

[Figure 9.1: Example of Markov Decision Process for one component with NCM = 3, NPM = 2, NW = 4. States W0-W4, PM1, CM1, CM2; solid arrows: u = 0 (ageing with probability 1 − Ts·λ(q), failure with probability Ts·λ(q)), dashed arrows: u = 1 (preventive maintenance).]

Figure 9.1 shows an example of a graphical representation of the MDP model for one component. In this example, x1_k ∈ Ωx1 = {W0, ..., W4, PM1, CM1, CM2}. The state W0 is used to represent a new component; PM2 and CM3 are both represented by this state.

More generally,

Ωx1 = {W0, ..., WNW, PM1, ..., PM(NPM−1), CM1, ..., CM(NCM−1)}

Electricity scenario state

Electricity scenarios are associated with one state variable x2_k. There are NE possible states for this variable, each state corresponding to one possible electricity scenario: x2_k ∈ Ωx2 = {S1, ..., SNE}. The electricity price of scenario S at stage k is given by the electricity price function CE(S, k). Figure 9.2 shows an example for three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserves in a country such as Sweden. Hydropower is a large part of the electricity generation in Sweden, and it is moreover a cheap source of energy. Consequently, if the water reserves are low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, NE = 3. Electricity price (SEK/MWh, roughly 200-500) as a function of the stage (k−1, k, k+1) for Scenarios 1-3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

Uk = 0: no preventive maintenance
Uk = 1: preventive maintenance

The decision space depends only on the component state i1:

ΩU(i) = {0, 1}   if i1 ∈ {W1, ..., WNW}
ΩU(i) = ∅        else

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(Xk+1 = j | Uk = u, Xk = i)
  = P(x1_{k+1} = j1, x2_{k+1} = j2 | uk = u, x1_k = i1, x2_k = i2)
  = P(x1_{k+1} = j1 | uk = u, x1_k = i1) · P(x2_{k+1} = j2 | x2_k = i2)
  = P(j1, u, i1) · Pk(j2, i2)
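In code, this factorization means that the full transition probability is simply the product of the two independent parts; the sketch below assumes a dictionary-returning component transition helper and a list of stage-wise electricity matrices (both hypothetical names), with scenarios identified by their indices:

# Sketch of the factorized transition probability
# P(X_{k+1} = (j1, j2) | U_k = u, X_k = (i1, i2)) = P(j1, u, i1) * P_k(j2, i2).
def full_transition_prob(j, u, i, k, component_transition_prob, electricity_matrices):
    j1, j2 = j
    i1, i2 = i
    return component_transition_prob(i1, u).get(j1, 0.0) * electricity_matrices[k][i2, j2]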

Component state transition probability

At each stage k, if the state of the component is Wq, the failure rate is assumed constant during the time of the stage and equal to λ(Wq) = λ(q · Ts).

The transition probability for the component state is stationary. It can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if NPM = 1 or NCM = 1, then PM1 respectively CM1 corresponds to W0.

Electricity State

The transition probabilities of the electricity state, Pk(j2, i2), are not stationary: they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, Pk(j2, i2) can take three different values, defined by the transition matrices P1_E, P2_E and P3_E; i2 is represented by the rows of the matrices and j2 by the columns.

Table 9.1: Transition probabilities

i1                          u    j1       P(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}      0    CM1      λ(Wq)
WNW                         0    WNW      1 − λ(WNW)
WNW                         0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}        1    PM1      1
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    1
PM(NPM−1)                   ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    1
CM(NCM−1)                   ∅    W0       1
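A small sketch of how these transition probabilities could be generated programmatically is given below; the tuple encoding of the states and the argument p_fail (the per-stage failure probability of state Wq) are assumptions of the sketch, not part of the thesis:

# Illustrative generator for the one-component transition probabilities of
# Table 9.1. States are encoded as tuples ('W', q), ('PM', q), ('CM', q).
def component_transition(state, u, N_W, N_PM, N_CM, p_fail):
    """Return {next_state: probability} for a component state and decision u
    (u = 0: no preventive maintenance, u = 1: preventive maintenance,
    u = None: no decision possible, the component is already in maintenance)."""
    kind, q = state
    if kind == 'W':
        if u == 1:                                      # preventive replacement starts
            return {('PM', 1) if N_PM > 1 else ('W', 0): 1.0}
        ageing = ('W', min(q + 1, N_W))                 # ageing, capped at W_NW
        failure = ('CM', 1) if N_CM > 1 else ('W', 0)   # failure -> corrective maintenance
        return {ageing: 1.0 - p_fail[q], failure: p_fail[q]}
    # component in maintenance: deterministic progression back towards W0
    last = (N_PM if kind == 'PM' else N_CM) - 1
    return {('W', 0) if q >= last else (kind, q + 1): 1.0}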

Table 9.2: Example of transition matrices for the electricity scenarios

P1_E = [ 1    0    0        P2_E = [ 1/3  1/3  1/3        P3_E = [ 0.6  0.2  0.2
         0    1    0                 1/3  1/3  1/3                 0.2  0.6  0.2
         0    0    1 ]               1/3  1/3  1/3 ]               0.2  0.2  0.6 ]

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)     0     1     2     3     4     5     6     7     8     9     10    11
Pk(j2, i2)    P1_E  P1_E  P1_E  P3_E  P3_E  P2_E  P2_E  P2_E  P3_E  P1_E  P1_E  P1_E
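Expressed in code, the scenario transition data of Tables 9.2 and 9.3 could look as follows (variable names are illustrative):

import numpy as np

# Electricity-scenario transition matrices of Table 9.2 (rows: i2, columns: j2)
P1_E = np.eye(3)                                   # scenarios never change
P2_E = np.full((3, 3), 1.0 / 3.0)                  # next scenario equally likely
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])                 # staying in the same scenario is most likely

# Stage-dependent choice of matrix, P_k(j2, i2), as in Table 9.3 (k = 0, ..., 11)
P_E = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E, P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]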

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation, G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k)

• Cost for maintenance, CCM or CPM

• Cost for interruption, CI

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost CN(i1) is defined for each possible terminal state i1 of the component.

Table 9.4: Transition costs

i1                          u    j1       Ck(j, u, i)
Wq, q ∈ {0, ..., NW−1}      0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}      0    CM1      CI + CCM
WNW                         0    WNW      G · Ts · CE(i2, k)
WNW                         0    CM1      CI + CCM
Wq                          1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}    ∅    PMq+1    CI + CPM
PM(NPM−1)                   ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}    ∅    CMq+1    CI + CCM
CM(NCM−1)                   ∅    W0       CI + CCM
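To make the structure of the resulting finite horizon problem concrete, the following sketch combines a stage cost in the spirit of Table 9.4 with a backward value iteration over the stages. It is only a sketch under stated assumptions: the state encoding and helper names are the illustrative ones from the earlier sketches, trans1 and trans2 are assumed to return dictionaries of successor probabilities, and the production reward is counted as a negative cost so that the recursion minimizes the expected total cost:

# Sketch of the stage cost of Table 9.4 and of a backward (finite horizon)
# value iteration for the one-component model. All helper names are illustrative.

def stage_cost(i1, i2, u, j1, k, G, Ts, C_E, C_I, C_CM, C_PM):
    if i1[0] == 'CM' or j1 == ('CM', 1):
        return C_I + C_CM                    # corrective maintenance (and interruption)
    if i1[0] == 'PM' or u == 1:
        return C_I + C_PM                    # preventive maintenance (and interruption)
    return -G * Ts * C_E(i2, k)              # production reward while working

def backward_induction(states1, states2, N, decisions, trans1, trans2, cost, terminal):
    """Return the cost-to-go J[k][(i1, i2)] and the optimal policy for k = 0..N-1.
    decisions(i1) must return [None] for states where no decision is possible."""
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for i1 in states1:
        for i2 in states2:
            J[N][(i1, i2)] = terminal(i1)
    for k in range(N - 1, -1, -1):
        for i1 in states1:
            for i2 in states2:
                best_value, best_u = None, None
                for u in decisions(i1):
                    value = 0.0
                    for j1, p1 in trans1(i1, u).items():
                        for j2, p2 in trans2(k, i2).items():
                            value += p1 * p2 * (cost(i1, i2, u, j1, k) + J[k + 1][(j1, j2)])
                    if best_value is None or value < best_value:
                        best_value, best_u = value, u
                J[k][(i1, i2)] = best_value
                policy[k][(i1, i2)] = best_u
    return J, policy

The nested loops over states, decisions and successors make the per-stage work grow multiplicatively with each added state variable, which is exactly the curse of dimensionality discussed earlier.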

9.2 Multi-Component Model

In this section the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the infrastructure needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can then be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers
NC      Number of components
NWc     Number of working states for component c
NPMc    Number of preventive maintenance states for component c
NCMc    Number of corrective maintenance states for component c

Costs
CPMc    Cost per stage of preventive maintenance for component c
CCMc    Cost per stage of corrective maintenance for component c
CNc(i)  Terminal cost if component c is in state i

Variables
ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State for the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State for the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space
xc_k, c ∈ {1, ..., NC}  State of component c at stage k
xc                      A component state
xNC+1_k                 Electricity state at stage k
uc_k                    Maintenance decision for component c at stage k

Probability functions
λc(i)    Failure probability function for component c

Sets
Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.

• An interruption cost CI is considered, whatever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failure, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal stage condition for component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector as in (9.2):

Xk = (x1_k, ..., xNC_k, xNC+1_k)   (9.2)

xc_k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c correspond respectively to NCMc and NPMc. The number of W states for each component c, NWc, is decided in the same way as for one component.

The state space related to component c is noted Ωxc:

xc_k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}

Electricity Space
Same as for the one-component model in Section 9.1.
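The joint state space is the Cartesian product of the component state spaces and the electricity state space; the small sketch below (illustrative helper names and sizes) shows how it could be enumerated and how quickly it grows:

from itertools import product

# Sketch: enumeration of the joint state space Omega_x1 x ... x Omega_xNC x Omega_xNC+1.
# component_state_space is an illustrative helper; the electricity states are S1..SNE.
def component_state_space(N_W, N_PM, N_CM):
    return ([('W', q) for q in range(N_W + 1)]
            + [('PM', q) for q in range(1, N_PM)]
            + [('CM', q) for q in range(1, N_CM)])

components = [component_state_space(4, 2, 3) for _ in range(3)]   # NC = 3 identical components
electricity = ['S1', 'S2', 'S3']                                   # NE = 3 scenarios

joint_states = list(product(*components, electricity))
print(len(joint_states))   # 8 * 8 * 8 * 3 = 1536 states, illustrating the curse of dimensionality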

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or do nothing, depending on the state of the system:

uc_k = 0: no preventive maintenance on component c
uc_k = 1: preventive maintenance on component c

The decision variables constitute a decision vector:

Uk = (u1_k, u2_k, ..., uNC_k)   (9.3)

The decision space for each decision variable can be defined by:

∀c ∈ {1, ..., NC}:   Ωuc(ic) = {0, 1}   if ic ∈ {W0, ..., WNWc}
                     Ωuc(ic) = ∅        else
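The joint decision space is then the Cartesian product of the per-component decision spaces and grows exponentially with NC; a minimal sketch of its enumeration, using the hypothetical state encoding of the earlier sketches, is given below:

from itertools import product

# Sketch of the enumeration of the joint decision space: the Cartesian product
# of the per-component decision spaces Omega_uc(ic). None marks a component for
# which no decision is possible.
def joint_decision_space(states):
    """states = (i1, ..., iNC); returns all admissible decision vectors Uk."""
    per_component = [(0, 1) if ic[0] == 'W' else (None,) for ic in states]
    return list(product(*per_component))

# Example: two working components and one in corrective maintenance gives
# joint_decision_space((('W', 2), ('W', 0), ('CM', 1)))
# -> [(0, 0, None), (0, 1, None), (1, 0, None), (1, 1, None)]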

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)   (9.4)
  = P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1, iNC+1)   (9.5)

The transition probabilities of the electricity state, P(jNC+1, iNC+1), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 9.1.

Component states transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

P((j1, ..., jNC), 0, (i1, ..., iNC)) = ∏_{c=1}^{NC} P(jc, 0, ic)

Case 2

If one of the components is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1}^{NC} Pc

with   Pc = P(jc, 1, ic)   if uc = 1 or ic ∉ {W1, ..., WNWc}
       Pc = 1              if ic ∈ {W1, ..., WNWc}, uc = 0 and jc = ic
       Pc = 0              else
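The two cases can be turned into a small routine; the sketch below reuses the hypothetical component_transition helper introduced earlier and freezes working, non-maintained components when the system is down, which is the intent of the formulas above:

# Sketch of the joint component-state transition probability of Section 9.2.4.3.
def joint_transition_prob(j_states, u, i_states, trans1):
    """P((j1,...,jNC), (u1,...,uNC), (i1,...,iNC)) for the component states;
    trans1(ic, uc) is a per-component transition helper returning a probability dict."""
    system_up = all(ic[0] == 'W' for ic in i_states) and all(uc == 0 for uc in u)
    prob = 1.0
    for ic, uc, jc in zip(i_states, u, j_states):
        if system_up or uc == 1 or ic[0] != 'W':
            # component ages, fails, or progresses through its maintenance states
            prob *= trans1(ic, uc).get(jc, 0.0)
        else:
            # system down: a working, non-maintained component does not age
            prob *= 1.0 if jc == ic else 0.0
    return prob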

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained:

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc},

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of the costs of all the maintenance actions:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1}^{NC} Cc

with   Cc = CCMc   if ic ∈ {CM1, ..., CMNCMc} or jc = CM1
       Cc = CPMc   if ic ∈ {PM1, ..., PMNPMc} or jc = PM1
       Cc = 0      else
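A corresponding sketch of the joint cost function, mirroring the two cases above (and again counting the production reward as a negative cost), could look as follows; C_E, G, Ts and the per-component cost lists are assumed given:

# Sketch of the joint transition cost of Section 9.2.4.4.
def joint_cost(j_states, u, i_states, i_elec, k, G, Ts, C_E, C_I, C_CM, C_PM):
    system_up = all(ic[0] == 'W' for ic in i_states) and all(uc == 0 for uc in u)
    all_stay_working = all(jc[0] == 'W' for jc in j_states)
    if system_up and all_stay_working:
        return -G * Ts * C_E(i_elec, k)          # production reward (Case 1)
    total = C_I                                   # interruption cost (Case 2)
    for c, (ic, jc) in enumerate(zip(i_states, j_states)):
        if ic[0] == 'CM' or jc == ('CM', 1):
            total += C_CM[c]                      # corrective maintenance of component c
        elif ic[0] == 'PM' or jc == ('PM', 1):
            total += C_PM[c]                      # preventive maintenance of component c
    return total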

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Time to repair is non-deterministic. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge the fastest; however, for high discount rates the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas of state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is to be able to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.


The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature; the methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The component in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm:

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
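The same backward recursion can be reproduced with a few lines of code; the arc costs below are exactly the ones used in the computations above, and the mapping of the node letters A-J to (stage, state) pairs is the one implied by them:

# Reproduction of the backward value iteration above. Arc costs C[(k, i, u)]
# are taken from the computations; u is the successor state at stage k + 1
# (stage 0: state 0 = A; stage 1: 0 = B, 1 = C, 2 = D; stage 2: 0 = E, 1 = F,
# 2 = G; stage 3: 0 = H, 1 = I, 2 = J; stage 4: a single terminal state).
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4
J = {(N, 0): 0.0}                                  # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):
    for i in {i for (kk, i, u) in C if kk == k}:
        options = {u: C[(k, i, u)] + J[(k + 1, u)]
                   for (kk, ii, u) in C if kk == k and ii == i}
        policy[(k, i)] = min(options, key=options.get)
        J[(k, i)] = options[policy[(k, i)]]

print(J[(0, 0)], policy[(0, 0)])   # 8.0 2  (shortest path A -> D -> G -> I -> terminal node)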


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435-441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464-469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75-83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156-162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467-476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15-24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157-179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75-82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452-456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1-23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411-435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533-537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179-186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387-393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150-155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145-149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507-515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117-118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31-37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223-229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293-294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556-560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176-1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167-173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23-28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469-489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

• Contents
• Introduction
  • Background
  • Objective
  • Approach
  • Outline
• Maintenance
  • Types of Maintenance
  • Maintenance Optimization Models
• Introduction to the Power System
  • Power System Presentation
  • Costs
  • Main Constraints
• Introduction to Dynamic Programming
  • Introduction
  • Deterministic Dynamic Programming
• Finite Horizon Models
  • Problem Formulation
  • Optimality Equation
  • Value Iteration Method
  • The Curse of Dimensionality
  • Ideas for a Maintenance Optimization Model
• Infinite Horizon Models - Markov Decision Processes
  • Problem Formulation
  • Optimality Equations
  • Value Iteration
  • The Policy Iteration Algorithm
  • Modified Policy Iteration
  • Average Cost-to-go Problems
  • Linear Programming
  • Efficiency of the Algorithms
  • Semi-Markov Decision Process
• Approximate Methods for Markov Decision Process - Reinforcement Learning
  • Introduction
  • Direct Learning
  • Indirect Learning
  • Supervised Learning
• Review of Models for Maintenance Optimization
  • Finite Horizon Dynamic Programming
  • Infinite Horizon Stochastic Models
  • Reinforcement Learning
  • Conclusions
• A Proposed Finite Horizon Replacement Model
  • One-Component Model
  • Multi-Component model
  • Possible Extensions
• Conclusions and Future Work
• Solution of the Shortest Path Example
• Reference List
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 58: Models

Electricity scenario state

Electricity scenarios are associated with one state variable x^2_k. There are N_E possible states for this variable, each state corresponding to one possible electricity scenario: x^2_k ∈ Ω_{x^2} = {S_1, ..., S_{N_E}}. The electricity price of scenario S at stage k is given by the electricity price function C_E(S, k). Figure 9.2 shows an example with three possible scenarios.

The example considers three electricity scenarios corresponding to high, medium and low electricity prices (respectively a dry, normal and wet year). The weather during the season influences the water reserve in a country such as Sweden, where hydropower is a large and cheap part of the electricity generation. Consequently, if the water reserve is low, more expensive sources of energy are needed and the electricity price is higher.

[Figure 9.2: Example of electricity scenarios, N_E = 3. The figure plots the electricity price (SEK/MWh, between 200 and 500) over stages k−1, k, k+1 for Scenario 1, Scenario 2 and Scenario 3.]

9.1.4.2 Decision Space

At each stage, if the component is not in maintenance, the decision maker can decide whether or not to do preventive maintenance, depending on the state X of the system:

U_k = 0: no preventive maintenance,
U_k = 1: preventive maintenance.

The decision space depends only on the component state i^1:

Ω_U(i) = {0, 1} if i^1 ∈ {W_1, ..., W_{N_W}}, and Ω_U(i) = ∅ otherwise.

9.1.4.3 Transition Probabilities

The two state variables are independent. Moreover, only the electricity state transitions depend on the stage. Consequently,

P(X_{k+1} = j | U_k = u, X_k = i)
  = P(x^1_{k+1} = j^1, x^2_{k+1} = j^2 | u_k = u, x^1_k = i^1, x^2_k = i^2)
  = P(x^1_{k+1} = j^1 | u_k = u, x^1_k = i^1) · P(x^2_{k+1} = j^2 | x^2_k = i^2)
  = P(j^1, u, i^1) · P_k(j^2, i^2)

Component state transition probability

At each stage k, if the state of the component is W_q, the failure rate is assumed constant during the stage and equal to λ(W_q) = λ(q · T_s).

The transition probabilities for the component state are stationary. They can be represented as a Markov decision process, as in the example in Figure 9.1.

Table 9.1 summarizes the transition probabilities that are not equal to zero.

Note that if N_PM = 1 or N_CM = 1, then PM_1, respectively CM_1, corresponds to W_0.

Electricity State

The transition probabilities of the electricity state, P_k(j^2, i^2), are not stationary; they can change from stage to stage. Tables 9.2 and 9.3 give an example of transition probabilities for the electricity scenarios on a 12-stage horizon. In this example, P_k(j^2, i^2) can take three different values, defined by the transition matrices P_E^1, P_E^2 or P_E^3; i^2 is represented by the rows of the matrices and j^2 by the columns.

Table 9.1: Transition probabilities

i^1                              u    j^1         P(j^1, u, i^1)
W_q, q ∈ {0, ..., N_W − 1}       0    W_{q+1}     1 − λ(W_q)
W_q, q ∈ {0, ..., N_W − 1}       0    CM_1        λ(W_q)
W_{N_W}                          0    W_{N_W}     1 − λ(W_{N_W})
W_{N_W}                          0    CM_1        λ(W_{N_W})
W_q, q ∈ {0, ..., N_W}           1    PM_1        1
PM_q, q ∈ {1, ..., N_PM − 2}     ∅    PM_{q+1}    1
PM_{N_PM − 1}                    ∅    W_0         1
CM_q, q ∈ {1, ..., N_CM − 2}     ∅    CM_{q+1}    1
CM_{N_CM − 1}                    ∅    W_0         1
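To make Table 9.1 concrete, the following minimal Python sketch builds the two stationary component transition matrices, one per decision u = 0, 1. The state indexing, the function name component_transition_matrices and the argument failure_prob are assumptions made for this illustration only; they are not part of the thesis notation.

```python
import numpy as np

def component_transition_matrices(n_w, n_pm, n_cm, failure_prob):
    """Stationary component transition matrices of Table 9.1 (illustrative sketch).

    States are indexed W0..W_{n_w}, PM1..PM_{n_pm-1}, CM1..CM_{n_cm-1};
    failure_prob(q) stands for lambda(W_q), the failure probability during one stage.
    """
    W = list(range(n_w + 1))
    PM = [n_w + 1 + q for q in range(n_pm - 1)]
    CM = [n_w + n_pm + q for q in range(n_cm - 1)]
    n = n_w + n_pm + n_cm - 1

    # If n_pm = 1 (resp. n_cm = 1), PM1 (resp. CM1) corresponds to W0, as noted above.
    pm1 = PM[0] if PM else W[0]
    cm1 = CM[0] if CM else W[0]

    P0 = np.zeros((n, n))  # decision u = 0: no preventive maintenance
    P1 = np.zeros((n, n))  # decision u = 1: preventive maintenance

    for q in W:
        lam = failure_prob(q)
        ageing = W[q + 1] if q < n_w else W[n_w]   # age one step, or stay in W_{N_W}
        P0[q, ageing] += 1.0 - lam
        P0[q, cm1] += lam                          # failure during the stage
        P1[q, pm1] = 1.0                           # preventive replacement starts

    for states in (PM, CM):                        # maintenance progresses regardless of u
        for idx, s in enumerate(states):
            nxt = states[idx + 1] if idx + 1 < len(states) else W[0]
            P0[s, nxt] = 1.0
            P1[s, nxt] = 1.0
    return P0, P1
```

For instance, component_transition_matrices(4, 2, 3, lambda q: 0.05 * (q + 1)) would return the two matrices for a component with five working states, one preventive and two corrective maintenance states, under an assumed linearly increasing failure probability.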

Table 9.2: Example of transition matrices for electricity scenarios

P_E^1 = | 1    0    0   |     P_E^2 = | 1/3  1/3  1/3 |     P_E^3 = | 0.6  0.2  0.2 |
        | 0    1    0   |             | 1/3  1/3  1/3 |             | 0.2  0.6  0.2 |
        | 0    0    1   |             | 1/3  1/3  1/3 |             | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)       0      1      2      3      4      5      6      7      8      9      10     11
P_k(j^2, i^2)   P_E^1  P_E^1  P_E^1  P_E^3  P_E^3  P_E^2  P_E^2  P_E^2  P_E^3  P_E^1  P_E^1  P_E^1
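The electricity scenario information of Tables 9.2 and 9.3 can be stored directly as data. The short Python sketch below (variable and function names are illustrative assumptions) returns P_k(j^2, i^2) for the 12-stage example.

```python
import numpy as np

P1_E = np.eye(3)                       # prices stay in the initial scenario
P2_E = np.full((3, 3), 1.0 / 3.0)      # next scenario completely uncertain
P3_E = np.array([[0.6, 0.2, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6]])     # scenarios tend to persist

# Stage-dependent choice of matrix over the 12-stage horizon (Table 9.3)
SCHEDULE = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
            P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """P_k(j2, i2): probability of moving from scenario i2 to j2 during stage k."""
    return SCHEDULE[k][i2, j2]
```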

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · T_s · C_E(i^2, k) (depends on the electricity scenario state i^2 and the stage k);

• Cost for maintenance: C_CM or C_PM;

• Cost for interruption: C_I.

Moreover, a terminal cost, noted C_N, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i^2 is a state variable.

A possible terminal cost C_N(i) is defined for each possible terminal state i of the component.

Table 9.4: Transition costs

i^1                              u    j^1         C_k(j, u, i)
W_q, q ∈ {0, ..., N_W − 1}       0    W_{q+1}     G · T_s · C_E(i^2, k)
W_q, q ∈ {0, ..., N_W − 1}       0    CM_1        C_I + C_CM
W_{N_W}                          0    W_{N_W}     G · T_s · C_E(i^2, k)
W_{N_W}                          0    CM_1        C_I + C_CM
W_q                              1    PM_1        C_I + C_PM
PM_q, q ∈ {1, ..., N_PM − 2}     ∅    PM_{q+1}    C_I + C_PM
PM_{N_PM − 1}                    ∅    W_0         C_I + C_PM
CM_q, q ∈ {1, ..., N_CM − 2}     ∅    CM_{q+1}    C_I + C_CM
CM_{N_CM − 1}                    ∅    W_0         C_I + C_CM
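As an illustration of how Table 9.4 can be turned into a stage cost function, the sketch below evaluates C_k(j, u, i) for the one-component model. The tuple encoding of states and all names are assumptions of this example, and the sign convention follows the table, where the generation term is a reward.

```python
def transition_cost(i1, u, j1, i2, k, G, Ts, C_E, C_I, C_CM, C_PM):
    """C_k(j, u, i) of Table 9.4 (illustrative sketch).

    States are encoded as ("W", q), ("PM", q) or ("CM", q); C_E(i2, k) is the
    electricity price function of scenario state i2 at stage k.
    """
    kind_i, _ = i1
    kind_j, q_j = j1
    if kind_i == "W" and u == 0 and kind_j == "W":
        return G * Ts * C_E(i2, k)        # production during a working stage
    if kind_i == "W" and u == 0 and (kind_j, q_j) == ("CM", 1):
        return C_I + C_CM                 # failure during the stage
    if kind_i == "W" and u == 1:
        return C_I + C_PM                 # preventive replacement starts
    if kind_i == "PM" or (kind_j, q_j) == ("PM", 1):
        return C_I + C_PM                 # preventive maintenance in progress
    if kind_i == "CM" or (kind_j, q_j) == ("CM", 1):
        return C_I + C_CM                 # corrective maintenance in progress
    raise ValueError("transition not listed in Table 9.4")
```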

9.2 Multi-Component model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. Their rental price can be very high, and it can be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

N_C      Number of components
N_Wc     Number of working states for component c
N_PMc    Number of Preventive Maintenance states for component c
N_CMc    Number of Corrective Maintenance states for component c

Costs

C_PMc    Cost per stage of Preventive Maintenance for component c
C_CMc    Cost per stage of Corrective Maintenance for component c
C_Nc(i)  Terminal cost if component c is in state i

Variables

i^c, c ∈ {1, ..., N_C}    State of component c at the current stage
i^{N_C+1}                 State of the electricity at the current stage
j^c, c ∈ {1, ..., N_C}    State of component c at the next stage
j^{N_C+1}                 State of the electricity at the next stage
u^c, c ∈ {1, ..., N_C}    Decision variable for component c

State and Control Space

x^c_k, c ∈ {1, ..., N_C}  State of component c at stage k
x^c                       A component state
x^{N_C+1}_k               Electricity state at stage k
u^c_k                     Maintenance decision for component c at stage k

Probability functions

λ_c(i)                    Failure probability function for component c

Sets

Ω_{x^c}                   State space for component c
Ω_{x^{N_C+1}}             Electricity state space
Ω_{u^c}(i^c)              Decision space for component c in state i^c

9.2.3 Assumptions

• The system is composed of N_C components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λ_c(t) for component c ∈ {1, ..., N_C}.

• If component c fails during stage k, corrective maintenance is undertaken for N_CMc stages with a cost of C_CMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The preventive replacement of component c takes N_PMc stages with a cost of C_PMc per stage.

• An interruption cost C_I is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If none of the components of the unit is in preventive maintenance or failed, G · T_s kWh is produced during the stage (T_s in hours).

• A terminal cost C_Nc can be used to penalize the terminal stage condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector, as in (9.2):

X_k = (x^1_k, ..., x^{N_C}_k, x^{N_C+1}_k)^T        (9.2)

x^c_k, c ∈ {1, ..., N_C}, represents the state of component c, and x^{N_C+1}_k represents the electricity state.

Component Space
The numbers of CM and PM states for component c are N_CMc and N_PMc, respectively. The number of W states for each component c, N_Wc, is decided in the same way as for one component.

The state space related to component c is noted Ω_{x^c}:

x^c_k ∈ Ω_{x^c} = {W_0, ..., W_{N_Wc}, PM_1, ..., PM_{N_PMc − 1}, CM_1, ..., CM_{N_CMc − 1}}

Electricity Space
Same as in Section 8.1.

9.2.4.2 Decision Space

At each stage, for each component that is not in maintenance, the decision maker must decide whether to do preventive maintenance or to do nothing, depending on the state of the system:

u^c_k = 0: no preventive maintenance on component c,
u^c_k = 1: preventive maintenance on component c.

The decision variables constitute a decision vector:

U_k = (u^1_k, u^2_k, ..., u^{N_C}_k)^T        (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., N_C}:  Ω_{u^c}(i^c) = {0, 1} if i^c ∈ {W_0, ..., W_{N_Wc}}, and Ω_{u^c}(i^c) = ∅ otherwise.

9.2.4.3 Transition Probability

The state variables x^c are independent of the electricity state x^{N_C+1}. Consequently,

P(X_{k+1} = j | U_k = U, X_k = i)                                                              (9.4)
  = P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) · P_k(j^{N_C+1}, i^{N_C+1})    (9.5)

The transition probabilities of the electricity state, P_k(j^{N_C+1}, i^{N_C+1}), are similar to the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component state transitions

The state variables x^c are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W_1, ..., W_{N_Wc}},

P((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P(j^c, 0, i^c)

Case 2

If at least one component is in maintenance, or the decision of preventive maintenance is taken for at least one component, then

P((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = ∏_{c=1}^{N_C} P^c

with

P^c = P(j^c, 1, i^c)   if u^c = 1 or i^c ∉ {W_1, ..., W_{N_Wc}},
P^c = 1                if i^c ∉ {W_0, ..., W_{N_Wc − 1}} and i^c = j^c,
P^c = 0                otherwise.
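The two cases above can be gathered in a single function. The Python sketch below is one possible reading of the formulas: P_component[c](jc, uc, ic) stands for the one-component transition probability of Table 9.1, working_states[c] for the set {W_1, ..., W_{N_Wc}}, and the second branch of Case 2 is interpreted as a working, non-maintained component remaining in its current state while the system is stopped. All names are illustrative assumptions.

```python
def joint_transition_prob(i, u, j, P_component, working_states):
    """P((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2 (sketch).

    i, u, j are tuples of length N_C; working_states[c] is the set of ageing
    states of component c; P_component[c](jc, uc, ic) is its transition law.
    """
    n_c = len(i)
    system_running = all(i[c] in working_states[c] and u[c] == 0 for c in range(n_c))

    if system_running:
        # Case 1: every component ages independently under u = 0
        prob = 1.0
        for c in range(n_c):
            prob *= P_component[c](j[c], 0, i[c])
        return prob

    # Case 2: the system is stopped during the stage
    prob = 1.0
    for c in range(n_c):
        if u[c] == 1 or i[c] not in working_states[c]:
            # replacement decided, or maintenance already in progress
            prob *= P_component[c](j[c], 1, i[c])
        elif i[c] == j[c]:
            prob *= 1.0   # working component does not age while the system is down
        else:
            return 0.0
    return prob
```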

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1
If all the components are working, no maintenance is decided and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., N_C}: x^c_k ∈ {W_1, ..., W_{N_Wc}},

C((j^1, ..., j^{N_C}), 0, (i^1, ..., i^{N_C})) = G · T_s · C_E(i^{N_C+1}, k)

Case 2
When the system is in maintenance or fails during the stage, an interruption cost C_I is considered, as well as the sum of the costs of all the maintenance actions:

C((j^1, ..., j^{N_C}), (u^1, ..., u^{N_C}), (i^1, ..., i^{N_C})) = C_I + Σ_{c=1}^{N_C} C^c

with

C^c = C_CMc   if i^c ∈ {CM_1, ..., CM_{N_CMc}} or j^c = CM_1,
C^c = C_PMc   if i^c ∈ {PM_1, ..., PM_{N_PMc}} or j^c = PM_1,
C^c = 0       otherwise.
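A corresponding sketch for the system-level cost, with the same hypothetical tuple encoding of component states as in the earlier one-component example (("CM", 1) and ("PM", 1) stand for CM_1 and PM_1) and illustrative parameter names (C_CM and C_PM are per-component cost lists):

```python
def joint_transition_cost(i, u, j, i_elec, k, G, Ts, C_E, C_I, C_CM, C_PM,
                          working_states, cm_states, pm_states):
    """C((j1..jNC), (u1..uNC), (i1..iNC)) following Cases 1 and 2 (sketch)."""
    n_c = len(i)
    all_working = all(i[c] in working_states[c] and u[c] == 0 and
                      j[c] in working_states[c] for c in range(n_c))
    if all_working:
        return G * Ts * C_E(i_elec, k)       # Case 1: the unit produces the whole stage

    cost = C_I                               # Case 2: interruption cost
    for c in range(n_c):
        if i[c] in cm_states[c] or j[c] == ("CM", 1):
            cost += C_CM[c]                  # corrective maintenance on component c
        elif i[c] in pm_states[c] or j[c] == ("PM", 1):
            cost += C_PM[c]                  # preventive maintenance on component c
    return cost
```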

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas and issues that could impact the model.

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space instead of an individual decision space for each component state variable.

• Other types of maintenance actions. In the model, replacement was the only possible maintenance action. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecasting state information, such as weather and/or load states.

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm is empirically shown to converge fastest; however, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming have mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the next time to maintenance depending on the actual state of the system. Only single state variable models have been found in the literature for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of application.

The main limitation of Dynamic Programming is related to the curse of dimensionality: the time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. No application of ADP was found in the literature. These methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, e.g., be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon models are possible: either directly with a finite horizon model, or with a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.

Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,    u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,    u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,    u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
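The backward recursion above can be checked with a few lines of Python. The cost table is read off the example, with the convention that decision u at stage k leads to state u at stage k+1; ties, such as u*_1(1), are resolved by taking the first minimizer. This is only a sketch to reproduce the numbers.

```python
# cost[(k, i)][u]: cost of choosing decision u (= next state) in state i at stage k
cost = {
    (0, 0): {0: 2, 1: 4, 2: 3},
    (1, 0): {0: 4, 1: 6},
    (1, 1): {0: 2, 1: 1, 2: 3},
    (1, 2): {1: 5, 2: 2},
    (2, 0): {0: 2, 1: 5},
    (2, 1): {0: 7, 1: 3, 2: 2},
    (2, 2): {1: 1, 2: 2},
    (3, 0): {0: 4},
    (3, 1): {0: 2},
    (3, 2): {0: 7},
}

J = {(4, 0): 0}          # terminal cost phi(0) = 0
policy = {}
for k in range(3, -1, -1):                      # backward induction over the stages
    for i in sorted(s for (kk, s) in cost if kk == k):
        u_best, j_best = min(
            ((u, c + J[(k + 1, u)]) for u, c in cost[(k, i)].items()),
            key=lambda t: t[1],
        )
        J[(k, i)], policy[(k, i)] = j_best, u_best

print(J[(0, 0)])         # 8, the length of the shortest path A - D - G - I - terminal
```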


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review, 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 59: Models

9142 Decision Space

At each stage the decision maker can decide if the component is not in maintenanceto do preventive maintenance or not depending on the state X of the system

Uk = 0 no preventive maintenance

Uk = 1 preventive maintenance

The decision space depends only on the component state i1

ΩU (i) =

0 1 if i1 isin W1 WNW

empty else

9143 Transition Probabilities

The two state variables are independant Moreover only the electricity state tran-sitions depend on the stage Consequently

P (Xk+1 = j | Uk = uXk = i)

= P (x1k+1 = j1 x2

k+1 = j2 | uk = u x1k = i1 x2 = i2)

= P (x1k+1 = j1 | uk = u x1

k = i1) middot P (x2k+1 = j2 | x2

k = i2)

= P (j1 u i1) middot Pk(j2 i2)

Component state transition probability

At each stage k if the state of the component is Wq the failure rate is assumedconstant during the time of the stage and equal to λ(Wq) = λ(q middot Ts)

The transition probability for the component state is stationary It can be repre-sented as a Markov decision process as in the example in Figure 91

Table 91 summarizes the transition porbabilities that not equal to zero

Note that if NPM = 1 or NCM = 1 then PM1 respectively CM1 correspond to W0

Electricity State

The transition probabilities of the electricity state Pk(j2 i2) are not stationary

They can change from stage to stage 9143 with 93 give an example of transitionprobabilities for the electricity scenarios on a 12 stages horizon In this examplePk(j

2 i2) can take three different values defined by the transition matrices P 1E P 2

E

or P 3E i2 is represented by the rows of the matrices and j2 by the column

53

Table 91 Transition probabilities

i1 u j1 P (j1 u i1)

Wq q isin 0 NW minus 1 0 Wq+1 1minus λ(Wq)Wq q isin 0 NW minus 1 0 CM1 λ(Wq)WNW 0 WNW 1minus λ(WNW )WNW 0 CM1 λ(WNW )Wq q isin 0 NW 1 PM1 1

PMq q isin 1 NPM minus 2 empty PMq+1 1PMNPMminus1 empty W0 1

CMq q isin 1 NCM minus 2 empty CMq+1 1CMNCMminus1 empty W0 1

Table 92 Example of transition matrix for electricity scenarios

P 1E =

1 0 00 1 00 0 1

P 2

E =

13 13 1313 13 1313 13 13

P 3

E =

06 02 0202 06 0202 02 06

Table 93 Example of transition probabilities on a 12 stages horizon

Stage(k) 0 1 2 3 4 5 6 7 8 9 10 11

Pk(j2 i2) P 1

E P 1E P 1

E P 3E P 3

E P 2E P 2

E P 2E P 3

E P 1E P 1

E P 1E

9144 Cost Function

The costs associated to the possible transitions can be of different kinds

bull Reward for electricity generation= G middotTs middotCE(i2 k) (depends on the electricityscenario state i2 and the stage k)

bull Cost for maintenance CCM or CPM

bull Cost for interruption CI

Moreover a terminal cost noted CN could be used to penalized deviations fromrequired state at the end of time horizon This option and its consequences was notstudied in this work The transition cost are summarized in Table 94 Notice thati2 is a state variable

A possible terminal cost is defined by CN (i) for each possible terminal state CN (i)for the component

54

Table 94 Transition costs

i1 u j1 Ck(j u i)

Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)

Wq q isin 0 NW minus 1 0 CM1 CI + CCM

WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM

Wq 1 PM1 CI + CPM

PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM

PMNPMminus1 empty W0 CI + CPM

CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM

CMNCMminus1 empty W0 CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time

922 Notations for the Proposed Model

Numbers

NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c

55

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68


Table 9.1: Transition probabilities

i1                            u    j1       P(j1 | u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     1 − λ(Wq)
Wq, q ∈ {0, ..., NW−1}        0    CM1      λ(Wq)
WNW                           0    WNW      1 − λ(WNW)
WNW                           0    CM1      λ(WNW)
Wq, q ∈ {0, ..., NW}          1    PM1      1
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    1
PM(NPM−1)                     ∅    W0       1
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    1
CM(NCM−1)                     ∅    W0       1
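The structure of Table 9.1 can be captured in a few lines of code. The following minimal sketch (not part of the thesis) stores the one-component transition probabilities in a dictionary keyed by (i1, u, j1); the state-space sizes and the failure probability function lam are assumed example values.

# Sketch with assumed example values: one-component transition probabilities of Table 9.1.
NW, NPM, NCM = 4, 2, 3                       # assumed numbers of W, PM and CM states
lam = lambda q: min(0.05 * (q + 1), 1.0)     # assumed failure probability in state Wq

W = [f"W{q}" for q in range(NW + 1)]         # W0 ... WNW
PM = [f"PM{q}" for q in range(1, NPM)]       # PM1 ... PM(NPM-1)
CM = [f"CM{q}" for q in range(1, NCM)]       # CM1 ... CM(NCM-1)

P = {}                                       # P[(i1, u, j1)] = transition probability
for q in range(NW):                          # ageing while working, no maintenance (u = 0)
    P[(W[q], 0, W[q + 1])] = 1 - lam(q)
    P[(W[q], 0, "CM1")] = lam(q)
P[(W[NW], 0, W[NW])] = 1 - lam(NW)           # last working state loops on itself
P[(W[NW], 0, "CM1")] = lam(NW)
for q in range(NW + 1):                      # preventive replacement decision (u = 1)
    P[(W[q], 1, "PM1")] = 1.0
for chain in (PM, CM):                       # deterministic progress through maintenance
    for idx, state in enumerate(chain):
        nxt = chain[idx + 1] if idx + 1 < len(chain) else "W0"
        P[(state, None, nxt)] = 1.0          # u = None stands for the empty decision set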

Table 9.2: Example of transition matrices for the electricity scenarios

P^1_E = | 1    0    0   |      P^2_E = | 1/3  1/3  1/3 |      P^3_E = | 0.6  0.2  0.2 |
        | 0    1    0   |              | 1/3  1/3  1/3 |              | 0.2  0.6  0.2 |
        | 0    0    1   |              | 1/3  1/3  1/3 |              | 0.2  0.2  0.6 |

Table 9.3: Example of transition probabilities on a 12-stage horizon

Stage (k)      0      1      2      3      4      5      6      7      8      9      10     11
Pk(j2 | i2)    P^1_E  P^1_E  P^1_E  P^3_E  P^3_E  P^2_E  P^2_E  P^2_E  P^3_E  P^1_E  P^1_E  P^1_E
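For completeness, the scenario matrices of Table 9.2 and the stage schedule of Table 9.3 can be stored directly as nested lists. The sketch below only restates the numbers given in the two tables; the function name electricity_transition is an illustrative choice.

# Sketch: the three electricity-scenario matrices (Table 9.2) and the per-stage schedule (Table 9.3).
P1_E = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]                    # scenarios frozen (no switching)
third = 1.0 / 3.0
P2_E = [[third] * 3 for _ in range(3)]      # all scenarios equally likely at the next stage
P3_E = [[0.6, 0.2, 0.2],
        [0.2, 0.6, 0.2],
        [0.2, 0.2, 0.6]]                    # current scenario more likely to persist

# One matrix per stage k = 0, ..., 11, as in Table 9.3
P_E_schedule = [P1_E, P1_E, P1_E, P3_E, P3_E, P2_E,
                P2_E, P2_E, P3_E, P1_E, P1_E, P1_E]

def electricity_transition(k, i2, j2):
    """Probability of moving from electricity state i2 to j2 during stage k."""
    return P_E_schedule[k][i2][j2]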

9.1.4.4 Cost Function

The costs associated with the possible transitions can be of different kinds:

• Reward for electricity generation: G · Ts · CE(i2, k) (depends on the electricity scenario state i2 and the stage k).

• Cost for maintenance: CCM or CPM.

• Cost for interruption: CI.

Moreover, a terminal cost, noted CN, could be used to penalize deviations from a required state at the end of the time horizon. This option and its consequences were not studied in this work. The transition costs are summarized in Table 9.4. Notice that i2 is a state variable.

A possible terminal cost is defined by CN(i) for each possible terminal state i of the component.


Table 9.4: Transition costs

i1                            u    j1       Ck(j1, u, i1)
Wq, q ∈ {0, ..., NW−1}        0    Wq+1     G · Ts · CE(i2, k)
Wq, q ∈ {0, ..., NW−1}        0    CM1      CI + CCM
WNW                           0    WNW      G · Ts · CE(i2, k)
WNW                           0    CM1      CI + CCM
Wq                            1    PM1      CI + CPM
PMq, q ∈ {1, ..., NPM−2}      ∅    PMq+1    CI + CPM
PM(NPM−1)                     ∅    W0       CI + CPM
CMq, q ∈ {1, ..., NCM−2}      ∅    CMq+1    CI + CCM
CM(NCM−1)                     ∅    W0       CI + CCM
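The transition costs of Table 9.4 can be encoded as a small function. In the sketch below, G, Ts, C_I, C_CM, C_PM and the price function CE are assumed placeholder values, and the state labels follow the earlier sketch; the sign convention (reward entered as a positive or negative cost) is left open, as in the table.

# Sketch with assumed values: one-component transition cost Ck(j1, u, i1) of Table 9.4.
G, Ts = 2000.0, 730.0                      # assumed: 2 MW unit, roughly one month per stage
C_I, C_CM, C_PM = 10e3, 5e3, 2e3           # assumed interruption / CM / PM costs per stage

def CE(i2, k):                             # assumed placeholder electricity price per kWh
    return [0.03, 0.05, 0.08][i2]

def stage_cost(i1, u, j1, i2, k):
    """Transition cost of the one-component model, mirroring Table 9.4."""
    if i1.startswith("W") and u == 0 and j1.startswith("W"):
        return G * Ts * CE(i2, k)          # reward for the energy produced during the stage
    if j1 == "CM1" or i1.startswith("CM"):
        return C_I + C_CM                  # corrective maintenance stages
    return C_I + C_PM                      # preventive maintenance stages (u = 1 or PM states)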

9.2 Multi-Component model

In this section, the model presented in Section 9.1 is extended to multi-component systems.

9.2.1 Idea of the Model

The motivation for a multi-component model is to consider possible opportunistic maintenance. It is sometimes possible to do maintenance on different parts of the system at opportunistic times. For example, if the system fails, it could be profitable to do maintenance on some components of the system that are still working but should be maintained soon.

This can be very interesting if the interruption cost is high or if the equipment needed for the maintenance is very expensive. In wind power, for example, a helicopter or a boat can be necessary for certain maintenance actions. The rental price can be very high, and it could be profitable to group the maintenance of different wind turbines at the same time.

9.2.2 Notations for the Proposed Model

Numbers

NC      Number of components
NWc     Number of working states for component c
NPMc    Number of Preventive Maintenance states for component c
NCMc    Number of Corrective Maintenance states for component c


Costs

CPMc     Cost per stage of Preventive Maintenance for component c
CCMc     Cost per stage of Corrective Maintenance for component c
CNc(i)   Terminal cost if component c is in state i

Variables

ic, c ∈ {1, ..., NC}    State of component c at the current stage
iNC+1                   State of the electricity at the current stage
jc, c ∈ {1, ..., NC}    State of component c at the next stage
jNC+1                   State of the electricity at the next stage
uc, c ∈ {1, ..., NC}    Decision variable for component c

State and Control Space

xc,k, c ∈ {1, ..., NC}   State of component c at stage k
xc                       A component state
xNC+1,k                  Electricity state at stage k
uc,k                     Maintenance decision for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc        State space for component c
ΩxNC+1     Electricity state space
Ωuc(ic)    Decision space for component c in state ic

9.2.3 Assumptions

• The system is composed of NC components in series. If one component fails, the whole system fails.

• The failure rate of each component over time is assumed perfectly known. This function is noted λc(t) for component c ∈ {1, ..., NC}.

• If component c fails during stage k, corrective maintenance is undertaken for NCMc stages with a cost of CCMc per stage.

• It is possible at each stage to decide to replace a component to prevent corrective maintenance. The time of preventive replacement for component c is NPMc stages with a cost of CPMc per stage.


• An interruption cost CI is considered whenever maintenance is done on the system.

• The average production of the generating unit is G kW. If no component of the unit is in preventive maintenance or failed, G · Ts kWh is produced during the stage (Ts in hours).

• A terminal cost CNc can be used to penalize the terminal condition of component c.

9.2.4 Model Description

9.2.4.1 State Space

The state of the system can be represented by a vector, as in (9.2):

Xk = (x1,k, ..., xNC,k, xNC+1,k)^T                                    (9.2)

where xc,k, c ∈ {1, ..., NC}, represents the state of component c, and xNC+1,k represents the electricity state.

Component Space
The number of CM and PM states for component c corresponds to NCMc and NPMc, respectively. The number of W states for each component c, NWc, is decided in the same way as for the one-component model.

The state space related to component c is noted Ωxc:

xc,k ∈ Ωxc = {W0, ..., WNWc, PM1, ..., PM(NPMc−1), CM1, ..., CM(NCMc−1)}
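For illustration, the component state space can be enumerated explicitly; the sizes passed in the example call below are assumed values, not values from the thesis.

# Sketch: enumerating Ω_xc for assumed sizes NWc, NPMc, NCMc.
def component_state_space(NWc, NPMc, NCMc):
    # W0 ... WNWc, PM1 ... PM(NPMc-1), CM1 ... CM(NCMc-1)
    return ([f"W{q}" for q in range(NWc + 1)]
            + [f"PM{q}" for q in range(1, NPMc)]
            + [f"CM{q}" for q in range(1, NCMc)])

print(component_state_space(4, 2, 3))   # ['W0', 'W1', 'W2', 'W3', 'W4', 'PM1', 'CM1', 'CM2']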

Electricity Space
Same as in Section 8.1.

9.2.4.2 Decision Space

At each stage, the decision maker must decide, for each component that is not in maintenance, whether to do preventive maintenance or to do nothing, depending on the state of the system:


uc,k = 0: no preventive maintenance on component c
uc,k = 1: preventive maintenance on component c

The decision variables constitute a decision vector

Uk = (u1,k, u2,k, ..., uNC,k)^T                                       (9.3)

The decision space for each decision variable can be defined by

∀c ∈ {1, ..., NC}:   Ωuc(ic) = { {0, 1}   if ic ∈ {W0, ..., WNWc}
                                 ∅        otherwise
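A direct encoding of this decision space is a one-line function. The sketch below assumes the string state labels used in the earlier sketches.

# Sketch: admissible decision set Ω_uc(ic) for a component in state ic.
def decision_space(ic):
    """{0, 1} while the component is working, the empty set while it is in maintenance."""
    return {0, 1} if ic.startswith("W") else set()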

9.2.4.3 Transition Probability

The state variables xc are independent of the electricity state xNC+1. Consequently,

P(Xk+1 = j | Uk = U, Xk = i)                                                    (9.4)
    = P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) · P(jNC+1 | iNC+1)     (9.5)

The transition probabilities of the electricity state, P(jNC+1 | iNC+1), are similar to those of the one-component model. They can be defined at each stage k by a transition matrix, as in the example of Section 8.1.

Component state transitions

The state variables xc are not independent of each other. Indeed, if one component fails or is in maintenance, the other components are not ageing, since the system is not working. In consequence, different cases must be considered.

Case 1

If all the components are working and no maintenance is done, the transition probability of the whole system is the product of the transition probabilities of each component considered independently.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

P((j1, ..., jNC) | 0, (i1, ..., iNC)) = ∏_{c=1..NC} P(jc | 0, ic)

Case 2


If at least one component is in maintenance, or preventive maintenance is decided for at least one component, then

P((j1, ..., jNC) | (u1, ..., uNC), (i1, ..., iNC)) = ∏_{c=1..NC} Pc

with

Pc = { P(jc | 1, ic)    if uc = 1 or ic ∉ {W1, ..., WNWc}
       1                if ic ∉ {W0, ..., W(NWc−1)} and jc = ic
       0                otherwise
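A possible encoding of the two cases is sketched below. It assumes per-component transition dictionaries P_c[(i, u, j)] built as in the one-component sketch, and it follows the interpretation given in the text: while the system is down, working components that are not being maintained keep their state with probability 1.

# Sketch: joint component transition probability for the multi-component model.
def joint_component_transition(P_c_list, i_states, u_decisions, j_states):
    system_up = all(i.startswith("W") for i in i_states) and not any(u_decisions)
    prob = 1.0
    for P_c, ic, uc, jc in zip(P_c_list, i_states, u_decisions, j_states):
        if system_up:                               # Case 1: every component ages normally
            prob *= P_c.get((ic, 0, jc), 0.0)
        elif uc == 1 or not ic.startswith("W"):     # Case 2: maintained components evolve
            u_key = 1 if uc == 1 else None          # None marks the empty decision set
            prob *= P_c.get((ic, u_key, jc), 0.0)
        else:                                       # Case 2: idle working components freeze
            prob *= 1.0 if jc == ic else 0.0
    return prob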

9.2.4.4 Cost Function

As for the transition probabilities, there are two cases.

Case 1

If all the components are working, no maintenance is decided, and no failure happens, a reward for the electricity produced is obtained.

If ∀c ∈ {1, ..., NC}: ic ∈ {W1, ..., WNWc} and uc = 0, then

C((j1, ..., jNC), 0, (i1, ..., iNC)) = G · Ts · CE(iNC+1, k)

Case 2

When the system is in maintenance or fails during the stage, an interruption cost CI is considered, as well as the sum of all the maintenance costs:

C((j1, ..., jNC), (u1, ..., uNC), (i1, ..., iNC)) = CI + ∑_{c=1..NC} Cc

with

Cc = { CCMc    if ic ∈ {CM1, ..., CM(NCMc−1)} or jc = CM1
       CPMc    if ic ∈ {PM1, ..., PM(NPMc−1)} or jc = PM1
       0       otherwise
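The corresponding stage cost can be sketched in the same style. G, Ts, C_I and the price function CE below are assumed placeholders (passed as defaults), and C_CMc and C_PMc are per-component cost lists.

# Sketch: joint stage cost for the multi-component model (assumed placeholder defaults).
def joint_stage_cost(i_states, u_decisions, j_states, i_elec, k, C_CMc, C_PMc,
                     G=2000.0, Ts=730.0, C_I=10e3,
                     CE=lambda i2, k: [0.03, 0.05, 0.08][i2]):
    """Case 1 returns the production reward; Case 2 returns the interruption cost
    plus the sum of all ongoing maintenance costs."""
    system_up = (all(i.startswith("W") for i in i_states)
                 and not any(u_decisions)
                 and all(j.startswith("W") for j in j_states))
    if system_up:                                   # Case 1: system produces during the stage
        return G * Ts * CE(i_elec, k)
    cost = C_I                                      # Case 2: interruption cost ...
    for c, (ic, jc) in enumerate(zip(i_states, j_states)):
        if ic.startswith("CM") or jc == "CM1":      # ... plus corrective maintenance costs
            cost += C_CMc[c]
        elif ic.startswith("PM") or jc == "PM1":    # ... plus preventive maintenance costs
            cost += C_PMc[c]
    return cost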

9.3 Possible Extensions

The model could be extended in several directions. The following list summarizes some ideas on issues that could have an impact on the model:

• Manpower. It would be interesting to limit the number of maintenance actions that can be done at the same time. A solution would be to consider a global decision space rather than an individual decision space for each component state variable.


• Include other types of maintenance actions. In the model, replacement was the only maintenance action possible. In reality there are many possible maintenance actions, such as minor repair, major repair, etc. They could be modelled by adding possible maintenance decisions to the model.

• Non-deterministic time to repair. A stochastic repair time could be modelled by adding transition probabilities for the maintenance states.

• Use of deterioration states. If monitoring or inspection of some components is possible, deterioration state variables could be included in the model.

• Other forecasting states. It could be interesting to add other forecast state information, such as weather and/or load states.


Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Programming (SDP) and their application to maintenance problems.

The theory of Dynamic Programming was introduced, with finite horizon and infinite horizon stochastic approaches, as well as Approximate Dynamic Programming (Reinforcement Learning) methods to solve infinite horizon SDP models. A comparison of the methods available for infinite horizon SDP was made. Problems with a limited state space can be solved exactly. The Policy Iteration algorithm has empirically been shown to converge the fastest. However, for a high discount rate the Value Iteration algorithm can be better. Linear Programming can also be used if additional constraints need to be included in the model. Approximate Dynamic Programming methods are necessary for large state spaces.

A maintenance model based on finite horizon Stochastic Dynamic Programming was proposed to illustrate the theory. An interesting idea of the model was to enable opportunistic maintenance. Different ideas for state variables and possible extensions were also proposed.

A literature review of Dynamic Programming applications to maintenance optimization was made. Finite horizon deterministic and stochastic dynamic programming has mainly been applied to short-term maintenance scheduling. The idea of grouping maintenance activities on a finite horizon seems promising to avoid intractable models. Markov Decision Processes (MDP) and Semi-Markov Decision Processes (SMDP) are proposed in many articles to optimize maintenance decisions based on condition monitoring systems. The advantage of SMDP is the ability to optimize the time to the next maintenance depending on the actual state of the system. Only single-state-variable models have been found in the literature, for both MDP and SMDP. No application of Approximate Dynamic Programming (ADP) has been found in the literature, only a proposition of such an application.


The main limitation of Dynamic Programming is related to the curse of dimensionality. The time complexity increases exponentially with the number of state variables in the model. With the new advances in ADP methods, this limitation could be overcome. The methods have mainly been applied to optimal control until now, but there are new opportunities for applying them to new fields such as maintenance optimization. The condition based maintenance models proposed using MDP or SMDP may, for example, be generalized to multi-variable models where different parameters of a system are monitored.

In the power industry, maintenance contracts for a finite time are common. In this perspective, maintenance optimization should focus on finite horizon models. However, few finite horizon models are proposed in the literature. Two ways of using Dynamic Programming for finite horizon problems are possible: either directly a finite horizon model, or a discounted infinite horizon model, which is an approximate finite horizon model that must be stationary over time.

An idea could be to extend the finite horizon model proposed in this thesis. Markov Decision Processes and reinforcement learning could be applied to single-component monitoring (with possible monitoring of multiple parameters), while the finite horizon approach could use the results from the single-component models to optimize the maintenance of a complete system. The components in the finite horizon model could be simplified to a small number of possible deterioration/age states to limit the complexity of the model.


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*4(0) = φ(0) = 0

Stage 3:
J*3(0) = J*(H) = C(3, 0, 0) = 4,   u*3(0) = u*(H) = 0
J*3(1) = J*(I) = C(3, 1, 0) = 2,   u*3(1) = u*(I) = 0
J*3(2) = J*(J) = C(3, 2, 0) = 7,   u*3(2) = u*(J) = 0

Stage 2:
J*2(0) = J*(E) = min{J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*2(0) = u*(E) = argmin_{u ∈ {0,1}} {J*3(0) + C(2, 0, 0), J*3(1) + C(2, 0, 1)} = 0

J*2(1) = J*(F) = min{J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*2(1) = u*(F) = argmin_{u ∈ {0,1,2}} {J*3(0) + C(2, 1, 0), J*3(1) + C(2, 1, 1), J*3(2) + C(2, 1, 2)} = 1

J*2(2) = J*(G) = min{J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*2(2) = u*(G) = argmin_{u ∈ {1,2}} {J*3(1) + C(2, 2, 1), J*3(2) + C(2, 2, 2)} = 1

Stage 1:
J*1(0) = J*(B) = min{J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*1(0) = u*(B) = argmin_{u ∈ {0,1}} {J*2(0) + C(1, 0, 0), J*2(1) + C(1, 0, 1)} = 0

J*1(1) = J*(C) = min{J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*1(1) = u*(C) = argmin_{u ∈ {0,1,2}} {J*2(0) + C(1, 1, 0), J*2(1) + C(1, 1, 1), J*2(2) + C(1, 1, 2)} = 1 or 2

J*1(2) = J*(D) = min{J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*1(2) = u*(D) = argmin_{u ∈ {1,2}} {J*2(1) + C(1, 2, 1), J*2(2) + C(1, 2, 2)} = 2

Stage 0:
J*0(0) = J*(A) = min{J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*0(0) = u*(A) = argmin_{u ∈ {0,1,2}} {J*1(0) + C(0, 0, 0), J*1(1) + C(0, 0, 1), J*1(2) + C(0, 0, 2)} = 2
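The recursion above is easy to check numerically. The short sketch below (not part of the thesis) stores the arc costs C(k, i, u) read off the expressions in this appendix, with the decision u taken as the index of the successor state, and runs the backward recursion; it reproduces J*0(0) = 8 and the optimal route A-D-G-I.

# Sketch: backward value iteration over the arc costs listed above.
C = {
    0: {0: {0: 2, 1: 4, 2: 3}},                 # A -> B, C, D
    1: {0: {0: 4, 1: 6},                        # B -> E, F
        1: {0: 2, 1: 1, 2: 3},                  # C -> E, F, G
        2: {1: 5, 2: 2}},                       # D -> F, G
    2: {0: {0: 2, 1: 5},                        # E -> H, I
        1: {0: 7, 1: 3, 2: 2},                  # F -> H, I, J
        2: {1: 1, 2: 2}},                       # G -> I, J
    3: {0: {0: 4}, 1: {0: 2}, 2: {0: 7}},       # H, I, J -> terminal node
}
N = 4
J = {N: {0: 0}}                                 # terminal cost phi(0) = 0
policy = {}
for k in range(N - 1, -1, -1):                  # backward value iteration
    J[k], policy[k] = {}, {}
    for i, arcs in C[k].items():
        costs = {u: c + J[k + 1][u] for u, c in arcs.items()}
        policy[k][i] = min(costs, key=costs.get)
        J[k][i] = costs[policy[k][i]]

names = {0: ["A"], 1: ["B", "C", "D"], 2: ["E", "F", "G"], 3: ["H", "I", "J"]}
i, route = 0, []
for k in range(N):
    route.append(names[k][i])
    i = policy[k][i]
print(J[0][0], "->".join(route))                # prints: 8 A->D->G->I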


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.
[2] Mohamed A-H. Inspection, maintenance and replacement models. Computers & Operations Research, 22(4):435–441, 1995.
[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. In Reliability and Maintainability Symposium, 2006 (RAMS'06), pages 464–469, 2006.
[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers/Göteborg University, 2004. Licentiate Thesis.
[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.
[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.
[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.
[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.
[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.
[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.
[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.
[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.
[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.
[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.
[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.
[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.
[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.
[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.
[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.
[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. In Proc. of the 12th Annual Substations Equipment Diagnostics Conference, 2004.
[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on "Machine learning applications in the electric power industry", Chania, Greece, 1999.
[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.
[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.
[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–5, 2006.
[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[29] Y. Mansour and S. Singh. On the complexity of policy iteration. In Uncertainty in Artificial Intelligence 99, 1999.
[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. In Power Industry Computer Applications, 1999 (PICA'99), Proceedings of the 21st IEEE International Conference, pages 31–37, 1999.
[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.
[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.
[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.
[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.
[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006 (PMAPS 2006), International Conference on, pages 1–6, 2006.
[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.
[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.
[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.
[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006 (NAPS 2006), 38th North American, pages 23–28, 2006.
[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.
[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006 (WCICA 2006), The Sixth World Congress on, volume 2, 2006.
[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.
[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.
[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 61: Models

Table 94 Transition costs

i1 u j1 Ck(j u i)

Wq q isin 0 NW minus 1 0 Wq+1 G middot Ts middot Cel(i2 k)

Wq q isin 0 NW minus 1 0 CM1 CI + CCM

WNW 0 WNW G middot Ts middot CE(i2 k)WNW 0 CM1 CI + CCM

Wq 1 PM1 CI + CPM

PMq q isin 1 NPM minus 2 empty PMq+1 CI + CPM

PMNPMminus1 empty W0 CI + CPM

CMq q isin 1 NCM minus 2 empty CMq+1 CI + CCM

CMNCMminus1 empty W0 CI + CCM

92 Multi-Component model

In this section the model presented in Section 91 is extended to multi-componentssystems

921 Idea of the Model

The motivation for a multi-component model is to consider possible opportunisticmaintenance It is sometimes possible to do maintenance on different parts of thesystem at opportunistic times For example if the system fails it could be profitableto do maintenance on some components of the system that are still working butshould be maintained soon

This could be very interesting if the interruption cost is high or if the structureneeded for the maintenance is very high In wind power for example for certainmaintenance actions an helicopter or a boat can be necessary The price for theirrent can be very high and it could be profitable to group the maintenance of differentwind turbines at the same time

922 Notations for the Proposed Model

Numbers

NC Number of componentNWc Number of working state for component cNPMc Number of Preventive Maintenance state for component cNCMc Number of Corrective Maintenance state for component c

55

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 62: Models

Costs

CPMc Cost per stage of Preventive Maintenance for component cCCMc Cost per stage of Corrective Maintenance for component cCNc (i) Terminal cost if the component c is in state i

Variables

ic c isin 1 NC State of component c at the actual stageiNC+1 State for the electricity at the actual stagejc c isin 1 NC State of component c for the next stagejNC+1 State for the electricity for the next stageuc c isin 1 NC Decision variable for component c

State and Control Space

xck c isin 1 NC State of the component c at stage kxc A component state

xNC+1k Electricity state at stage kuck Maintenance for component c at stage k

Probability functions

λc(i) Failure probability function for component c

Sets

Ωxc

State space for component c

ΩxNC+1

Electricity state spaceΩuc

(ic) Decision space for component c in state ic

923 Assumptions

bull The system is composed of NC components in series If one component failsthe whole system fails

bull The failure rate of each component over the time is assumed perfectly knownThis function is noted λc(t) for component c isin 1 NC

bull If component c fails during stage k corrective maintenance is undertaken forNCMc stages with a cost of CCMc per stage

bull It is possible at each stage to decide to replace a component to prevent cor-rective maintenance The time of preventive replacement for component n isNPMc stages with a cost of CPMc per stage

56

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model


Appendix A

Solution of the Shortest Path Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3, 0, 0) = 4,  u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3, 1, 0) = 2,  u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3, 2, 0) = 7,  u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = min{4 + 2, 2 + 5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2, 0, 0), J*_3(1) + C(2, 0, 1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = min{4 + 7, 2 + 3, 7 + 2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2, 1, 0), J*_3(1) + C(2, 1, 1), J*_3(2) + C(2, 1, 2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = min{2 + 1, 7 + 2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2, 2, 1), J*_3(2) + C(2, 2, 2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = min{6 + 4, 5 + 6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1, 0, 0), J*_2(1) + C(1, 0, 1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = min{6 + 2, 5 + 1, 3 + 3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1, 1, 0), J*_2(1) + C(1, 1, 1), J*_2(2) + C(1, 1, 2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = min{5 + 5, 3 + 2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1, 2, 1), J*_2(2) + C(1, 2, 2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = min{10 + 2, 6 + 4, 5 + 3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0, 0, 0), J*_1(1) + C(0, 0, 1), J*_1(2) + C(0, 0, 2)} = 2
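The backward recursion above can be checked with a few lines of code. The following Python sketch is not part of the original thesis; it simply encodes the arc costs C(k, i, u) listed in the example, with the decision u interpreted as the state reached at the next stage, and reproduces J*_0(0) = 8 together with the optimal decisions.

# Arc costs C[(k, i)][u] of the shortest path example; u is the next-stage state.
C = {
    (0, 0): {0: 2, 1: 4, 2: 3},                       # A -> B, C, D
    (1, 0): {0: 4, 1: 6},                             # B -> E, F
    (1, 1): {0: 2, 1: 1, 2: 3},                       # C -> E, F, G
    (1, 2): {1: 5, 2: 2},                             # D -> F, G
    (2, 0): {0: 2, 1: 5},                             # E -> H, I
    (2, 1): {0: 7, 1: 3, 2: 2},                       # F -> H, I, J
    (2, 2): {1: 1, 2: 2},                             # G -> I, J
    (3, 0): {0: 4}, (3, 1): {0: 2}, (3, 2): {0: 7},   # H, I, J -> terminal node
}

J = {(4, 0): 0}          # terminal cost phi(0) = 0
u_opt = {}
for k in range(3, -1, -1):                            # backward over the stages
    for (stage, i), arcs in C.items():
        if stage != k:
            continue
        best_u = min(arcs, key=lambda u: arcs[u] + J[(k + 1, u)])
        J[(k, i)] = arcs[best_u] + J[(k + 1, best_u)]
        u_opt[(k, i)] = best_u

print(J[(0, 0)])         # 8, the optimal cost from node A
print(u_opt[(0, 0)])     # 2, i.e. go to node D first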


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.

[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.

[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.

[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.

Page 63: Models

bull An interruption cost CI is consider whatever the maintenance is done on thesystem

bull The average production of the generating unit is G kW If none of the compo-nent of the unit is in preventive maintenance or failure G middotTs kWh is producedduring the stage (Ts in hours)

bull A terminal cost CNc can be used to penalize the terminal stage condition forcomponent c

924 Model Description

9241 State Space

The state of the system can be represented by a vector as in (92)

Xk =

x1k

xNckxNc+1k

(92)

xck c isin 1 NC represent the state of component c

xNc+1k represents the electricity state

Component SpaceThe number of CM and PM states for component c corresponds respectively toNCMc and NPMc The number of W states for each component c NWc is decided inthe same way that for one component

The state space related to the component c is noted Ωxc

xck isin Ωxc

= W0 WNWc PM1 PMNPMc minus1 CM1 CMNCMc minus1

Electricity SpaceSame as in Section 81

9242 Decision Space

At each stage the decision maker must decide for each component that is not inmaintenance to do preventive maintenance or do nothing depending on the stateof the system

57

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 64: Models

uck = 0 no preventive maintenance on component n

uck = 1 preventive maintenance on component n

The decision variables constitute a decision vector

Uk =

u1k

u2k

uNck

(93)

The decision space for each decision variable can be defined by

forallc isin 1 Nc Ωuc

(ic) =

0 1 if ic isin W0 WNWc

empty else

9243 Transition Probability

The state variables xc are independent of the electricity state xNc+1 Consequently

P (Xk+1 = j | Uk = UXk = i) (94)

= P ((j1 jNC ) (u1 uNC ) (i1 iNC )) middot P (jNC+1 jNC+1) (95)

The probabilities transition of the electricity states P (jNC+1 iNC+1) are similarto the one-component model They can be defined at each stage k by a transitionmatrices as in the example of Section 81

Component states transitions

The state variables xc are not independent of each other Indeed if one componentfails or is in maintenance the components are not ageing since the system is notworking In consequence different cases must be considered

Case 1

If all the component are working no maintenance is done the propability transitionof the whole system is the product of the probability transition of each componentconsidered independently

If forallc isin 1 NC yck isin W1 WNWn

P ((j1 jNC ) 0 (i1 iNC )) =NCprod

c=1

P (ic 0 jc)

Case 2

58

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 65: Models

If one of the component is in maintenance or the decision of preventive maintenanceis

P ((j1 jNC ) (u1 uNC ) (i1 iNC )) =NCprod

n=1

P c

with P c =

P (jc 1 ic) if uc = 1 or ic 6isin W1 WNWc

1 if ic 6isin W0 WNWc minus1 and ic = jc

0 else

9244 Cost Function

As for the transition probabilities there are 2 cases

Case 1If all the components are working no maintenance is decided and no failure happensa reward for the electricity produced is obtained

If forallc isin 1 NC yck isin W1 WNWn

C((j1 jNC ) 0 (i1 iNC )) = G middot Ts middot CE(iNC+1 k)

Case 2When the system is in maintenance or fails during the stage an interruption costCI is considered as well as the sum of all the maintenance actions

C((j1 jNC ) (u1 uNC ) (i1 iNC )) = C(I) +NCsum

c=1

Cc

with Cc =

CCMc if ic isin CM1 CMNCMc or jc = CM1

CPMc if ic isin PM1 PMNPMc or jn = PM1

0 else

93 Possible Extensions

The model could be extended in several directions The following list summarizessome ideas on issues that could impact on the model

bull Manpower It would be interesting to limit the number of maintenance actionspossible to do at the same time A solution would be to consider a globaldecision space and not individual decision space for each component statevariable

59

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithm.

Stage 4:
J*_4(0) = φ(0) = 0

Stage 3:
J*_3(0) = J*(H) = C(3,0,0) = 4,   u*_3(0) = u*(H) = 0
J*_3(1) = J*(I) = C(3,1,0) = 2,   u*_3(1) = u*(I) = 0
J*_3(2) = J*(J) = C(3,2,0) = 7,   u*_3(2) = u*(J) = 0

Stage 2:
J*_2(0) = J*(E) = min{J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = min{4+2, 2+5} = 6
u*_2(0) = u*(E) = argmin_{u∈{0,1}} {J*_3(0) + C(2,0,0), J*_3(1) + C(2,0,1)} = 0

J*_2(1) = J*(F) = min{J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = min{4+7, 2+3, 7+2} = 5
u*_2(1) = u*(F) = argmin_{u∈{0,1,2}} {J*_3(0) + C(2,1,0), J*_3(1) + C(2,1,1), J*_3(2) + C(2,1,2)} = 1

J*_2(2) = J*(G) = min{J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = min{2+1, 7+2} = 3
u*_2(2) = u*(G) = argmin_{u∈{1,2}} {J*_3(1) + C(2,2,1), J*_3(2) + C(2,2,2)} = 1

Stage 1:
J*_1(0) = J*(B) = min{J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = min{6+4, 5+6} = 10
u*_1(0) = u*(B) = argmin_{u∈{0,1}} {J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1)} = 0

J*_1(1) = J*(C) = min{J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = min{6+2, 5+1, 3+3} = 6
u*_1(1) = u*(C) = argmin_{u∈{0,1,2}} {J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2)} = 1 or 2

J*_1(2) = J*(D) = min{J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = min{5+5, 3+2} = 5
u*_1(2) = u*(D) = argmin_{u∈{1,2}} {J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2)} = 2

Stage 0:
J*_0(0) = J*(A) = min{J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = min{10+2, 6+4, 5+3} = 8
u*_0(0) = u*(A) = argmin_{u∈{0,1,2}} {J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2)} = 2
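For readers who want to reproduce the computations above, the following is a minimal Python sketch of the backward value iteration recursion. The dictionary encoding of the arc costs C(k, i, u) and the variable names are choices made for this sketch; the cost values themselves are read off from the expressions above.

# Backward value iteration for the shortest path example (sketch).
# C[(k, i, u)] = cost of moving from state i at stage k to state u at stage k+1,
# reconstructed from the calculations in this appendix.
C = {
    (0, 0, 0): 2, (0, 0, 1): 4, (0, 0, 2): 3,
    (1, 0, 0): 4, (1, 0, 1): 6,
    (1, 1, 0): 2, (1, 1, 1): 1, (1, 1, 2): 3,
    (1, 2, 1): 5, (1, 2, 2): 2,
    (2, 0, 0): 2, (2, 0, 1): 5,
    (2, 1, 0): 7, (2, 1, 1): 3, (2, 1, 2): 2,
    (2, 2, 1): 1, (2, 2, 2): 2,
    (3, 0, 0): 4, (3, 1, 0): 2, (3, 2, 0): 7,
}
N = 4                      # number of stages
J = {(N, 0): 0}            # terminal cost: J*_4(0) = phi(0) = 0
u_opt = {}

for k in range(N - 1, -1, -1):                       # stages 3, 2, 1, 0
    for i in sorted({i for (kk, i, _) in C if kk == k}):
        # total cost of each feasible decision u from state i at stage k
        options = {u: c + J[(k + 1, u)]
                   for (kk, ii, u), c in C.items() if kk == k and ii == i}
        u_star = min(options, key=options.get)        # ties: first minimizer
        J[(k, i)], u_opt[(k, i)] = options[u_star], u_star
        print(f"J*_{k}({i}) = {options[u_star]},  u*_{k}({i}) = {u_star}")

Running the sketch prints the same cost-to-go values and decisions as above, ending with J*_0(0) = 8; the only difference is that ties (as for u*_1(1)) are broken by taking the first minimizing decision.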


Page 66: Models

bull Include other types of maintenance actions In the model replacement wasthe only maintenance action possible In reality there are a lot of possiblemaintenance actions such as minor repair major repair etc They could bemodelled by adding possible maintenance decisions in the model

bull Time to repair is non deterministic So that it is possible to model a stochasticreparation time by adding probabilities transition for the maintenance states

bull Use of deterioration states If monitoring or inspection of some componentsare possible deterioration state variables could be included in the model

bull Other forecasting states It could be interesting to add other forecasting stateinformation such as weather andor load states

60

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 67: Models

Chapter 10

Conclusions and Future Work

This thesis has reviewed models and methods based on Stochastic Dynamic Pro-gramming (SDP) and their application to maintenance problems

The theory of Dynamic Programming was introduced with finite horizon and infi-nite horizon stochastic approaches as well as Approximate Dynamic Programming(Reinforcement Learning) methods to solve infinite horizon SDP models A com-parison of the methods available for infinite horizon SDP was made Problems witha limited state space can be solved exactly The Policy Iteration algorithm is provedempirically to converge the faster However for high discount rate the Value Iter-ation algorithm can be better Linear Programming can also be used if additionalconstraints need to be included in the model Approximate Dynamic Programmingmethods are necessary for large state space

A maintenance model based on finite horizon Stochastic Dynamic Programmingwas proposed to illustrate the theory An interesting idea of the model was toenable opportunistic maintenance Different ideas of state variables and possibleextensions was also proposed

A literature review of Dynamic Programming application to maintenance optimiza-tion was made Finite horizon deterministic and stochastic dynamic programminghave been mainly applied to short term maintenance scheduling The idea of group-ing maintenance activities on a finite horizon seems promising to avoid untractablemodels Markov Decision Processes (MDP) and Semi-Markov Decision Processes(SMDP) is proposed in many articles to optimize maintenance decision based oncondition monitoring systems The advantage of SMDP is to be able to optimizethe next time to maintenance depending on the actual state of the system Onlysingle state variable models have been found in the literature for both MDP andSMDP No application of Approximate Dynamic Programming (ADP) has not beenfound in the literature but a proposition of application

61

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 68: Models

The main limitation of Dynamic Programming is related to the curse of dimension-nality The time complexity increases exponentionnaly with the number of statevariables in the model With the new advances in ADP methods this limitationcould be overcome No application of ADP was found in the litterature Themethods have been mainly applied to optimal control until now but their is newopportunities for applying them to new fields such as maintenance optimizationThe condition based maintenance models proposed using MDP or SMDP may beeg generalized to multi-variables models where different parameters of a systemare monitored

In the power industry maintenance contracts for a finite time is common In thisperspective maintenance optimization should focus on finite horizon models How-ever in the litterature few finite horizon models are proposed Two ways of usingDynamic Programming for finite horizon models are possible Either directly a finitehorizon model or with a discounted infinite horizon model which is an approximatefinite horizon model that must be stationnary over the time

An idea could be to extend the finite horizon model proposed in this thesis MarkovDecision Process and reinforcement learning could be applied to single-componentsmonitoring (with possible monitoring of multi-parameters) while the finite approachcould use the results from the single-components models to optimize the mainte-nance of a complete system The component in the finite horizon model could besimplified to a few number of possible deteriorationage states to limit the com-plexity of the model

62

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1Jlowast1 (0) = Jlowast(B) = min Jlowast2 (0) + C(1 0 0) Jlowast2 (1) + C(1 0 1) = min 6 + 4 5 + 6 = 10ulowast1(0) = Jlowast(B) = argminuisin01 J

lowast2(0) + C(1 0 0) Jlowast2 (1) + C(1 1 0) = 0Jlowast1 (1) = Jlowast(C) = min Jlowast2 (0) + C(1 1 0) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = min 6 + 2 5 + 1 3 + 3 = 6ulowast1(1) = Jlowast(C) = argminuisin012 J

lowast2 (0) + C(1 1 1) Jlowast2 (1) + C(1 1 1) Jlowast2 (2) + C(1 1 2) = 1 or 2

Jlowast1 (2) = Jlowast(D) = min Jlowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = min 5 + 5 3 + 2 = 5ulowast1(2) = Jlowast(D) = argminuisin12 J

lowast2 (1) + C(1 2 1) Jlowast2 (2) + C(1 2 2) = 2

Stage 0Jlowast0 (0) = Jlowast(A) = min Jlowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = min 10 + 2 6 + 4 5 + 3 = 8ulowast0(0) = Jlowast(A) = argminuisin012 J

lowast1 (0) + C(0 0 0) Jlowast1 (1) + C(0 0 1) Jlowast1 (2) + C(0 0 2) = 2

63

Reference List

[1] Maintenance terminology Svensk Standard SS-EN 13306 SIS 2001

[2] Mohamed A-H Inspection maintenance and replacement models ComputOper Res 22(4)435ndash441 1995

[3] SV Amari and LH Pham Cost-effective condition-based maintenance usingmarkov decision processes Reliability and Maintainability Symposium 2006RAMSrsquo06 Annual pages 464ndash469 2006

[4] N Andreacuteasson Optimisation of opportunistic replacement activities in deter-ministic and stochastic multi-component systems Technical report ChalmersGoumlteborg University 2004 Licentiate Thesis

[5] YW Archibald and R Dekker Modified block-replacement for multiple-component systems IEEE Transactions on Reliability 45(1)75ndash83 1996

[6] I Bagai and K Jain Improvement deterioration and optimal replacementunderage-replacement with minimal repair IEEE Transactions on Reliability43(1)156ndash162 1994

[7] R E Barlow and F Proschan Mathematical Theory of Reliability Wiley1965

[8] R Bellman Dynamic Programming Princeton University Press Princeton1957

[9] C Berenguer C Chu and A Grall Inspection and maintenance planning anapplication of semi-Markov decision processes Journal of Intelligent Manufac-turing 8(5)467ndash476 1997

[10] M Berg and B Epstein A modified block replacement policy Naval ResearchLogistics Quarterly 2315ndash24 1976

[11] M Berg and B Epstein A note on a modified block replacement policy for unitswith increasing marginal running costs Naval Research Logistics Quarterly26157ndash179 1979

65

[12] L Bertling R Allan and R Eriksson A reliability-centered asset maintenancemethod for assessing the impact of maintenance in power distribution systemsIEEE Transactions on Power Systems 20(1)75ndash82 2005

[13] D P Bertsekas and J N Tsitsiklis Neuro-Dynamic Programming AthenaScientific 1996

[14] GK Chan and S Asgarpoor Optimum maintenance policy with Markov pro-cesses Electric Power Systems Research 76(6-7)452ndash456 2006

[15] DI Cho and M Parlar A survey of maintenance models for multi-unit systemsEuropean journal of operational research 51(1)1ndash23 1991

[16] R Dekker RE Wildeman and FA van der Duyn Schouten A review ofmulti-component maintenance models with economic dependence Mathemat-ical Methods of Operations Research (ZOR) 45(3)411ndash435 1997

[17] B Fox Age Replacement with Discounting Operations Research 14(3)533ndash537 1966

[18] C Fu L Ye Y Liu R Yu B Iung Y Cheng and Y Zeng Predictive mainte-nance in intelligent-control-maintenance-management system for hydroelectricgenerating unit IEEE Transactions on Energy Conversion 19(1)179ndash1862004

[19] A Haurie and P LrsquoEcuyer A stochastic control approach to group preventivereplacement in a multicomponent system IEEE Transactions on AutomaticControl 27(2)387ndash393 1982

[20] P Hilber and L Bertling Monetary importance of component reliability inelectrical networks for maintenance optimization In Probabilistic Methods Ap-plied to Power Systems 2004 International Conference on pages 150ndash155September 2004

[21] A Jayakumar and S Asgarpoor Maintenance optimization of equipment bylinear programming In Probabilistic Methods Applied to Power Systems 2004International Conference on pages 145ndash149 2004

[22] Y Jiang Z Zhong J McCalley and TV Voorhis Risk-based MaintenanceOptimization for Transmission Equipment Proc of 12th Annual SubstationsEquipment Diagnostics Conference 2004

[23] L P Kaelbling M L Littman and A P Moore Reinforcement learning Asurvey Journal of Artificial Intelligence Research 4237ndash285 1996

[24] D Kalles A Stathaki and RE Kingm Intelligent monitoring and mainte-nance of power plants In Workshop on laquoMachine learning applications in theelectric power industryraquo Chania Greece 1999

66

[25] D Kumar and U Westberg Maintenance scheduling under age replacementpolicy using proportional hazards model and TTT-plotting European Journalof Operational Research 99(3)507ndash515 1997

[26] P LrsquoEcuyer and A Haurie Preventive replacement for multicomponent sys-tems An opportunistic discrete time dynamic programming model IEEETransactions on Automatic Control 32117ndash118 1983

[27] M Lehtonen On the optimal strategies of condition monitoring and mainte-nance allocation in distribution systems In Probabilistic Methods Applied toPower Systems 2006 PMAPS 2006 International Conference on pages 1ndash52006

[28] ML Littman Algorithms for Sequential Decision Making PhD thesis BrownUniversity 1996

[29] Y Mansour and S Singh On the complexity of policy iteration Uncertaintyin Artificial Intelligence 99 1999

[30] MKC Marwali and SM Shahidehpour Short-term transmission line main-tenance scheduling in a deregulated system Power Industry Computer Ap-plications 1999 PICArsquo99 Proceedings of the 21st 1999 IEEE InternationalConference pages 31ndash37 1999

[31] RP Nicolai and R Dekker Optimal maintenance of multi-component systemsa review 2006

[32] J Nilsson and L Bertling Maintenance management of wind power systemsusing condition monitoring systems-life cycle cost analysis for two case studiesIEEE Transaction on Energy Conversion 22(1)223ndash229 2007

[33] Julia Nilsson Maintenance management of wind power systems - cost effectanalysis of condition monitoring systems Masterrsquos thesis Royal Institute ofTechnology (KTH) April 2006

[34] KS Park Optimal wear-limit replacement with wear-dependent failures IEEETransactions on Reliability 37(3)293ndash294 1988

[35] KS Park Condition-based predictive maintenance by multiple logisticfunc-tion IEEE Transactions on Reliability 42(4)556ndash560 1993

[36] Martin L Puterman Markov Decision Processes Discrete Stochastic DynamicProgramming John Wiley amp Sons Inc 1994

[37] A Rajabi-Ghahnavie and M Fotuhi-Firuzabad Application of markov decisionprocess in generating units maintenance scheduling In Probabilistic MethodsApplied to Power Systems 2006 PMAPS 2006 International Conference onpages 1ndash6 2006

67

[38] Rangan Alagar Ahyagarajan Dimple and Sarada Optimal replacement ofsystems subject to shocks and random threshold failure International Journalof Quality amp Reliability Management 231176ndash1191 2006

[39] J Ribrant and L M Bertling Survey of failures in wind power systems withfocus on swedish wind power plants during 1997-2005 IEEE Transaction onEnergy Conversion 22(1)167ndash173 2007

[40] J Si Handbook of Learning and Approximate Dynamic Programming Wiley-IEEE 2004

[41] Richard S Sutton and Andrew G Barto Reinforcement Learning An Intro-duction MIT Press 1998

[42] CL Tomasevicz and S Asgarpoor Optimum maintenance policy using semi-markov decision processes In Power Symposium 2006 NAPS 2006 38thNorth American pages 23ndash28 2006

[43] H Wang A survey of maintenance policies of deteriorating systems EuropeanJournal of Operational Research 139(3)469ndash489 2002

[44] L Wang J Chu W Mao and Y Fu Advanced maintenance strategy forpower plants - introducing intelligent maintenance system In Intelligent Con-trol and Automation 2006 WCICA 2006 The Sixth World Congress on vol-ume 2 2006

[45] R Wildeman R Dekker and A Smit A dynamic policy for grouping main-tenance activities European Journal of Operational Research

[46] RE Wildeman R Dekker and A Smit A Dynamic Policy for GroupingMaintenance Activities Econometric Institute 1995

[47] Otto Wilhelmsson Evaluation of the introduction of RCM for hydro powergenerators at vattenfall vattenkraft Masterrsquos thesis Royal Institute of Tech-nology (KTH) May 2005

68

  • Contents
  • Introduction
    • Background
    • Objective
    • Approach
    • Outline
      • Maintenance
        • Types of Maintenance
        • Maintenance Optimization Models
          • Introduction to the Power System
            • Power System Presentation
            • Costs
            • Main Constraints
              • Introduction to Dynamic Programming
                • Introduction
                • Deterministic Dynamic Programming
                  • Finite Horizon Models
                    • Problem Formulation
                    • Optimality Equation
                    • Value Iteration Method
                    • The Curse of Dimensionality
                    • Ideas for a Maintenance Optimization Model
                      • Infinite Horizon Models - Markov Decision Processes
                        • Problem Formulation
                        • Optimality Equations
                        • Value Iteration
                        • The Policy Iteration Algorithm
                        • Modified Policy Iteration
                        • Average Cost-to-go Problems
                        • Linear Programming
                        • Efficiency of the Algorithms
                        • Semi-Markov Decision Process
                          • Approximate Methods for Markov Decision Process - Reinforcement Learning
                            • Introduction
                            • Direct Learning
                            • Indirect Learning
                            • Supervised Learning
                              • Review of Models for Maintenance Optimization
                                • Finite Horizon Dynamic Programming
                                • Infinite Horizon Stochastic Models
                                • Reinforcement Learning
                                • Conclusions
                                  • A Proposed Finite Horizon Replacement Model
                                    • One-Component Model
                                    • Multi-Component model
                                    • Possible Extensions
                                      • Conclusions and Future Work
                                      • Solution of the Shortest Path Example
                                      • Reference List
Page 69: Models

Appendix A

Solution of the Shortest Path

Example

Solution of the shortest path problem with the value iteration algorithmStage 4Jlowast(4 0) = φ(0) = 0Stage 3Jlowast3 (0) = Jlowast(H) = C(3 0 0) = 4 ulowast3(0) = ulowast(H) = 0Jlowast3 (1) = Jlowast(I) = C(3 1 0) = 2 ulowast3(1) = ulowast(I) = 0Jlowast3 (2) = Jlowast(J) = C(3 2 0) = 7 ulowast3(2) = ulowast(J) = 0Stage 2Jlowast2 (0) = Jlowast(E) = min Jlowast3 (0) + C(2 0 0) Jlowast3 (1) + C(2 0 1) = min 4 + 2 2 + 5 = 6ulowast2(0) = Jlowast(E) = argminuisin01 J

lowast3 (0) + C(0 0) Jlowast3 (1) + C(1 0) = 0

Jlowast2 (1) = Jlowast(F ) = min Jlowast(3 0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = min 4 + 7 2 + 3 7 + 2 = 5ulowast2(1) = Jlowast(F ) = argminuisin012 J

lowast3 (0) + C(2 1 0) Jlowast3 (1) + C(2 1 1) Jlowast3 (2) + C(2 1 2) = 2

Jlowast2 (2) = Jlowast(G) = min Jlowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = min 2 + 1 7 + 2 = 3ulowast2(2) = Jlowast(G) = argminuisin12 J

lowast3 (1) + C(2 2 1) Jlowast3 (2) + C(2 2 2) = 1

Stage 1:
J*_1(0) = J*(B) = min{ J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1) } = min{ 6 + 4, 5 + 6 } = 10
u*_1(0) = u*(B) = argmin_{u ∈ {0,1}} { J*_2(0) + C(1,0,0), J*_2(1) + C(1,0,1) } = 0

J*_1(1) = J*(C) = min{ J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2) } = min{ 6 + 2, 5 + 1, 3 + 3 } = 6
u*_1(1) = u*(C) = argmin_{u ∈ {0,1,2}} { J*_2(0) + C(1,1,0), J*_2(1) + C(1,1,1), J*_2(2) + C(1,1,2) } = 1 or 2

J*_1(2) = J*(D) = min{ J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2) } = min{ 5 + 5, 3 + 2 } = 5
u*_1(2) = u*(D) = argmin_{u ∈ {1,2}} { J*_2(1) + C(1,2,1), J*_2(2) + C(1,2,2) } = 2

Stage 0:
J*_0(0) = J*(A) = min{ J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2) } = min{ 10 + 2, 6 + 4, 5 + 3 } = 8
u*_0(0) = u*(A) = argmin_{u ∈ {0,1,2}} { J*_1(0) + C(0,0,0), J*_1(1) + C(0,0,1), J*_1(2) + C(0,0,2) } = 2

The optimal cost-to-go from the initial node A is therefore J*_0(0) = 8, obtained with the decision sequence u* = (2, 2, 1, 0), i.e. the path A, D, G, I, terminal node.
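For illustration only (this script is not part of the original appendix), the same backward value-iteration recursion can be written as a short sketch. The arc costs C(k, i, j) are taken directly from the computation above; the variable names and data layout are illustrative assumptions.

```python
# Minimal sketch of the backward value-iteration recursion used in this appendix.
# C[k][(i, j)] is the cost of deciding j in state i at stage k, copied from above.

C = {
    0: {(0, 0): 2, (0, 1): 4, (0, 2): 3},              # A -> B, C, D
    1: {(0, 0): 4, (0, 1): 6,                          # B -> E, F
        (1, 0): 2, (1, 1): 1, (1, 2): 3,               # C -> E, F, G
        (2, 1): 5, (2, 2): 2},                         # D -> F, G
    2: {(0, 0): 2, (0, 1): 5,                          # E -> H, I
        (1, 0): 7, (1, 1): 3, (1, 2): 2,               # F -> H, I, J
        (2, 1): 1, (2, 2): 2},                         # G -> I, J
    3: {(0, 0): 4, (1, 0): 2, (2, 0): 7},              # H, I, J -> terminal node
}
N = 4
J = {N: {0: 0}}   # terminal cost phi(0) = 0
u = {}            # optimal decision for each stage and state

for k in range(N - 1, -1, -1):                         # backward induction
    J[k], u[k] = {}, {}
    for i in sorted({s for (s, _) in C[k]}):
        # cost-to-go of every feasible decision j from state i at stage k
        candidates = {j: J[k + 1][j] + C[k][(s, j)]
                      for (s, j) in C[k] if s == i}
        u[k][i] = min(candidates, key=candidates.get)
        J[k][i] = candidates[u[k][i]]

print(J[0][0])                                         # 8, as computed above
print({k: u[k] for k in range(N)})                     # optimal decision tables
```

Running the sketch reproduces J*_0(0) = 8 and the decision tables listed above; in the tied case at stage 1, state 1, it returns u*_1(1) = 1 because the first minimizing decision is kept.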


Reference List

[1] Maintenance terminology. Svensk Standard SS-EN 13306, SIS, 2001.

[2] Mohamed A-H. Inspection, maintenance and replacement models. Comput. Oper. Res., 22(4):435–441, 1995.

[3] S.V. Amari and L.H. Pham. Cost-effective condition-based maintenance using Markov decision processes. Reliability and Maintainability Symposium, 2006. RAMS'06. Annual, pages 464–469, 2006.

[4] N. Andréasson. Optimisation of opportunistic replacement activities in deterministic and stochastic multi-component systems. Technical report, Chalmers, Göteborg University, 2004. Licentiate Thesis.

[5] Y.W. Archibald and R. Dekker. Modified block-replacement for multiple-component systems. IEEE Transactions on Reliability, 45(1):75–83, 1996.

[6] I. Bagai and K. Jain. Improvement, deterioration and optimal replacement under age-replacement with minimal repair. IEEE Transactions on Reliability, 43(1):156–162, 1994.

[7] R.E. Barlow and F. Proschan. Mathematical Theory of Reliability. Wiley, 1965.

[8] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.

[9] C. Berenguer, C. Chu, and A. Grall. Inspection and maintenance planning: an application of semi-Markov decision processes. Journal of Intelligent Manufacturing, 8(5):467–476, 1997.

[10] M. Berg and B. Epstein. A modified block replacement policy. Naval Research Logistics Quarterly, 23:15–24, 1976.

[11] M. Berg and B. Epstein. A note on a modified block replacement policy for units with increasing marginal running costs. Naval Research Logistics Quarterly, 26:157–179, 1979.


[12] L. Bertling, R. Allan, and R. Eriksson. A reliability-centered asset maintenance method for assessing the impact of maintenance in power distribution systems. IEEE Transactions on Power Systems, 20(1):75–82, 2005.

[13] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[14] G.K. Chan and S. Asgarpoor. Optimum maintenance policy with Markov processes. Electric Power Systems Research, 76(6-7):452–456, 2006.

[15] D.I. Cho and M. Parlar. A survey of maintenance models for multi-unit systems. European Journal of Operational Research, 51(1):1–23, 1991.

[16] R. Dekker, R.E. Wildeman, and F.A. van der Duyn Schouten. A review of multi-component maintenance models with economic dependence. Mathematical Methods of Operations Research (ZOR), 45(3):411–435, 1997.

[17] B. Fox. Age replacement with discounting. Operations Research, 14(3):533–537, 1966.

[18] C. Fu, L. Ye, Y. Liu, R. Yu, B. Iung, Y. Cheng, and Y. Zeng. Predictive maintenance in intelligent-control-maintenance-management system for hydroelectric generating unit. IEEE Transactions on Energy Conversion, 19(1):179–186, 2004.

[19] A. Haurie and P. L'Ecuyer. A stochastic control approach to group preventive replacement in a multicomponent system. IEEE Transactions on Automatic Control, 27(2):387–393, 1982.

[20] P. Hilber and L. Bertling. Monetary importance of component reliability in electrical networks for maintenance optimization. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 150–155, September 2004.

[21] A. Jayakumar and S. Asgarpoor. Maintenance optimization of equipment by linear programming. In Probabilistic Methods Applied to Power Systems, 2004 International Conference on, pages 145–149, 2004.

[22] Y. Jiang, Z. Zhong, J. McCalley, and T.V. Voorhis. Risk-based maintenance optimization for transmission equipment. Proc. of 12th Annual Substations Equipment Diagnostics Conference, 2004.

[23] L.P. Kaelbling, M.L. Littman, and A.P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[24] D. Kalles, A. Stathaki, and R.E. King. Intelligent monitoring and maintenance of power plants. In Workshop on «Machine learning applications in the electric power industry», Chania, Greece, 1999.


[25] D. Kumar and U. Westberg. Maintenance scheduling under age replacement policy using proportional hazards model and TTT-plotting. European Journal of Operational Research, 99(3):507–515, 1997.

[26] P. L'Ecuyer and A. Haurie. Preventive replacement for multicomponent systems: An opportunistic discrete time dynamic programming model. IEEE Transactions on Automatic Control, 32:117–118, 1983.

[27] M. Lehtonen. On the optimal strategies of condition monitoring and maintenance allocation in distribution systems. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–5, 2006.

[28] M.L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.

[29] Y. Mansour and S. Singh. On the complexity of policy iteration. Uncertainty in Artificial Intelligence, 99, 1999.

[30] M.K.C. Marwali and S.M. Shahidehpour. Short-term transmission line maintenance scheduling in a deregulated system. Power Industry Computer Applications, 1999. PICA'99. Proceedings of the 21st 1999 IEEE International Conference, pages 31–37, 1999.

[31] R.P. Nicolai and R. Dekker. Optimal maintenance of multi-component systems: a review. 2006.

[32] J. Nilsson and L. Bertling. Maintenance management of wind power systems using condition monitoring systems - life cycle cost analysis for two case studies. IEEE Transactions on Energy Conversion, 22(1):223–229, 2007.

[33] Julia Nilsson. Maintenance management of wind power systems - cost effect analysis of condition monitoring systems. Master's thesis, Royal Institute of Technology (KTH), April 2006.

[34] K.S. Park. Optimal wear-limit replacement with wear-dependent failures. IEEE Transactions on Reliability, 37(3):293–294, 1988.

[35] K.S. Park. Condition-based predictive maintenance by multiple logistic function. IEEE Transactions on Reliability, 42(4):556–560, 1993.

[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[37] A. Rajabi-Ghahnavie and M. Fotuhi-Firuzabad. Application of Markov decision process in generating units maintenance scheduling. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1–6, 2006.


[38] Rangan Alagar, Ahyagarajan Dimple, and Sarada. Optimal replacement of systems subject to shocks and random threshold failure. International Journal of Quality & Reliability Management, 23:1176–1191, 2006.

[39] J. Ribrant and L.M. Bertling. Survey of failures in wind power systems with focus on Swedish wind power plants during 1997-2005. IEEE Transactions on Energy Conversion, 22(1):167–173, 2007.

[40] J. Si. Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE, 2004.

[41] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[42] C.L. Tomasevicz and S. Asgarpoor. Optimum maintenance policy using semi-Markov decision processes. In Power Symposium, 2006. NAPS 2006. 38th North American, pages 23–28, 2006.

[43] H. Wang. A survey of maintenance policies of deteriorating systems. European Journal of Operational Research, 139(3):469–489, 2002.

[44] L. Wang, J. Chu, W. Mao, and Y. Fu. Advanced maintenance strategy for power plants - introducing intelligent maintenance system. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, volume 2, 2006.

[45] R. Wildeman, R. Dekker, and A. Smit. A dynamic policy for grouping maintenance activities. European Journal of Operational Research.

[46] R.E. Wildeman, R. Dekker, and A. Smit. A Dynamic Policy for Grouping Maintenance Activities. Econometric Institute, 1995.

[47] Otto Wilhelmsson. Evaluation of the introduction of RCM for hydro power generators at Vattenfall Vattenkraft. Master's thesis, Royal Institute of Technology (KTH), May 2005.
