www.phmsociety.org ISBN - 978-1-936263-04-2
Proceedings of
First European Conference of the
Prognostics and Health Management
Society 2012
PHM-E’12
Dresden, Germany
July 3 - 5, 2012
Edited by: Anibal Bregon Abhinav Saxena
First European Conference of the Prognostics and Health Management Society, 2012
Table of Contents

Full Papers

A distributed Architecture to implement a Prognostic Function for Complex Systems
Xavier Desforges, Mickaël Diévart, Philippe Charbonnaud, and Bernard Archimède 2

An Approach to the Health Monitoring of a Pumping Unit in an Aircraft Engine Fuel System
Benjamin Lamoureux, Jean-Rémi Massé, and Nazih Mechbal 10

Application of Microwave Sensing to Blade Health Monitoring
David Kwapisz, Michaël Hafner, and Ravi Rajamani 17

Assessment of Remaining Useful Life of Power Plant Steam Generators - a Standardized Industrial Application
Ulrich Kunze and Stefan Raab 25

Autonomous Prognostics and Health Management (APHM)
Jacek Stecki, Joshua Cross, Chris Stecki, and Andrew Lucas 34

Characterization of prognosis methods: an industrial approach
Jayant Sen Gupta, Christian Trinquier, Ariane Lorton, and Vincent Feuillard 42

Damage identification and external effects removal for roller bearing diagnostics
M. Pirra, A. Fasana, L. Garibaldi, and S. Marchesiello 51

Data Management Backbone for Embedded and PC-based Systems Using OSA-CBM and OSA-EAI
Andreas Löhr, Conor Haines, and Matthias Buderath 59

Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned
Abhinav Saxena, José R. Celaya, Indranil Roychoudhury, Sankalita Saha, Bhaskar Saha, and Kai Goebel 69

Diagnostics Driven PHM. The Balanced Solution
Jim Lauffer 80

Fatigue Crack Growth Prognostics by Particle Filtering and Ensemble Neural Networks
Piero Baraldi, Michele Compare, Sergio Sauco, and Enrico Zio 90

Feature Extraction and Evaluation for Health Assessment and Failure Prognostics
K. Medjaher, F. Camci, and N. Zerhouni 98

Finite Element based Bayesian Particle Filtering for the estimation of crack damage evolution on metallic panels
Sbarufatti C., Corbetta M., Manes A., and Giglio M. 104

Health Assessment and Prognostics of Automotive Clutches
Agusmian Partogi Ompusunggu, Steve Vandenplas, Paul Sas, and Hendrik Van Brussel 114

Health management system for the pantographs of tilting trains
Giovanni Jacazio, Massimo Sorli, Danilo Bolognese, Davide Ferrara 127

Lifetime models for remaining useful life estimation with randomly distributed failure thresholds
Bent Helge Nystad, Giulio Gola, and John Einar Hulsund 141

Major Challenges in Prognostics: Study on Benchmarking Prognostics Datasets
O. F. Eker, F. Camci, and I. K. Jennions 148

Physics Based Electrolytic Capacitor Degradation Models for Prognostic Studies under Thermal Overstress
Chetan S. Kulkarni, José R. Celaya, Kai Goebel, and Gautam Biswas 156

Prediction of Fatigue Crack Growth in Airframe Structures
Jindrich Finda, Andrew Vechart, and Radek Hédl 165

Simulation Framework and Certification Guidance for Condition Monitoring and Prognostic Health Management
Matthias Buderath and Partha Pratim Adhikari 172

Theoretical and Experimental Evaluation of a Real-Time Corrosion Monitoring System for Measuring Pitting in Aircraft Structures
Douglas Brown, Duane Darr, Jefferey Morse, and Bernard Laskowski 183
Uncertainty of performance requirements for IVHM tools according to business targets
Manuel Esperon-Miguez, Philip John, and Ian K. Jennions 192

Unscented Kalman Filter with Gaussian Process Degradation Model for Bearing Fault Prognosis
Christoph Anger, Robert Schrader, and Uwe Klingauf 202

Using structural decomposition methods to design gray-box models for fault diagnosis of complex industrial systems: a beet sugar factory case study
Belarmino Pulido, Jesus Maria Zamarreño, Alejandro Merino, and Anibal Bregon 214

Virtual Framework for Validation and Verification of System Design Requirements to enable Condition Based Maintenance
Heiko Mikat, Antonino Marco Siddiolo, and Matthias Buderath 225

Poster Papers

Analyzing Imbalance in a 24 MW Steam Turbine
Afshin DaghighiAsli, Vahid Rezaie, and Leila Hayati 240

Economic reasoning for Asset Health Management Systems in volatile markets
Katja Gutsche 244

System PHM Algorithm Maturation
Jean-Rémi Massé, Ouadie Hmad, and Xavier Boulet 250

Design for Availability - Flexible System Evaluation with a Model Library of Generic RAMST Blocks
Dieter Fasol and Burkhard Münker 256

Knowledge-Based System to Support Plug Load Management
Jonny Carlos da Silva and Scott Poll 258

Integrated Vehicle Health Management and Unmanned Aviation
Andrew Heaton, Ip-Shing Fan, Craig Lawson, and Jim McFeat 265
Author Index 267
Full Papers
A distributed Architecture to implement a Prognostic Function for
Complex Systems
Xavier Desforges¹, Mickaël Diévart², Philippe Charbonnaud¹, and Bernard Archimède¹

¹ Université de Toulouse, INPT, ENIT, Laboratoire Génie de Production, 65016 Tarbes, France
² Aéroconseil, 31703 Blagnac, France
ABSTRACT
The proactivity in maintenance management is improved by the implementation of CBM (Condition-Based Maintenance) and of PHM (Prognostic and Health Management). These implementations use data about the health status of the systems. Among them, prognostic data make it possible to evaluate the future health of the systems. The Remaining Useful Lifetimes (RULs) of the components are frequently required to prognose systems. However, the availability of complex systems for productive tasks is often expressed in terms of RULs of functions and/or subsystems; those RULs provide little information about the components. Indeed, the maintenance operators must know which components need maintenance actions in order to increase the RULs of the functions or subsystems, and consequently the availability of the complex systems. This paper aims at defining a generic prognostic function for complex systems that prognoses their subsystems and functions and enables the isolation of the components that need maintenance actions. The proposed function requires knowledge about the system to be prognosed; the corresponding models are detailed. The proposed prognostic function relies on graph traversals, so its distribution is proposed to increase the calculation speed. It is carried out by generic agents.
1. INTRODUCTION
The implementation of the Condition-Based Maintenance
(CBM) recommendations usually leads to the improvement
of the equipment availability (Jardine, Lin and Banjevic,
2006; Scarf, 2007). The CBM actions are planned and led
according to the health status of equipments. Monitoring,
diagnostic and prognostic functions assess these statuses.
The development of health assessment functions has often
been considered as a downstream activity in the design
process of complex systems with few allocated means. This
has often led to a lack of collaboration with upstream
activities and to centralized deployment in light
computational modules although those functions have to
process numerous pieces of data of different kinds. As systems grow more complex, the consequence is an increasing rate of useless device replacements.
Those replacements are not only costly but may also cause
additional damage to the system.
Therefore, health assessment functions now become a major
issue for complex system designers. Among those functions,
the prognostic function aims at defining the future health of
the system that contributes to plan productive tasks or
maintenance tasks. Among the difficulties hindering the implementation of prognostic functions in complex systems are the numerous hardware or software components, devices, functions and subsystems of complex systems. Those pieces of equipment are designed, manufactured and assembled by different industrial partners (OEMs, suppliers, subcontractors, etc.). Each partner has a part of the knowledge needed to carry out the prognosis of the complex system. However, some pieces of this knowledge belong to the partners' own know-how and so cannot be shared.
To tackle this difficulty, a decentralized/distributed
architecture can be proposed. Indeed, such architectures
enable the implementation of the Remaining Useful
Lifetime (RUL) assessment and prognostic functions closer
to components, devices, functions or subsystems. Therefore,
each OEM, supplier or subcontractor can provide RUL assessment and prognostic functions for its equipment.
Nevertheless, those functions have to collaborate in order to
ensure the convergence of the prognostic process of
_____________________
Xavier Desforges et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
complex systems. Indeed, the union of local prognoses is
not the global prognosis. To illustrate that point, let us
consider a system made of a power supply and a computer.
If the RUL of the power supply is lower than that of the computer, the computer will probably not be able to carry out its activity beyond the RUL of the power supply. Agents
that carry out RUL assessment and prognostic function can
be used to ensure this collaboration. An agent is defined as a
self-contained problem-solving computational entity able, at
least, to perform most of its problem-solving tasks, to
interact with its environment, to perceive its environment
and to respond within a given time, to take initiatives when
it is appropriate (Jennings and Wooldridge, 1995).
The aim of this article is to present an architecture for
implementing a distributed prognostic function for complex
systems. Firstly, the interest of a distributed prognostic function is discussed. To implement the prognostic function, knowledge about the complex system is necessary; the paper then describes the principles of the prognostic function for complex systems, and the notion of Time Before Out of order (TBO) is introduced. Finally, the paper presents the proposed architecture, which is based on the multi-agent system concept with generic agents, and shows how to split up the modeled knowledge between the agents.
2. DISTRIBUTED PROGNOSIS
One aim of the Prognostic and Health Management (PHM)
is to assess the ability of complex systems to carry out
future tasks from diagnostic and prognostic results and the
definition of the constraints of the future tasks. Roemer,
Byington, Kacprzynski and Vachtsevanos (2007) advise that diagnostic and prognostic algorithms should be processed as close as possible to the monitored components and that the
produced data should be then exploited by ascending the
hierarchical structure of the complex system. Therefore,
bringing the PHM into operation requires the
implementation of prognostic functions.
If Vachtsevanos and Wang (2001) consider that the prognostic activity consists in assessing a RUL once an early detection of failure has been made, Lebold and Thurston (2001) consider that it is a reliable assessment of the RUL of a system or a device. From these studies it
appears that the assessment of the RUL is the keystone of
the prognostic activity. Indeed, the data it provides are used
as decision support for maintenance planning and proactive
maintenance (Iung, Monnin, Voisin, Cocheteux and Levrat,
2008) or for e-maintenance (Muller, Crespo Marquez and
Iung, 2008).
Several studies have dealt with the design of prognostic functions for devices. Several techniques are described in (Vachtsevanos, Lewis, Roemer, Hess and Wu, 2006). Nevertheless, in the case of complex systems, the set of the RULs of the devices may not be enough to be a suitable decision support for maintenance or for productive task planning purposes. The sets of RULs shall therefore be processed. In complex systems, the number of RULs can be so huge that the only reasonable way to process them is to distribute the processing. Another good reason for using a distributed architecture is that it enables implementations of prognostic processes as close as possible to the monitored devices, as Roemer et al. (2007) advise. Works dealing with distributed prognosis are quite recent, and several ways to distribute the prognostic processes have already been proposed.
In (Voisin, Levrat, Cocheteux and Iung, 2010) the prognosis
is considered as a business process whose activities can be
distributed in a context of e-maintenance. The mentioned
distribution is made according to different actors located on
different sites.
Saha, Saha, and Goebel (2009) propose an architecture made of several agents that can communicate with each other. An agent diagnoses a device and, when it detects a fault, switches to the prognostic mode and informs a base station. The base station plans tasks, can reinitialize the processes of agents if errors are detected, manages the access to resources such as an external database, and manages the availability of agents in terms of computation load.
Dragomir, Gouriveau, Zerhouni and Dragomir (2007) present an architecture for health assessment that consists of two levels: the local level corresponds to the components and the global level is associated with the complex system. In this architecture, each local agent brings into operation several known prognostic methods according to the available knowledge about the monitored component. The global agent collects the health assessment data from the local agents and computes a health assessment for the system thanks to a neural network.
Takai and Kumar (2011) propose a decentralized prognoser
for discrete event systems where local agents generate
prognoses that are sent to the other agents. Then the agents
cooperate in order to converge to a prognosis of the system
thanks to an inference engine.
The sets of RULs shall also be processed according to
knowledge as mentioned in (Saha et al., 2009).
3. KNOWLEDGE MODELING
During the design stage of a complex system, different
kinds of knowledge are elaborated. Among them, the
structural knowledge, the functional knowledge and the
behavioral knowledge are required to implement prognostic
functions (Reiter, 1992; Chittaro and Ranon, 2003). The HAZard and OPerability (HAZOP) methodology, a process hazard analysis technique, enables the study not only of the hazards of a system but also of its operability problems, by exploring the effects of any deviations from design conditions (Dunjo, Fthenakis, Vilchez and Arnaldos, 2010).
This methodology enables the identification of functions and interconnections.
3.1. The functional knowledge modeling
The functional knowledge modeling aims at providing the sets of components that implement the functions of the complex system from the user's point of view. Knowing the RUL of a function will help to plan future missions of the system and/or the maintenance actions it needs.
Therefore, the functional knowledge modeling consists in defining functions as sets of components or devices, both of which we call "devices". Functions can also be made of functions. Complex systems can also be divided into subsystems. In that case, a subsystem can be considered as a set of functions. Thus, a complex system is made of subsystems, a subsystem is made of functions, and a function is made of devices and/or functions.
Three types of functions must be considered for the computation of the prognoses of functions.
Simple functions are functions that fail if at least one of their entities (devices or functions) fails.
For reliability purposes, complex systems contain functions
with redundancies. These functions are carried out by at
least two entities (devices and/or subfunctions) that bring
into operation the same activities, services, etc. For example, suppose that a flight control function of an aircraft is made of three functions we call "flight controllers". If one
or even two flight controllers fail, the flight control can still
carry out its task. However, if two flight controllers fail,
there is no more redundancy. That is why we consider
redundancies as functions called redundancy functions.
Those functions are the only entities included in the
functions with redundancies.
Subsystems are considered as sets of functions that are not
included in other functions. Thus, subsystems can be
considered as simple functions. The prognostic of the
complex system can then be assessed from the prognostic of
its subsystems.
This knowledge can be modeled thanks to the UML (Unified Modeling Language, an object-oriented modeling language) class diagram shown in figure 1.
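As a rough sketch of this composition model (our own Python rendering, not the paper's UML class diagram of figure 1; all class and attribute names are ours), the device/function/subsystem hierarchy could look like:

```python
# Hypothetical sketch of the functional knowledge model of section 3.1:
# a complex system is made of subsystems, a subsystem of functions, and
# a function of devices and/or subfunctions. Names are illustrative.

class Device:
    def __init__(self, ident):
        self.ident = ident

class Function:
    # kind is "simple", "with_redundancies" or "redundancy"
    def __init__(self, ident, entities, kind="simple"):
        self.ident = ident
        self.entities = entities  # devices and/or subfunctions
        self.kind = kind

    def devices(self):
        """All devices reachable through the composition hierarchy."""
        found = []
        for e in self.entities:
            if isinstance(e, Device):
                found.append(e)
            else:
                found.extend(e.devices())
        return found

# A subsystem can be seen as a simple function made of the functions
# that are not included in other functions.
battery = Device("battery")
mains = Device("mains")
power = Function("power_supply", [battery, mains], kind="with_redundancies")
laptop = Function("laptop_subsystem", [power], kind="simple")
```

Here `devices()` flattens the hierarchy, which is what a prognostic function needs when relating the RUL of a function to the RULs of its devices.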
3.2. The structural knowledge modeling
The structural modeling aims at representing the direct
interactions between devices and their failure modes mainly
in order to propagate the effects of failures (Worn, Langle,
Albert, Kazi, Brighenti, Revuelta Seijo, Senior, Sanz-Bobi
and Villar Collado, 2004).
Figure 1. Functional knowledge model.
Failure Modes and Effects Analysis (FMEA) or HAZOP studies make it possible to collect the necessary knowledge for structural modeling. Indeed, those studies identify what happens to other devices when one or several devices fail.
Therefore, the structural knowledge can be modeled thanks to a set of arcs S1 = {(Di, Mj) → (Dk, Moo)} with i ≠ k, where an arc (Di, Mj) → (Dk, Moo) means that the device Dk will be out of order (mode Moo) if the failure mode Mj of a device Di occurs. Let us note that the mode Mj can be the mode Moo. However, some particular cases exist. For example, a laptop uses a power supply function. Let us simplify by assuming that the battery and the electricity distribution network carry out this function. If only the battery or only the electricity distribution network fails, the computer still operates normally. That is why the cases where a function that fails or becomes out of order makes components become out of order must also be considered. Thus, the structural model must also represent a set of arcs S2 = {(Fj, Moo) → (Dk, Moo)} with the same meaning. So, the structural model consists of the sets S1 and S2. Those sets of arcs represent a graph whose nodes are the failure modes of the devices and the functions of the complex system.
The out of order mode (Moo) is quite relevant because it
indicates that the origin of the predicted failure of an entity
is not the entity itself.
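To make the arc sets concrete, here is a minimal illustrative encoding of the structural graph (identifiers and data layout are ours, not the paper's):

```python
# Illustrative encoding of the structural model of section 3.2 as the
# two arc sets S1 and S2. Nodes are (entity, mode) pairs; "Moo" denotes
# the out-of-order mode. All identifiers are made up for the example.

M_OO = "Moo"

# S1: (device Di, failure mode Mj) -> (device Dk, Moo)
S1 = {
    ("D1", "M1"): [("D2", M_OO)],
    ("D2", M_OO): [("D3", M_OO)],
}

# S2: (function Fj, Moo) -> (device Dk, Moo), for the cases where a
# function that fails or becomes out of order makes devices out of order.
S2 = {
    ("F_power", M_OO): [("D3", M_OO)],
}

def successors(node):
    """Out-of-order nodes reached from `node` in the structural graph."""
    return S1.get(node, []) + S2.get(node, [])
```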
3.3. The behavioral knowledge modeling
The behavioral modeling mainly aims at defining the
dynamical behavior of a system. Behavioral models can be
used to detect degradation and to analyze their trends in
order to define the RUL of the monitored device.
The behavioral models used to prognose a device can be
achieved thanks to three approaches (Byington, Roemer,
Watson and Galie, 2003): experience-based, evolutionary
and/or statistical trending, or model-based. The behavioral models implemented to prognose a complex system can be so numerous and of such various kinds that it is difficult to consider
all of them. They also require design knowledge of devices, functions or subsystems that may reveal the know-how of their providers. Nevertheless, what matters most is what they contribute to produce: the RULs of the devices.
We then assume that a monitoring layer made of one or
several agents provides the RULs to the proposed
prognostic function. The monitoring layer agents can so
bring into operation the most suitable techniques to assess
RULs of devices.
3.4. RUL modeling
In order to be processed by a prognostic function, a RUL
has to assess a duration T between the instant t0, at which it is calculated, and the predicted instant t0 + T at which the device will fail according to a given failure mode. Thus, a RUL must contain four entities (Voisin et al., 2010):
- the involved device,
- the involved failure or degraded mode,
- the instant at which it was calculated,
- the duration.
RULs are assessments, so fields can be added to deal with uncertainty or confidence levels. However, the proposed prognostic function of complex systems does not take into account any kind of uncertainty representation. The aim of this paper is to propose a principle for prognosing a complex system that can be distributed into generic agents. Handling the uncertainty of RULs would likely require implementing different processes in the tasks described in section 4.
The RULs that the monitoring layer provides are the base of
the proposed prognostic function for complex systems but
this function also needs the functional and structural
knowledge.
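A minimal sketch of such a four-entity RUL record, assuming our own field names (the paper does not prescribe a data layout):

```python
# Minimal sketch of the four-entity RUL record of section 3.4.
# Field names are ours; uncertainty fields could be added, but the
# proposed CSPF does not handle them.
from dataclasses import dataclass

@dataclass(frozen=True)
class RUL:
    device: str          # the involved device Di
    mode: str            # the involved failure or degraded mode Mj
    computed_at: float   # the instant tk at which it was calculated
    duration: float      # the duration Tl; failure predicted at tk + Tl

    @property
    def predicted_failure_date(self):
        return self.computed_at + self.duration

r = RUL("D1", "M1", computed_at=100.0, duration=250.0)
```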
4. PROGNOSTIC FUNCTION FOR COMPLEX SYSTEMS
This section is dedicated to the proposed generic principle
for prognosing complex systems from RULs and from the
modeled functional and structural knowledge. We assume
that the monitoring layer sends to the Complex System Prognostic Function (CSPF) each RUL that it computes for each failure mode of the devices, except the out-of-order mode. The CSPF is divided into three main tasks:
1. the computation of the RUL of the device for which a
RUL has been received,
2. the computation of the RUL of the devices that are
interconnected (directly or not) to the device for which
a RUL has been received,
3. the computation of the RULs of the functions from the former tasks.
The process of the CSPF starts when a RUL is received
from the monitoring layer.
4.1. Computation of the RUL of a device (task 1)
The RUL received by the CSPF from the monitoring layer is noted RUL(Di, Mj, tk, Tl), where Di is the device, Mj is the predicted failure mode, tk is the instant at which the RUL was computed and Tl is the remaining lifetime, such that tk + Tl is the predicted instant at which the failure will likely occur.
When a RUL RUL(Di, Mj, tk, Tl) is received at the instant t, it is recorded and replaces the last stored RUL for Di with the failure mode Mj, RUL(Di, Mj, tk-1, Tl-1), if tk > tk-1; else the task stops. If tk > tk-1, the RUL of Di is then defined thanks to its last recorded RULs for all its failure modes. These RULs are noted RUL(Di, Mj, tkj, Tlj). The new RUL of Di becomes RUL(Di, Mp, t, Tp), where p corresponds to the failure mode for which:

tkp + Tlp = min over j of (tkj + Tlj), with Tp = tkp + Tlp − t (1)

Then this RUL is compared to the last recorded RUL of the device, noted RUL(Di, Mq, tkq, Tlq): if t + Tp < tkq + Tlq, RUL(Di, Mp, t, Tp) becomes the new RUL of the device Di. It is stored and replaces RUL(Di, Mq, tkq, Tlq); then, if at least one arc starts from the node (Di, Mp), task 2 is processed, else task 3 is processed. If RUL(Di, Mp, t, Tp) does not become the new RUL of the device Di, the CSPF stops.
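Under our reading of relationship (1), the selection of the device RUL in task 1 can be sketched as follows (tuple layout and function name are ours, not the paper's):

```python
# Sketch of task 1 (section 4.1): given the last recorded RUL per
# failure mode of a device, the device RUL at instant t is taken from
# the mode whose predicted failure date t_kj + T_lj is the earliest.
# RULs are represented here as (device, mode, t_k, T_l) tuples.

def device_rul(last_ruls_per_mode, device, t):
    """last_ruls_per_mode: dict mode -> (t_k, T_l) for one device."""
    # p minimizes the predicted failure date t_kj + T_lj (relationship 1)
    p = min(last_ruls_per_mode, key=lambda m: sum(last_ruls_per_mode[m]))
    tk, Tl = last_ruls_per_mode[p]
    # T_p = t_kp + T_lp - t, so the predicted date is unchanged
    return (device, p, t, tk + Tl - t)

ruls = {"M1": (100.0, 300.0), "M2": (120.0, 150.0)}  # M2 fails at 270
new = device_rul(ruls, "D1", t=130.0)
```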
4.2. Computation of the RUL of the devices that are
interconnected (task 2)
This task consists in propagating the new RUL(Di, Mp, t, Tp) in the graph described by the arcs of S1 and S2 towards the devices that will likely be out of order earlier than previously predicted because of this new RUL. We must here introduce the notion of Time Before Out of order (TBO). This notion expresses that a device will likely become out of order because of the failure Mp of the device Di. It is meaningful for maintenance because it makes it possible to localize the devices for which maintenance actions will be necessary. A TBO therefore contains five entities:
- the involved device,
- the device whose RUL has generated the TBO,
- the failure mode of the device whose RUL has generated the TBO,
- the instant at which the TBO was computed,
- the remaining time before the out-of-order mode occurs.
If the prognostic function handles uncertainty, TBOs must also contain fields dealing with this notion. That is not the case in this paper.
This second task does not consist of the computation of the new RULs of the interconnected devices but of their new TBOs. Two cases are considered:
- one for the arcs (Di, Mp) → (Dj, Moo), which start from a failure mode node,
- one for the arcs (Dn, Moo) → (Dm, Moo), which start from an out-of-order node.
For all the arcs (Di, Mp) → (Dj, Moo), RUL(Di, Mp, tkp, Tp) is compared to the last recorded TBOs of the devices Dj, noted TBO(Dj, Dx, Mqx, tkqx, Tqx) with j ≠ x. This comparison is made at the instant t. If tkp + Tp < tkqx + Tqx, then the new TBO of Dj becomes TBO(Dj, Di, Mp, t, Tpt) with Tpt = tkp + Tp − t. This new TBO is recorded, replaces the previously stored one, and is propagated in the graph from the node (Dj, Moo); otherwise the propagation in the graph from the node (Dj, Moo) is stopped.
For all the other arcs (Dn, Moo) → (Dm, Moo), the TBO of the device Dn, noted TBO(Dn, Di, Mp, tpt, Tpt), is compared to the last recorded TBO of Dm, noted TBO(Dm, Dj, Mq, tqt, Tqt). This comparison is made at the instant t. If tpt + Tpt < tqt + Tqt, then the new TBO of Dm becomes TBO(Dm, Di, Mp, t, T) with T = tpt + Tpt − t. This new TBO is recorded, replaces the previously stored one, and is propagated in the graph from the node (Dm, Moo); otherwise the propagation from the node (Dm, Moo) is stopped.
This task ends when there is no more TBO to propagate. Then the prognostic of the functions must be done by the CSPF from the RULs and TBOs that were updated.
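The propagation of task 2 can be sketched as a worklist traversal of the structural graph (a simplification under our own data layout; arcs from failure-mode nodes and from out-of-order nodes are merged into one dictionary, and all comparisons are made at the same instant t):

```python
# Sketch of task 2 (section 4.2): a new RUL is propagated through the
# structural graph, and a device's TBO is replaced only when the new
# predicted out-of-order date is earlier than the stored one.

def propagate(arcs, tbos, root_node, origin_dev, origin_mode, t, T):
    """arcs: dict node -> list of devices that go out of order;
    tbos: dict device -> (origin_dev, origin_mode, t_q, T_q), mutated
    in place. root_node is (Di, Mp); t + T is the predicted date."""
    worklist = [(root_node, T)]
    while worklist:
        node, remaining = worklist.pop()
        for dev in arcs.get(node, []):
            stored = tbos.get(dev)
            # propagate only if the new out-of-order date is earlier
            if stored is None or t + remaining < stored[2] + stored[3]:
                tbos[dev] = (origin_dev, origin_mode, t, remaining)
                worklist.append(((dev, "Moo"), remaining))
            # otherwise the propagation from this node is stopped

arcs = {("D1", "M1"): ["D2"], ("D2", "Moo"): ["D3"]}
tbos = {"D2": ("D9", "M9", 0.0, 500.0), "D3": ("D9", "M9", 0.0, 50.0)}
propagate(arcs, tbos, ("D1", "M1"), "D1", "M1", t=10.0, T=200.0)
```

In this example D2's TBO is brought forward to date 210, while D3's stored TBO (date 50) is already earlier, so the propagation stops there.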
4.3. Computation of the RUL of the functions (task 3)
According to section 3.1, three types of functions must be
considered for the computation of their prognoses: simple
functions, functions with redundancies and redundancy
functions.
The failure mode of a function is directly linked to the failure mode of one of its devices and/or to the missing service carried out by one of its subfunctions. That is why we only consider the TBOs of the functions instead of their RULs. The TBO of a function contains the same fields as the TBOs of the devices, except that it contains the involved function instead of the involved device.
The TBO of a function is computed if at least one RUL of one of its devices has been modified or if one TBO of one of its entities (devices or functions), noted X, has been modified by the CSPF.
For a simple function Fj, if the RUL RUL(Di, Mp, tp, Tp) of one of its devices has been modified, its TBO(Fj, Dk, Ml, tl, Tl) is modified if tp + Tp < tl + Tl; it then becomes TBO(Fj, Di, Mp, t, T) with T = tp + Tp − t.
For a simple function Fj, if the TBO of one of its functions or of its devices, TBO(Xq, Di, Mp, tp, Tp), where Xq denotes either the function or the device, has been modified, its TBO(Fj, Dk, Ml, tl, Tl) is modified if tp + Tp < tl + Tl; it then becomes TBO(Fj, Di, Mp, t, T) with T = tp + Tp − t.
The new TBO is recorded and replaces the previously stored one.
For functions with redundancies, the TBOs and/or RULs of the entities included in their redundancy functions are considered. For an entity that is a device Di, we consider its RUL RUL(Di, Mp, tp, Tp) or its TBO TBO(Di, Dx, Mq, tq, Tq) and the value Tti that is computed with the relationship (2):

Tti = min(tp + Tp, tq + Tq) − t (2)

If an entity is a function Fj with TBO(Fj, Dx, Mq, tq, Tq), the value Ttj is computed with the relationship (3):

Ttj = tq + Tq − t (3)

The TBO(Fwr, Dy, Ms, t, T) of a function with redundancies is computed from (4):

T = max over the entities of the values Tt (4)

where Dy and Ms are the device and its failure mode whose RUL or TBO has the greatest value Tt, and t is the instant at which the TBO has been computed. The new TBO is recorded and replaces the previously stored one.
For a redundancy function, the TBOs and/or RULs of its entities are considered. For an entity that is a device Di, we consider its RUL(Di, Mp, tp, Tp) or its TBO(Di, Dx, Mq, tq, Tq) and the value Tti, also computed with the relationship (2). For a function entity Fj with TBO(Fj, Dx, Mq, tq, Tq), the value Ttj is computed with the relationship (3) too. The TBO(Fr, Dy, Ms, t, T) of a redundancy function is computed from (5):

T = the nth greatest of the values Tt (5)

where Dy and Ms are the device and its failure mode whose RUL or TBO has the nth greatest value Tt, and t is the instant at which the TBO has been computed (generally n = 2). The new TBO is recorded and replaces the previously stored one.
If the TBO of a function Fk has changed and if Fk is linked to a device by an arc of S2, (Fk, Moo) → (Dm, Moo), task 2 is then processed with the same procedure as the one for the arcs (Dn, Moo) → (Dm, Moo).
The TBOs of the functions and RULs of the devices are the
elements of the prognostic.
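Under our reading of relationships (2) to (5), the redundancy handling of task 3 can be sketched as follows (names and data layout are ours; a function with redundancies takes the greatest Tt, a redundancy function the nth greatest, generally n = 2):

```python
# Sketch of task 3 (section 4.3) for redundancy handling: each entity
# contributes a value Tt, its earliest predicted out-of-order date
# minus the instant t; the function TBO then keeps the greatest or the
# nth greatest Tt. All names here are illustrative.

def entity_tt(dates, t):
    """dates: predicted dates (t_p + T_p, t_q + T_q, ...) available
    for the entity; relationship (2) keeps the earliest one."""
    return min(dates) - t

def redundancy_tbo(tts, t, n=1):
    """T for the function: greatest Tt if n == 1 (relationship 4),
    nth greatest otherwise (relationship 5).
    Returns (index of selected entity, t, T)."""
    order = sorted(range(len(tts)), key=lambda i: tts[i], reverse=True)
    chosen = order[n - 1]
    return chosen, t, tts[chosen]

tts = [entity_tt([400.0], 100.0), entity_tt([250.0, 320.0], 100.0)]
full = redundancy_tbo(tts, 100.0, n=1)      # function with redundancies
degraded = redundancy_tbo(tts, 100.0, n=2)  # redundancy function
```

The two calls illustrate the difference: the function with redundancies stays usable as long as its best entity (Tt = 300), while the redundancy function signals the loss of redundancy at the second-best Tt (150).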
4.4. Experimental results
The CSPF was successfully tested. In order to illustrate the results it provides, we propose the case study of figure 2, where the arcs represent the structural knowledge and the boxes the functional knowledge.
Figure 2. Case study.
In this system, only one failure mode is considered for each device, and the effect of this failure is supposed to be the same as the out-of-order mode. That is why only one kind of arc is represented in Figure 2. However, a case of a system with devices having two failure modes with different effects has also been successfully tested.
Table 1 shows an overview of the results provided by the CSPF for a simulated scenario. In this table, the first column is the rank of reception of a RUL, the second column is the identifier of the device for which the monitoring layer has emitted a RUL, and the third column is the date (the sum t + T of the fields contained in the received RUL) at which the device will probably fail. The other columns contain the dates (the sums t + T) of the RULs or TBOs of the devices and functions that are modified by the CSPF because of the received RUL. Dates in red mean that the date (t + T) of a device's RUL is earlier than the date of its TBO.
The proposed CSPF is processed on-line each time a new RUL is received, but it always leads to reducing the dates (t + T) of the RULs and TBOs of devices and functions. Thus, it is a pessimistic approach to the prognostic of the complex system. In that case, we can consider that the prognostic made on-line is dedicated to control operators. However, the CSPF can also be run off-line for maintenance operators: the T values of the TBOs are first set to very great values, and the CSPF is then run with all the RULs of each device. The maintenance operators thus have indications about the devices that need maintenance actions. Once a device has been replaced or fixed, its RUL must be set to new values; in such cases, the T value of the RUL of the replaced or fixed device may be set to a value equal to its MTBF (Mean Time Between Failures) or MTTF (Mean Time To Failure).
However, the CSPF requires graph traversals and can therefore be a long process. One way to reduce the computation time is to distribute the CSPF.
5. DISTRIBUTION OF THE CSPF
The proposed distribution of the CSPF consists of several agents that all process the CSPF. Assuming that there are few interconnections between subsystems, we propose one agent per subsystem in order to reduce the number of messages sent between the agents. The agents have to be implemented on different computing platforms to speed up the computation of the CSPF. Thus, the architecture can be represented as shown in figure 3.
Table 1. Example of results provided by the CSPF
Figure 3. Distributed architecture scheme.
The SPAs are the Subsystem Prognostic Agents. They contain a database in which the functional and structural knowledge is represented, as well as the structural interconnections between the subsystems. In this architecture, the monitoring layer sends the RULs of the devices to the SPA that prognoses the subsystem to which the device belongs.
In the proposed distribution of the CSPF, the knowledge is distributed to the SPAs. The SPAs are generic agents: they all process the same tasks, but their results depend on the knowledge modeled in their databases. The prognostic of the complex system consists of the RULs of the devices and the TBOs of the functions recorded by the SPAs.
The proposed architecture is thus also quite scalable. Adding a device or a new function mainly consists in adding functional and structural descriptions to the SPA of the subsystem it belongs to and, perhaps, some arcs to the structural models of the other SPAs; the algorithms processed by the SPAs do not need to be modified. For the case study of Figure 2, three SPAs are implemented.
The parts of knowledge modeled in the databases of the SPAs are described in Figure 4 (4.1 for SPA1, 4.2 for SPA2 and 4.3 for SPA3). From the structural knowledge, an SPA knows to which SPA a TBO must be sent, thanks to the identifiers of the external devices.
The communication between the SPAs can be modeled by a UML sequence diagram, as shown in Figure 5, where the monitoring layer is considered as a single agent although it could be made of several ones. Two SPAs are represented: the one that receives the RUL, and one that represents the other SPAs. Task 1 is processed only once, when a "New_RUL" message is received by an SPA. The "Modified_TBO" messages are emitted by the SPA from task 2 or task 3; they indicate to the receiving SPA which device is impacted by the TBO. Thus, when an SPA receives such a message, it processes task 2 and task 3. Even when distributed, the CSPF can take quite long to execute, and "New_RUL" or "Modified_TBO" messages can be received while an SPA is running. Those messages therefore have to be stored in a buffer. The t values, which are fields of the RULs and TBOs, can be used to sort the messages by increasing date.
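The buffered, date-sorted message handling described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; class and field names are assumptions.

```python
import heapq

# Illustrative message buffer for an SPA: "New_RUL" and "Modified_TBO"
# messages received while the CSPF is running are queued and later
# popped in increasing order of their date field t.

class MessageBuffer:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker for messages with equal dates

    def push(self, msg_type, t, payload):
        # msg_type is "New_RUL" or "Modified_TBO"; t is the message date.
        heapq.heappush(self._heap, (t, self._seq, msg_type, payload))
        self._seq += 1

    def pop(self):
        # Return the pending message with the smallest date t.
        t, _, msg_type, payload = heapq.heappop(self._heap)
        return msg_type, t, payload

    def __len__(self):
        return len(self._heap)
```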
Figure 4. Modeled knowledge in SPAs (4.1: SPA1, 4.2: SPA2, 4.3: SPA3).
Figure 5. Sequence diagram.
6. CONCLUSION
This paper presented a generic algorithm to carry out the prognosis of a complex system from the RULs of its devices. The approach requires functional and structural knowledge of the complex system, whose models were given, and the requirements for the functional modeling were detailed. As the proposed prognostic principle requires graph traversal, its distribution into generic agents in order to reduce its computation time was presented, together with the distribution of the functional and structural models into the prognostic agents. The principle of prognosis provides pessimistic results online but can be run off-line for more optimistic results. One can therefore consider that the online process is dedicated to control operators (TBOs of functions) and that the off-line process is dedicated to maintenance (TBOs of functions and RULs of devices).
The distributed simulation platform is under development. It uses a middleware to implement the communication between the monitoring layer agents and the SPAs. This platform will enable a comparison between the centralized approach (with one SPA) and the distributed approach with several SPAs.
Another perspective is the definition of functional and structural models to assess the TBOs of devices and functions even when the RULs of devices are increasing (when t+T is increasing). Eventually, the problem of the uncertainty of RULs could be addressed for prognosing complex systems.
REFERENCES
Byington, C., Roemer, M.J., Watson, M., Galie, T. (2003).
Prognostic enhancements to gas turbine diagnostic
systems, Proceedings of IEEE Aerospace Conference,
vol. 7, pp. 3247-3255.
Chittaro, L., Ranon, R. (2003). Hierarchical model-based
diagnosis based on structural abstraction, Artificial
Intelligence, vol. 155, pp. 147–182
Dragomir, O., Gouriveau, R., Zerhouni, N., Dragomir, F.
(2007). Framework for a distributed and hybrid
prognostic system, Proceedings of 4th IFAC
Conference on Management and Control of Production
and Logistics.
Dunjo, J., Fthenakis, V., Vilchez, J.A., Arnaldos, J. (2010).
Hazard and operability (HAZOP) analysis. A literature
review, Journal of Hazardous Materials, vol. 173, pp.
19–32.
Engel, S., Gilmartin, B., Bongort, K., Hess, A. (2000).
Prognostics, the real issues involved with predicting life
remaining, Proceedings of the IEEE Aerospace
Conference, vol. 6, pp. 457-469.
Iung, B., Monnin, M., Voisin, A., Cocheteux, P., Levrat, E.
(2008). Degradation state model-based prognosis for
proactively maintaining product performance, CIRP
Annals - Manufacturing Technology, vol. 57, pp.49–52.
Jardine, A., Lin, D. and Banjevic, D. (2006). A review on
machinery diagnostics and prognostics implementing
condition-based maintenance, Mechanical Systems and
Signal Processing, vol. 20, pp. 1483-1510.
Jennings, N.R., Wooldridge, M. (1995) Applying agent
technology, Applied Artificial Intelligence, vol. 9, pp.
357-369.
Lebold, M., Thurston, M. (2001) Open standards for
condition-based maintenance and prognostics systems,
Proceedings of the 5th annual maintenance and
reliability conference (MARCON 2001).
Muller, A., Crespo Marquez, A., Iung, B. (2008). On the
concept of e-maintenance: Review and current research,
Reliability Engineering and System Safety, vol. 93, pp.
1165–1187.
Reiter, R. (1992). A theory of diagnosis from first principles, Readings in model-based diagnosis, Morgan Kaufmann Publishers, pp. 29-48.
Roemer, M., Byington, C., Kacprzynski, G.J.,
Vachtsevanos, G. (2007). An overview of selected
prognostic technologies with reference to an integrated
PHM architecture. Technical Report, Impakt
Technologies.
Saha, B., Saha, S., Goebel, K. (2009). A distributed
prognostic health management architecture,
Proceedings of the Conference of the Society for Machinery Failure Prevention Technology.
Scarf, P. (2007). A Framework for Condition Monitoring
and Condition Based Maintenance, Quality Technology
& Quantitative Management, vol 4, pp. 301-312.
Takai, S., Kumar, R. (2011). Inference-Based Decentralized Prognosis in Discrete Event Systems, IEEE Transactions on Automatic Control, vol. 56, pp. 165-171.
Vachtsevanos, G., Wang, P. (2001). Fault prognosis using
dynamic wavelet neural networks. Proceedings of
AUTOTESTCON IEEE Systems Readiness
Technology Conference, pp. 857-870.
Vachtsevanos, G., Lewis, F. L., Roemer, M., Hess, A., Wu,
B. (2006). Intelligent fault diagnosis and prognosis for engineering systems. Hoboken, NJ: John Wiley & Sons, Inc.
Voisin, A., Levrat, E., Cocheteux, P., Iung, B. (2010).
Generic prognosis model for proactive maintenance
decision support: application to pre-industrial e-
maintenance test bed, Journal of Intelligent
Manufacturing, vol. 21, pp. 177–193.
Worn, H., Langle, T., Albert, M., Kazi, A., Brighenti, A.,
Revuelta Seijo, S., Senior, C., Sanz-Bobi, M.A., Villar
Collado, J. (2004). Diamond: distributed multi-agent
architecture for monitoring and diagnosis. Production
Planning and Control, vol. 15, pp. 189-200.
An Approach to the Health Monitoring of a Pumping Unit in an
Aircraft Engine Fuel System
Benjamin Lamoureux1,2, Jean-Rémi Massé3, and Nazih Mechbal4
1,3 Snecma (Safran Group), Systems Division, Villaroche, France
2,4 Arts et Métiers ParisTech, PIMM UMR CNRS, Paris, France
ABSTRACT
This paper presents an approach to health monitoring through the early detection of the premises of failure modes. It is a physics-based model approach that captures the knowledge of the system and its degradations. The component under study is a pumping unit such as those found in aircraft engine fuel systems. First, a complete component analysis is performed to determine its potential degradations, and a relevant physics-based component health indicator (CHI) is defined. Then, degradations are modelled and their impacts on the CHI are quantified using an AMESim® physics-based model. Assuming that in-flight measurements are available, model updating is performed and a healthy distribution of the CHI is computed. Eventually, a fault detection algorithm is developed and statistical validation is performed through the computation of key performance indicators (KPI). In parallel, a degradation severity indicator (DSI) is defined and prognostics is performed based on the monitoring of this DSI.
1. INTRODUCTION
In the modern aircraft engine industry, increasing product availability is of paramount importance. Delays and cancellations caused by unanticipated component failures generate prohibitive expenses, especially when failures occur at sites without proper maintenance staff and equipment. In order to minimize the occurrence of these unexpected, costly in-service failures and to extend system availability, Prognostic Health Monitoring (PHM) systems, which perform continuous diagnosis and capture the current health state of the component, have become a necessity.
A PHM system (Sheppard, Kaufman and Wilmer 2009) ideally performs fault detection, isolation, diagnostics (determining the specific fault mode and its severity) and prognostics (accurately predicting the remaining useful life). Whereas fault detection and diagnostics have been the subject of considerable emphasis (Isermann 1997, Basseville 1998, Balaban, et al. 2009), prognostics has gradually emerged as a research topic that can push the limits of health management systems, particularly in aeronautics (Byington, et al. 2004, Orsagh, et al. 2005, Massé, Lamoureux and Boulet 2011). With the recent development of smart materials, PHM also relates to the reliability of structures, or Structural Health Monitoring (SHM) (Mechbal, et al. 2006).
Although significant research efforts have been made in the field of PHM, there remains a huge gap between academic research and industrial expectations. On the one hand, the industrial approach to health monitoring rarely integrates physics-based models to simulate degradations and validate fault detection and diagnostics; on the other hand, the academic approach rarely integrates all the constraints of on-board exploitation, such as sampling frequency and storage limitations, imposed sensor number and location, or limited computation capabilities. The main purpose of this paper is to merge numerical model-based and statistical data-driven approaches to perform fault detection and prognostics on an actual system in its in-service operating environment.
It is evident that a lot of information could be extracted from the huge quantity of data recorded during a flight. One of the aircraft engine subsystems whose failure may result in significant maintenance costs to an airline is the fuel system. Nevertheless, despite its critical function, the aircraft engine fuel system and its components are almost never cited as potential candidates for health monitoring. In response to this lack, we have conducted a complete study on this subject. The aim of the present paper is to apply fault
_____________________
Benjamin Lamoureux et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
detection and prognostics to one of the main components of the fuel system: the pumping unit. The other novelty of this work is the use of a numerical model to quantify the CHI's sensitivity to degradations in order to create degraded data from operating measurements.
The remainder of the paper is organized in five sections following the five main parts of the proposed development method: health monitoring perimeter definition; data analysis; system and degradation modeling; simulation results; and statistical validation. An additional part deals with prognostics. We conclude and present future work in a final section.
2. HEALTH MONITORING PERIMETER DEFINITION
2.1. Aircraft Engine Fuel System Analysis
To perform health monitoring of the pumping unit, it is
essential to study the whole aircraft engine fuel system
because each component contributes to the pressurization of
the hydraulic circuit.
The system is composed of the following components, as presented in Figure 1:
- The bypass valve regulates the flow entering the fuel metering valve.
- The fuel metering valve doses the flow to the injectors.
- The pressurizing valve maintains a constant pressure drop between and .
- The switch valve switches between two configurations of an external system.
In the figure above, PA/C is the supply pressure provided by the aircraft fuel tank, Plp is the low pressure at the outlet port of the centrifugal pump, and Php is the high pressure at the outlet of the gear pump. Pinjection is the injection pressure in the combustion chamber.
2.2. Degradation Modes of the Gear Pump
Thanks to expertise, experience feedback and Failure Mode
and Effects Analysis (FMEA), two main degradation modes
were selected.
Definition 1:
For a gear pump, an internal leakage is a leakage between the inlet and the outlet of the pump. It is mainly due to contamination of the hydraulic fluid, which results in abrasion of the gear surfaces.
Definition 2:
For a gear pump, an external leakage is a leakage to the exterior of the pump. It is mainly due to vibrations and the aging of mechanical parts or joints.
2.3. Component Health Indicator
To monitor the state of the gear pump, a feature extracted from measurements, named the Component Health Indicator (CHI), is defined.
Definition 3:
The CHI is a physical measure (or a function of it) that makes it possible to assess the health of a component by monitoring its changes.
In the case of gear pump monitoring, the chosen CHI is the rotation speed of the pump at the opening of the switch valve. It corresponds to the rotation speed for which the hydraulic power is high enough to open the valve; thus, an increase of this CHI could indicate that the pump is less efficient. The valve opening is confirmed at fifty percent of the whole stroke. An example
of extraction is given in Figure 2.

Figure 1: Architecture of an aircraft engine fuel system (centrifugal pump, gear pump, bypass valve, fuel metering valve, pressurizing valve, switch valve; pressures PA/C, Plp, Php, Pinjection)
Figure 2: Extraction of the CHI
3. DATA ANALYSIS
In the case where measurements are available and assuming that, at the time of their recording, the system was faultless, a healthy distribution of the CHI can be computed. In this example, the healthy reference distribution comes from a statistical analysis of about 400 start sequences of test flights, as shown in Figure 3.
Figure 3: Results of CHI Extraction
Then, the histogram of the distribution is given in Figure 4 with its associated maximum-likelihood density. Without loss of efficiency, we assume that the distribution follows a normal law N(μ, σ), where μ is the mean of the healthy distribution and σ is its standard deviation.
Figure 4: Distribution of the Healthy State
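The healthy-reference fit described above reduces to estimating the mean and standard deviation of the extracted CHI sample under the normality assumption. A minimal sketch (function name ours, not from the paper):

```python
import statistics

def healthy_reference(chi_values):
    """Fit a normal law N(mu, sigma) to healthy CHI extractions.

    chi_values: CHI values extracted from start sequences assumed
    fault-free (about 400 test flights in the paper).
    """
    mu = statistics.mean(chi_values)
    sigma = statistics.stdev(chi_values)  # sample standard deviation
    return mu, sigma
```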
4. SYSTEM AND DEGRADATION MODELING
4.1. Gear Pump Behavior Modeling
Some related works have addressed the issue of modeling gear pumps and their degradations. For example, Casoli et al. (Casoli, Andrea and Franzoni 2005) have proposed a method to model a gear pump with AMESim®, and Frith and Scott have developed a wear model (Frith and Scott 1994).
In the developed AMESim® model, the pump outlet flow is expressed as in (1):

Q = ηv · D · N (1)

where Q is the pump outlet flow, D the pump displacement, ηv the volumetric efficiency and N the pump rotation speed. The volumetric efficiency is computed as an empirical function, i.e.,

ηv = f(ΔP, T; a1, …, an) (2)

where ΔP is the pressure drop between the pump inlet and outlet, T is the fluid temperature at the pump inlet, and a1, …, an are empirical constants.
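Eq. (1) can be sketched in code as follows. This is a minimal illustration with our own function names; the volumetric-efficiency function is only a stand-in for the empirical expression of Eq. (2), whose exact form and constants are not reproduced in this extraction.

```python
def pump_outlet_flow(displacement, rotation_speed, volumetric_efficiency):
    """Eq. (1): Q = eta_v * D * N."""
    return volumetric_efficiency * displacement * rotation_speed

def volumetric_efficiency(delta_p, coeffs):
    """Illustrative stand-in for Eq. (2): an empirical function of the
    inlet/outlet pressure drop delta_p with fitted constants `coeffs`.
    A simple affine decrease with delta_p is ASSUMED here for the sketch;
    it is not the paper's actual expression."""
    a0, a1 = coeffs
    return max(0.0, a0 - a1 * delta_p)
```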
4.2. Fuel System Modeling
The whole aircraft engine fuel system is modeled from classic AMESim® blocks. The model variables are given in Table 1.

Input      | Curve of rotation speed versus time
Parameters | Fuel temperature; aircraft supply pressure
Output     | The CHI

Table 1. Model Variables
4.3. Degradation Modeling
To simulate the influence of all the potential faults, we introduce them into the AMESim® model. Internal leakage is modeled by a diaphragm with a variable section between the pump inlet and the pump outlet (Figure 5a), and external leakage is modeled by a diaphragm between the pump outlet and an external tank at atmospheric pressure (Figure 5b).
Figure 5 : AMESim® modeling of a) internal leakage and
b) external leakage
5. SIMULATION RESULTS
The purpose of this part is to determine the sensitivity of the CHI to degradations. The behavior of the system is simulated for nominal fuel temperature and supply pressure. The rotation speed input is approximated by a linear curve to simulate the behavior of the pump during the start sequence.
5.1.1. Maximal Degradation Intensity
The degradation intensity is defined as the leakage flow crossing the diaphragm (Figure 5) at 10% of the maximal rotation speed. The Maximal Degradation Intensity is the intensity for which the system is in a non-functional state. In this case, it is reached when the pump is not able to deliver the flow needed for the start sequence, and it is calculated from the specification of the minimal pump outlet flow allowed at 10% of the maximal rotation speed. The maximal intensity is different for each degradation, so it is computed for both the internal and the external leakage.
5.1.2. Model Updating
The model updating is performed by comparing the measured and simulated CHI in the nominal, flawless state. To perform the updating, some parameters of the model, such as the displacement of the pump or the calibration of the switch valve sensor, are adjusted.
5.1.3. CHI Sensitivity Results
Degradations of increasing intensities, up to the maximal degradation intensity, are simulated to quantify the sensitivity of the CHI. Results are given in Table 2.
Type of Degradation | Intensity of Degradation | Value of the CHI
Internal Leakage    | 0                        | 750
Internal Leakage    | Low                      | 826
Internal Leakage    | Medium                   | 893
Internal Leakage    | High                     | 1027
External Leakage    | 0                        | 750
External Leakage    | Low                      | 770
External Leakage    | Medium                   | 834
External Leakage    | High                     | 981

Table 2. Simulation Results
6. STATISTICAL VALIDATION
Statistical validation is based on the comparison between
the measured distribution of the healthy state and the
estimated distribution of the faulty states given by specific
transformation laws applied to the CHI.
6.1. CHI Transformation Laws
Definition 4:
CHI transformation laws (TL) are functions calculating the variation of a CHI for a given degradation at a given intensity. The typical form of a TL is given in Eq. 3:

ΔCHI_d = TL_d(x) (3)

For example, considering the CHI and a degradation d, the transformation law TL_d gives the variation of the CHI, named ΔCHI_d, as a function of the degradation intensity x. For each degradation, a TL is defined and can be applied to a real distribution of the CHI as follows:

CHI_est = CHI_healthy + TL_d(x) (4)

where CHI_est is the estimated value of the CHI in the presence of d and CHI_healthy is its healthy value. The two TLs are computed by applying a linear regression between degradation intensities and CHI values. The results are given in Eq. 5 and Eq. 6:

TL_int(x) = S_int · x (5)
TL_ext(x) = S_ext · x (6)

with S_int and S_ext the coefficients of the linear regressions.
Figure 6 gives an example of how the coefficient S is calculated.
Figure 6: Example of coefficient S computation
6.2. Application of the CHI Transformation Laws
Once S_int and S_ext are computed, the TLs can be applied to the real healthy distribution to construct a degraded CHI defined by:

CHI_deg = CHI_meas + TL_d(x) (7)

with CHI_deg the constructed degraded value of the CHI for a degradation d of intensity x, and CHI_meas the measured value of the CHI. In the case of the internal leakage flaw, results are given in Figure 7.
Figure 7: Distribution of CHIs
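Assuming the linear transformation laws of Eqs. (5)-(7), the construction of degraded CHI values from measured healthy ones can be sketched as follows. The slope value used in the example is purely illustrative, not a result from the paper.

```python
def transformation_law(slope):
    """Linear TL (Eqs. 5-6): returns the CHI variation for intensity x."""
    return lambda x: slope * x

def degraded_chi(chi_measured, tl, intensity):
    """Eq. (7): constructed degraded CHI = measured CHI + TL(intensity)."""
    return chi_measured + tl(intensity)

# Example with a HYPOTHETICAL internal-leakage slope (rpm per unit intensity):
tl_int = transformation_law(slope=30.0)
degraded = [degraded_chi(v, tl_int, intensity=2.0) for v in (750, 760)]
```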
6.3. Key Performance Indicators
In aeronautics, because of the modular architecture, the main part of maintenance operations is troubleshooting, which consists in finding the faulty Line Replaceable Unit (LRU) and replacing it. This means that in the diagnostic process, isolation, rather than identification, is of paramount importance. As the two degradations considered in this study affect the same unit, only fault detection is addressed. A complete treatment of signal detection theory can be found in (Wickens 2002).
Definition 5:
A Key Performance Indicator (KPI) is an indicator of the efficiency of the monitoring system. Its required value is given by specifications.
For example, in aeronautics, specifications usually require less than 5% false negatives and less than 20% false positives (Table 3).
KPI                      | Definition
False Positive (FP) Rate | Proportion of false positives (false alarms) among the actually healthy states (see Figure 8)
False Negative (FN) Rate | Proportion of false negatives (undetected faults) among the actually faulty states (see Figure 8)

Table 3. Key Performance Indicators
6.4. Detection Threshold Selection
Thanks to the transformation laws, degraded data is computed from real healthy data. Once the degraded data is estimated, a detection threshold λ can be defined. Typically, the chosen value for λ is calculated from Eq. 8, where A is a positive real:

λ = μ + A·σ (8)

The graphical meaning of A is given in Figure 8, where false negatives and false positives are represented for thresholds calculated with different values of A.
Figure 8: Definition of the coefficient A
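A minimal sketch of the threshold rule of Eq. (8) and of an empirical FP/FN-rate estimate on labeled samples. Function names are ours, not the authors' code.

```python
def detection_threshold(mu, sigma, A):
    """Eq. (8): threshold = mean + A * standard deviation of healthy CHI."""
    return mu + A * sigma

def fp_fn_rates(healthy, faulty, threshold):
    """Empirical KPI estimates for a 'fault if CHI > threshold' rule.

    FP rate: healthy values wrongly flagged as faulty.
    FN rate: faulty values not detected.
    """
    fp = sum(v > threshold for v in healthy) / len(healthy)
    fn = sum(v <= threshold for v in faulty) / len(faulty)
    return fp, fn
```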
6.5. Statistical Validation
As explained previously, only fault detection is performed in this study. For example, the KPIs for the medium and high degradation levels of the internal leakage are presented in Table 4, and the associated ROC curves in Figure 9.
A   | FP (Medium) | FN (Medium) | FP (High) | FN (High)
0   | 50%         | 2.4%        | 50%       | 0%
1   | 16.4%       | 16.4%       | 16.4%     | 0.2%
1.5 | 7.1%        | 31.5%       | 7.1%      | 0.7%
2   | 2.4%        | 50.6%       | 2.4%      | 2.6%
2.5 | 0.7%        | 69.6%       | 0.7%      | 7.4%
3   | 0.1%        | 84.3%       | 0.1%      | 17.2%

Table 4. Performance of fault detection (internal leakage, medium and high intensities)
Figure 9: ROC curves for Low, Medium and High Intensities
In aeronautics, degradations are considered detectable if the false negative rate is under 5% and the false positive rate under 20%. In conclusion, internal leakages of medium intensity are not detectable, whereas those of high intensity are detectable with A equal to 1, 1.5 or 2. The results for the external leakage are not presented here, but they are very similar.
7. PROGNOSTICS
The purpose of prognostics is to prevent the CHI from reaching its maximal degradation value.
7.1. Degradation Severity Indicator
A Degradation Severity Indicator (DSI) is an index defined in order to quantify the potential impacts of the degradation on system operability. The higher the DSI, the more degraded the system. In this paper, the DSI for a start sequence N is defined as follows:

DSI(N) = d(N) / d_max (9)

In Eq. 9, d(N) is the degradation intensity at start sequence N and d_max refers to the maximal degradation intensity reachable before complete failure of the system.
7.2. Prognostics
The prognostics method defined in this paper is based on trend analysis. The purpose is to estimate the Remaining Useful Life (RUL) of the pump by anticipating the moment when the DSI will be greater than 1. The RUL is expressed in number of flights. Aeronautics specifications usually require that degradations be detected at least 20 flights before the occurrence of the failure.
The method consists in calculating, at each new flight, a linear regression on the past flights and predicting the RUL assuming that the slope remains the same. The value of the RUL is the estimated number of flights before the crossing of the maximum DSI threshold (Figure 10). When the RUL is estimated to be less than 20 flights, an alarm is sent to the maintenance operators.
For the time being, there are no defined KPIs for prognostics in aeronautics.
Figure 10: Remaining Useful Life Computation
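The trend-based RUL estimate described above can be sketched as follows, assuming a least-squares line fitted to a sliding window of past DSI values and extrapolated to the DSI = 1 crossing. The window size and names are illustrative, not the paper's settings.

```python
def estimate_rul(dsi_history, window=50):
    """Estimated number of flights before the DSI reaches 1.

    Fits a least-squares line to the last `window` DSI values and
    extrapolates with a constant slope. Returns None when no upward
    degradation trend is observed.
    """
    recent = dsi_history[-window:]
    n = len(recent)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(recent) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, recent)) / denom
    if slope <= 0:
        return None  # no degradation trend, no finite RUL
    current = recent[-1]
    return max(0.0, (1.0 - current) / slope)
```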
To validate the method, gradually degrading measurements over 1500 flights are constructed thanks to the transformation laws. Depending on the flight index n, the value of the degraded CHI is given by Eq. 10:

CHI_deg(n) = CHI_healthy(rand) + TL(f(n)) (10)

where CHI_healthy(rand) designates a random selection of a value in the healthy distribution and f is the function giving the intensity of the degradation versus the flight index. The function f is normally derived from the physics of failure, but in this application we suppose that it is a linear degradation growing from 0 to the maximal degradation intensity at flight 1000 (Figure 10).
In Figure 11, the results of the alarm computation for the RUL are given; it can be noticed that alarms occur around a hundred flights before the maximal degradation intensity is reached. This shows that the method is effective at anticipating failures by monitoring the DSI. However, work remains to be done to limit false alarms.
Figure 11: Remaining Useful Life Alarms
CONCLUSION
In conclusion, a method for the health monitoring of a pumping unit, from the definition of physics-based indicators to the statistical validation of Key Performance Indicators, has been proposed. The main novelty of this paper is that, after defining a physics-based Component Health Indicator, its sensitivity is tested on a physics-based model constructed in the AMESim® environment.
The statistical validation showed that the Component Health Indicator defined was relevant to detect both internal and external leakages of a gear pump. A prognostics method based on the computation of the Remaining Useful Life was also addressed, and proved effective in anticipating the crossing of the maximum degradation severity threshold.
For future prospects, the objective is to work on the improvement of the fault detection and prognostics algorithms and to extend the PHM system to the whole aircraft engine fuel system. Besides, KPIs must be defined for prognostics.
NOMENCLATURE
CHI for the gear pump
Outlet flow of the gear pump
Volumetric efficiency of the gear pump
Dummy variables
Temperature at the pump inlet
Rotation Speed of the Pump
Fuel Temperature
Aircraft Supply Pressure
Mean of Healthy Distribution
Standard Deviation of Healthy Distribution
Maximal Intensity of Internal Leakage
Maximal Intensity of External Leakage
REFERENCES
Balaban, E., A. Saxena, P. Bansal, K.F. Goebel, P.
Stoelting, and S. Curran. "A diagnostic approach for electro-
mechanical actuators in aerospace systems." IEEE
Aerospace Conference Proceedings. Big Sky, 2009.
Basseville, Michelle. "On-board component fault detection
and isolation using the statistical local approach."
Automatica vol. 34, 1998: 1391-1415.
Byington, C.S., M. Watson, D. Edwards, and P. Stoelting.
"A model-based approach to prognostics and health
management for flight control actuators." IEEE Aerospace
Conference Proceedings. 2004. 3551-3562.
Casoli, Paolo, Vacca Andrea, and Germano Franzoni. "A
Numerical Model for the Simulation of External Gear
Pumps." Proceedings of the 6th JFPS International
Symposium on Fluid Power. Tsukuba, 2005.
Frith, R.H., and W. Scott. "Wear in external gear pumps: a simplified model." Wear 172, 1994: 121-126.
Isermann, Rolf. "Supervision, fault-detection and fault-
diagnosis methods - An introduction." Control Engineering
Practice vol.5, 1997: 639-652.
Lamoureux, Benjamin, Jean-Rémi Massé, and Nazih
Mechbal. "A Diagnosis Methodology for the
Hydromechanical Actuation Loops in Aircraft Engines."
Proceedings of the 20th Mediterranean Conference on
Control and Automation (forthcoming). Barcelona, 2012.
Lamoureux, Benjamin, Jean-Rémi Massé, and Nazih
Mechbal. "An approach to the Health Monitoring of the
Fuel System of a Turbofan." Proceedings of IEEE PHM
2012 (forthcoming). Denver, 2012.
Massé, J.R., B. Lamoureux, and X. Boulet. "Prognosis and
Health Management in system design." Proceedings of
IEEE PHM 2011. Denver, 2011.
Mechbal, N., M. Vergé, G. Coffignal, and M. Ganapathi.
"Application of a combined active control and fault
detection scheme to an active composite flexible structure."
Mechatronics vol. 16, 2006: 193-208.
Orsagh, R., D. Brown, M. Roemer, T. Dabney, and A. Hess.
"Prognostic health management for avionics system power
supplies." IEEE Aerospace Conference Proceedings. 2005.
Sheppard, J.W., M.A. Kaufman, and T.J. Wilmer. "IEEE
Standards for Prognostics and Health Management."
Aerospace and Electronic Systems Magazine 24, no. 9
(2009): 34–41.
Wickens, Thomas D. Elementary Signal Detection Theory.
Oxford University Press, 2002.
Application of Microwave Sensing to Blade Health Monitoring
David Kwapisz1, Michaël Hafner2, Ravi Rajamani3
1,2Meggitt Sensing Systems, Fribourg, Switzerland [email protected] [email protected]
3Meggitt-USA, Inc.
ABSTRACT
This paper discusses the application of microwave sensing to turbine airfoil health monitoring. The proposed microwave system operates at 6 and 24 GHz and is applicable to both blade tip-clearance and blade tip-timing measurements. One of the main advantages of microwave systems, compared to other technologies such as capacitive or eddy-current sensing, is that they can be installed for long-term operation in the harsh environment of the first turbine stages. The monitoring of blade tip-timing and tip-clearance patterns is useful for detecting abnormal blade behavior due to structural damage. Such a sensing system can also be used to actively maintain optimal blade-to-casing clearance, thereby enhancing turbine efficiency. This paper presents blade tip-clearance pattern monitoring based on microwave measurements. First, a laboratory study shows the ability of the system to consistently measure the tip-clearance pattern. Then tip-clearance pattern measurements from a real engine test are presented. While this paper presents results from system testing on tip clearance, it is expected that this study will be carried forward in the next phase to demonstrate tip-timing measurement and, further, to show how such a system can form the basis for a more comprehensive health management system.
1. INTRODUCTION
Both aero gas turbines and stationary gas turbines are increasingly deploying blade health monitoring (BHM) systems that assess the health of airfoils by sensing the tip clearance and the tip timing of individual blades. BHM systems can estimate different parameters depending on the sophistication of the algorithms. Many academic papers have been written on this topic, but the real value of such technologies in solving the BHM problem is seen in the number of real-world systems that have started employing the technology (Flotow, Mercadal & Tappert, 2000; Zielinski & Ziller, 2005; Hess, Frith & Suarez, 2006; Hess, 2007; Martin, Forry, Maier & Hansen, 2011).
David Kwapisz et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Blades can fail because of structural flaws caused by manufacturing defects or by impacts from external objects. Tip-clearance and tip-timing measurements can be used to detect these flaws. One relatively straightforward approach is to establish a “baseline” pattern during the initial operation of the turbine and then assess the “deviation” from this baseline. Because these patterns change with turbine load, the algorithms need built-in compensation factors, but these can be readily developed. For example, the effect of temperature can be accounted for by a simple additive term, as is shown in this paper. More sophisticated techniques model the actual vibration modes of the airfoil using physics-based methods, detect deviations from expected behavior, and base the diagnostics on this. The former method is easier to implement but is not as powerful as the model-based method.
In either case, the key is to get a reliable and repeatable measurement system that can be depended on to deliver consistent measurement under noisy and harsh conditions.
Additionally, monitoring blade passage can be used to detect incipient damage to the rotor as well as aid in sophisticated clearance control. An SAE Aerospace Information Report, currently in preparation (2012), will provide a good overview of various uses of BHM. A specific sensor can only measure the instantaneous clearance between the airfoils and a specific location on the casing. With multiple sensors located around the circumference, a better estimate of the clearance can be obtained, which can be used for real-time clearance control. Structural failure, especially in the low pressure compressor, can occur due to foreign object damage (FOD). The BHM system can be used to detect FOD as well, possibly in concert with other diagnostic sensors such as accelerometers mounted close to the front of the turbine. Mounting two sensors in roughly the same radial location can help in detecting axial deflections that will allow blade twist to be estimated, again improving the ability to measure different failure mechanisms. Of course, these techniques
come at the price of added system complexity and cost, so they have to be weighed carefully against the benefit.
This paper describes a system based on microwave technology that delivers a highly accurate and consistent measurement. This is demonstrated for blade clearance via experimental results. In particular, the ability of the sensor to detect blade length variations of a few tenths of a millimeter is described. Detecting such variations can reveal blade fatigue cracks and thus improve maintenance scheduling. Combined with its harsh-environment survivability, this detection capability offers a real opportunity to design reliable BHM systems (Woike, Abdul-Aziz & Bencic, 2010).
Section 2 gives a general description of the microwave sensor, including its operating principle and its application to blade anomaly detection. BHM performance depends critically on the accuracy and consistency of the measurement system; this is described in Section 3. Finally, Section 4 describes experimental results from a test on an industrial gas turbine, which show that the microwave sensor is capable of accurate and consistent measurement of the tip-clearance pattern.
2. PRESENTATION OF THE MICROWAVE SENSOR
2.1. Microwave measurement principle
The microwave blade monitoring system presented here is based on a phase measurement principle. A continuous-wave microwave signal is generated in a microwave signal conditioning unit and transmitted through a coaxial cable to the probe (Figure 1). The probe is an antenna that transmits the continuous wave into the space between the casing and the bladed rotor. The probe also acts as a receiver and captures the portion of the emitted wave that is reflected back by the blade tip, which is then measured by the electronics.
Figure 1. System architecture.
The microwave electronics generates a continuous wave which is transmitted to the probe through a circulator. The wave reflected by the blade tips returns through the circulator to two RF mixers arranged in a vector architecture. The in-phase and quadrature components of the reflected wave are extracted and digitized before processing and phase calculation.
A vector mixer architecture is used to compare the received signal to an internal reference and to reconstruct a phase measurement. Basically, the phase between the transmitted and the reflected signals is proportional to the distance between the probe and the blade tips. The conversion between the measured phase φ and the associated clearance δ is given by Eq. (1) and depends on the wavelength λ of the microwave signal. Compared to other technologies, this relationship is linear, which makes the sensor much easier to calibrate via sensitivity and offset corrections.
δ = (λ / 4π) · φ (1)
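Eq. (1) maps the measured phase to a clearance. A minimal sketch of this conversion, starting from the in-phase/quadrature components produced by the vector mixer architecture (the function name and numeric values are illustrative, not taken from the actual sensor firmware):

```python
import math

def iq_to_clearance(i, q, wavelength):
    """Eq. (1): delta = (lambda / (4 * pi)) * phi, where phi is the
    phase of the reflected wave recovered from its I/Q components."""
    phi = math.atan2(q, i)  # phase in radians, from the vector mixer outputs
    return wavelength / (4 * math.pi) * phi

# Illustrative values: a 24 GHz carrier has a wavelength of about 12.5 mm.
delta = iq_to_clearance(0.0, 1.0, wavelength=12.5)  # phi = pi/2 -> 1.5625 mm
```

The factor 4π (rather than 2π) reflects the round trip of the wave from the probe to the blade tip and back.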
2.2. Probe and engine installation
The microwave probe is basically an antenna optimized to transmit at a defined frequency with a given bandwidth. This antenna is packaged in a hermetically sealed body with an integral mineral-insulated cable at its back. The probe is constructed from materials chosen for their high-temperature survivability coupled with reliable long-term operation in the harsh environment of gas turbines.
Two versions of the microwave system have been developed. The first uses a frequency in the 6 GHz band and has a measurement range of 25 mm and a probe diameter of 14 mm (Figure 2). It is suited for large-frame gas turbines. The second uses a frequency in the 24 GHz band for a measurement range of 6 mm and a probe diameter of 8.5 mm. It is preferably used with the small blades of aviation or aero-derivative gas turbines.
Figure 2. Picture of the 6 GHz microwave probe.
The probe installation requires an opening through the casing such that the probe tip has a direct view of the rotor and its blade tips. A ceramic window on the probe tip allows the microwave signal to transmit to the blade. A retaining ring ensures that this ceramic window does not fall into the gas path. Depending on the engine construction, the integration is more or less complex. Normally, the
installation in the turbine section has more constraints due to the high temperatures and the need to ensure proper sealing between several casing layers, which can move relative to one another. Figure 3 shows an example of probe mounting in the turbine section of an aero engine with two casing layers. The probe tip is usually installed flush with or recessed from the casing inner surface to ensure no contact with the blades, even during a rub event.
Figure 3. Probe mounting on the turbine.
The important parameter for the probe installation is the position of the probe relative to the blade tips, as the probe measures what is directly underneath it. It is not always possible to install the probe at a desired location due to piping or mechanical constraints. Typically, the rotor moves axially due to thermal expansion and aerodynamic forces, and therefore the blade tips move in the axial direction with respect to the fixed casing. This is normally known and taken into account during probe mounting.
2.3. Microwave tip clearance measurement output
The microwave blade tip-clearance system presented in this paper does not provide a continuous blade-profile waveform output proportional to the measured distance, as laser, eddy-current, or capacitance probes do. In the case of continuous blade-profile measurement, the amount of data quickly becomes large and has to be reduced to be exploitable for blade health monitoring. Therefore, data reduction is performed by an algorithm that detects each individual blade within the microwave measurement and extracts only one tip-clearance value per blade. These calculated tip-clearance values correspond to the minimum distance between the individual blade tips and the probe. The sensor thus provides a digital array with one tip-clearance measurement δi per blade. The array of tip-clearance measurements over a full rotor revolution is called the blade clearance pattern.
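The data-reduction step described above — one minimum-distance value per blade passage — can be sketched as follows (a simplified illustration with hypothetical sample windows, not the sensor's actual blade-detection algorithm):

```python
def clearance_pattern(samples, blade_windows):
    """Reduce a continuous distance signal to one tip-clearance value
    per blade: the minimum distance within each blade's passage window."""
    return [min(samples[start:stop]) for start, stop in blade_windows]

# Hypothetical signal covering three blade passages (distances in mm).
samples = [2.0, 1.2, 1.0, 1.3, 2.0, 1.1, 0.9, 1.2, 2.0, 1.4, 1.1, 1.5]
pattern = clearance_pattern(samples, [(0, 4), (4, 8), (8, 12)])  # [1.0, 0.9, 1.1]
```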
2.4. The centered blade clearance pattern and its application to health monitoring
Monitoring the individual blade tip clearances δi provides useful information on abnormal blade elongation due to cracks and can thus be used for health monitoring purposes. Nevertheless, abnormal blade elongation must be differentiated from the normal elongations due to temperature and centrifugal forces, as in Eq. (2).
δi = δic + δtemperature + δcentrifugal (2)
The main assumption that can be made about abnormal elongation is that it affects only one particular blade, while global elongations affect all the blades of the rotor. Given a relatively high number of blades, the mean clearance should not be affected by the abnormal elongation of one particular blade. In this case, the centered blade clearance pattern δ1c, …, δic, …, δNc defined by Eq. (3) can be used as a baseline. Any deviation from this baseline can be used as a metric for blade crack detection (Figure 4).
δic = δi − (1/N) Σj=1…N δj (3)
Figure 4. Principle of crack detection based on clearance pattern monitoring. If the centered clearance of one
particular blade passes above a given threshold, an alarm can be generated.
Such a detection strategy requires that an abnormal blade elongation can be detected without any ambiguity due to measurement errors. Therefore, it is necessary to validate that the clearance pattern can be consistently measured by the sensor system. This point is discussed in the next section.
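Eq. (3) and the threshold test of Figure 4 amount to only a few lines of code. A sketch, under the assumption that a crack shows up as a deviation of one blade's centered clearance beyond a fixed threshold (function names and numeric values are illustrative):

```python
def centered_pattern(deltas):
    """Eq. (3): subtract the mean clearance so that global elongations
    (temperature, centrifugal) cancel out across the rotor."""
    mean = sum(deltas) / len(deltas)
    return [d - mean for d in deltas]

def crack_suspects(deltas, threshold):
    """Indices of blades whose centered clearance deviates beyond the threshold."""
    return [i for i, c in enumerate(centered_pattern(deltas)) if abs(c) > threshold]

# Hypothetical pattern (mm): blade 2 is ~0.25 mm longer, i.e. a smaller clearance.
suspects = crack_suspects([1.0, 1.0, 0.75, 1.0, 1.0], threshold=0.1)  # [2]
```

By construction the centered pattern sums to zero, so a uniform elongation of all blades leaves it unchanged — which is exactly why it can serve as a baseline.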
3. LABORATORY EVALUATION OF CLEARANCE PATTERN MEASUREMENT
3.1. Problem Description
Blade elongations due to mechanical cracks are about a few tenths of a millimeter (Dyke, 2011). To be able to detect these abnormal elongations, the measurement uncertainty on tip clearance has to be lower than the elongation itself, and the elongation has to be consistently differentiable over the different engine conditions. The purpose of this laboratory evaluation is to validate that the microwave system can detect blade elongations of a few tenths of a millimeter. For that, two types of test campaign
are carried out. The first is done on a precision test setup which can accurately position the blades relative to the probe. For this test, five blade mockups are mounted and the clearance pattern is characterized at different nominal clearances. The goal is to validate the consistency of the tip-clearance pattern measurements over a given range of nominal clearances. The second type of validation is carried out on a spinning setup with forty blade mockups mounted on a rotor. The goal is to characterize the precision and accuracy of clearance pattern measurement with a test bench representative of an engine. Both test campaigns use the 24 GHz version of the microwave system and a laser sensor for reference measurements.
3.2. Reference measurements with a laser sensor
In order to correctly assess the clearance pattern measurement made with the microwave system, a laser sensor is used for reference measurements. The five blade mockups are mounted on the precision test setup – as described in Kwapisz, Hafner, and Queloz (2010) – and then scanned by a laser sensor (Figure 5).
The blade clearance profile measured by the laser sensor is given in Figure 6.
Figure 6. Actual tip-clearance profile and associated pattern measured with the laser sensor. The longest blade is about 250 µm ahead of the others.
For comparison with the microwave measurements, the clearance pattern is extracted directly from the profile by taking the median value of each blade tip profile. The individual blade clearances are within 60 µm of each other, except for the longest blade, which is 250 µm ahead.
3.3. Measurement with the microwave sensor
The microwave sensor is installed on the precision test setup with the probe oriented toward the blade tips (Figure 7). The nominal clearance between the probe tip and the blade tips is set to 1 mm and the blade tip-clearance pattern is measured by the microwave system.
Figure 7. The 24GHz probe installed in front of the blades.
In order to compare the measurement made by the microwave system with the reference measurement made with the laser system, both measurements are made without any dismounting. Therefore, a direct comparison between the two systems is possible.
This first measurement shows that the microwave system correctly measures the clearance pattern, with small errors of about 50 µm maximum (Figure 8). The longest blade is clearly differentiable. This result has been obtained for a nominal clearance of 1 mm and has to be confirmed over the entire clearance range of variation, which is the purpose of the next section.
Figure 8. Direct comparison between laser measurement
and microwave measurement.
3.4. Consistency of pattern measurement over the clearance range of variation
Blade tip-clearance measurement is difficult for all the competing technologies: capacitive, inductive, or
Figure 5. The exact blade profile is measured by a laser sensor with an accuracy of 6µm.
microwave sensors. The main reason is that the sensor behavior greatly depends on the nominal sensing distance because of the non-linearity of the physical laws involved. In the case of microwave sensing, clearance measurement is based on phase measurement, with a linear relationship between phase and clearance (Eq. 1). Therefore, the calibration of such a sensor is relatively easy and consists only of sensitivity and offset corrections. Nevertheless, the beam width is relatively large and spatial filtering effects can generate measurement errors (Holst, 2005). This is why the correctness of clearance pattern measurement has to be validated over the full clearance range. This validation is the purpose of this section.
For the blade geometry considered here, which corresponds to an aero-derivative turbine, the clearance is unlikely to exceed 3 mm. For safety reasons, the minimum clearance that can be set on the test bench is 1 mm. Therefore, a set of clearance pattern measurements is performed with a nominal clearance that varies from 1 mm to 3 mm in steps of 0.05 mm. Figure 9 shows the clearance response of the five individual blades. The longest blade consistently gives a shorter clearance, with a consistent offset of about 250 µm over the full nominal range. This result shows that the measurement uncertainties are small enough to enable the differentiation of blade elongations of a few tenths of a millimeter.
Figure 9. The clearances measured by the microwave system with respect to the nominal one.
Typically, a blade health monitoring strategy based on blade elongation measurement requires that the measurement uncertainties be lower than the elongation to detect. To characterize the measurement uncertainties, the blade clearance pattern is computed for each set of measurements between 1 mm and 3 mm, and the results are compared to each other. Thereby, the uncertainty range of the pattern measurement can be estimated, as described by Figure 10. It shows that the measurement variability (between the first and last deciles) of the pattern measurement is about ±40 µm. In Figure 10, the first and last deciles are represented by the boxes, the median by the line inside each box, and the minimal and maximal values by the whiskers. These results have been obtained without any averaging and thus take into account both precision and accuracy aspects. They are also consistent with the reference measurements and enable the differentiation of the longest blade over the entire clearance range.
Figure 10. Variability range of blade clearance pattern
measured between 1 and 3mm.
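The decile-based variability estimate of Figure 10 can be reproduced with standard statistics. A sketch, assuming the repeated pattern measurements are stored as one list per measurement (the data below are hypothetical):

```python
import statistics

def blade_variability(patterns):
    """For each blade position, compute (first decile, median, last decile)
    across repeated pattern measurements, as in Figure 10."""
    result = []
    for blade_values in zip(*patterns):
        deciles = statistics.quantiles(blade_values, n=10)
        result.append((deciles[0], statistics.median(blade_values), deciles[-1]))
    return result

# Hypothetical: 11 repeated measurements of a 2-blade pattern (mm).
patterns = [[0.1 * k, 0.5] for k in range(11)]
stats = blade_variability(patterns)  # blade 1 is perfectly repeatable
```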
3.5. Measurement on a spinning setup
The validation of clearance pattern measurement is difficult to perform on a real engine in operation because of the lack of reference sensors that can survive the associated harsh conditions. In that case, the obtained measurements are difficult to interpret in terms of accuracy and precision because the actual clearance pattern variations due to vibration and thermal expansion are unknown. That is why a set of laboratory measurements has been performed on a spinning test bench with a reference laser sensor (Figure 11). This enables the direct comparison of the microwave measurement with a reference sensor and helps characterize the measurement performance.
The spinning test bench is based on a 500 mm diameter rotor. Forty blades are mounted on the rotor, and the tip clearance of each individual blade can be tuned using a dedicated sliding mechanical fixture. In order to evaluate the performance of the microwave sensor, the blades are set so as to obtain a rich clearance pattern. This pattern has been measured by a laser sensor for later comparison with the microwave system (Figure 12). This measurement was very stable, with a standard deviation lower than 10 µm, which indicates the absence of undesirable vibrations.
Figure 11. Joint measurement of the clearance pattern with the microwave and laser sensors on a spinning test bench.
The blade tip-clearance pattern has been measured by the microwave system over five hundred revolutions. Figure 12 shows the obtained results in terms of median, extrema, and first and last deciles. The clearance pattern measured by the microwave system accurately fits the reference laser measurements. The precision of the microwave system obtained during this test corresponds to an uncertainty range of ±50 µm (between the first and last deciles). It is consistent with the precision of ±40 µm obtained with the precision test setup (Section 3.4). This precision can be greatly improved by filtering the output of the sensor, taking into account the tradeoff between precision and measurement bandwidth.
Figure 12: Measurement of the blade clearance pattern by
the microwave system.
The correlation graph of Figure 13 is obtained by plotting the blade pattern measured by the laser sensor versus the averaged pattern found by the microwave system. The obtained correlation coefficient is higher than 0.99, which validates the linearity of the microwave measurement. The residual deviation is about 17 µm. This result demonstrates that microwave measurement can be used to reliably detect clearance variations lower than 100 µm.
Figure 13: Correlation graph between microwave measurement and laser measurement. Each data point
corresponds to one particular blade.
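The linearity check of Figure 13 boils down to a correlation coefficient and a residual spread. A self-contained sketch with hypothetical patterns; the residual is taken here as an RMS deviation, which is one plausible reading of the 17 µm figure:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two clearance patterns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residual_rms(xs, ys):
    """Root-mean-square blade-wise deviation between the two sensors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical laser vs. microwave patterns (mm).
laser = [0.10, 0.25, -0.05, 0.40]
microwave = [0.11, 0.24, -0.04, 0.41]
r = pearson(laser, microwave)
```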
3.6. Conclusion on laboratory study
The ability of the microwave sensor to adequately measure tip-clearance patterns has been evaluated through different laboratory tests. The first, performed on a precision test setup, showed that the tip-clearance pattern can be consistently measured over the 1-3 mm clearance range with a measurement variability of ±40 µm. A second test was performed on a spinning test bench, and the obtained results are consistent, with a measurement variability of ±50 µm. This measurement uncertainty mainly comes from electronics noise and depends on the system configuration. For example, it could be improved by using higher-performance cables or by applying filtering strategies to the sensor output. On the other hand, the accuracy of the pattern measurement is very good, with residual errors of about 17 µm on the measurements from the spinning test bench.
4. BLADE PATTERN MEASUREMENT REALIZED ON A REAL ENGINE
4.1. Presentation of data
The microwave sensor has been evaluated through an engine test performed in 2011 on a 25 MW turbine (Kwapisz, Hafner, Spitsyn, Mykhaylov & Berezhnoy, 2011). The purpose of this test was the validation of tip-clearance measurement, but an additional objective was to evaluate the ability of the system to measure the tip-clearance pattern. For that purpose, ten blades of the rotor had been shortened by a few tenths of a millimeter. The measurement of the blade clearance pattern was performed during different engine operating states. This section presents an analysis of the measurement variability of the clearance pattern with real engine data.
4.2. Raw measurement analysis
In order to compare the measurement performance obtained during the engine test with the laboratory measurements, the data are analyzed without any filtering (Figure 14). The ten shortened blades can easily be differentiated during the different engine operating states. In terms of precision and variability, the measurements are consistent with the laboratory study and show a variability of ±60 µm. Nevertheless, the precision can be improved by filtering the clearance measurement outputs, as described in the next section.
Figure 14: Variability range of the unfiltered blade clearance pattern computed over the whole engine test, represented by extrema, first and last deciles, and median.
4.3. Filtered measurement analysis
Detection of abnormal blade elongation requires a clearance pattern measurement with uncertainties lower than the elongation to detect. The measurement noise can be reduced by adequate filtering, but there is a tradeoff between noise reduction and detection bandwidth. During the engine test, the measurement rate was 0.5 Hz. In order to reduce noise, a median filter with a window size of 20 samples is applied to the raw clearance measurements. In this configuration, the filtering leads to a minimal detection delay of 40 s. During this engine test, however, the measurement rate was not optimized, and it can be greatly improved for blade health monitoring applications.
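The 20-sample median filter described above can be sketched as a causal sliding window (a simplified stand-in, not necessarily the exact filter implemented in the system):

```python
import statistics

def median_filter(values, window=20):
    """Causal sliding median over the raw clearance outputs.  At the
    0.5 Hz measurement rate of the test, a 20-sample window implies a
    minimal detection delay of 20 / 0.5 = 40 s."""
    return [statistics.median(values[max(0, k - window + 1):k + 1])
            for k in range(len(values))]

# A single outlier sample is rejected by the median:
filtered = median_filter([1.0, 1.0, 5.0, 1.0, 1.0], window=3)  # all 1.0
```

A median (rather than a mean) is a natural choice here because it suppresses isolated spikes without smearing a genuine, persistent clearance shift.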
The blade pattern has been computed for the whole engine test after filtering the clearance outputs. The variation ranges are computed and shown in Figure 15. It is interesting to note that the variability of the blade tip-clearance pattern was not uniformly improved by the filtering. Some blades present a very small variability, lower than ±20 µm, while other blades still have a variability of ±60 µm. Indeed, the pattern variation comprises both measurement uncertainties and actual blade length variations. During this particular engine test, rotor speed and output power were not constant, and the blades were subject to different temperature and load constraints. Because of small structural or mounting differences between the blades, the blade clearance pattern is not necessarily constant over all engine conditions. Typically, the BHM system has to detect abnormal variations, due to cracks, among these normal variations.
Figure 15: Variability range of the filtered blade clearance pattern computed over the whole engine test, represented by extrema, first and last deciles, and median.
4.4. Conclusion on engine test data
The purpose of the engine test was not the validation of blade crack detection based on microwave sensing; the primary goal was the validation of the absolute clearance measurement. Nevertheless, a side result was the measurement of the blade clearance pattern over different engine conditions. It shows that the variation of the blade clearance pattern is about ±60 µm without any filtering. This range comprises both measurement uncertainties and actual blade elongation. It was improved by filtering, and the obtained results show a residual variability that is likely due to actual clearance variations. In order to assess the feasibility of blade health monitoring based on microwave measurement, three metrics are important. The first is the blade elongation threshold that indicates a crack long enough that turbine control action needs to be taken. The second is the normal blade elongation discrepancy from which a crack effect has to be differentiated. The last is the measurement performance of the sensing system that monitors the blades. This last metric has been evaluated in this paper, but the feasibility analysis of such a BHM system requires additional knowledge of blade elongation.
5. CONCLUSION
Blade health monitoring offers real opportunities to improve gas turbine operation and to reduce maintenance costs. Different strategies and system architectures can be envisaged, but one of the key points is to obtain a reliable and accurate sensor package. Due to the harsh environment in the hot section, only a few sensing technologies are capable of blade monitoring in this area. In this domain, the microwave sensor has real advantages, as it is capable of accurate clearance measurements while withstanding the temperatures near the turbine inlet. This paper has described the tracking of the blade clearance pattern as one way of using this technology for blade health monitoring. This
paper shows how to deal with variability that comes from measurement errors but also from real blade elongation discrepancies. This last point is very important and leads toward physics-based diagnostic techniques. In addition to blade clearance measurement, the microwave system is capable of time-of-arrival measurements. This type of measurement is currently under evaluation and will certainly provide rich information for blade health monitoring. In conclusion, the microwave sensor provides a sound basis for future diagnostic systems in terms of measurement performance and sensor operability.
REFERENCES
Dyke, J. (2011). Modeling behaviour of damaged turbine blades for engine health diagnostics and prognostics. Master thesis, University of Ottawa, Ottawa, Canada.
Flotow, A., Mercadal, M., & Tappert, P. (2000). Health monitoring and prognostics of blades and disks with blade tip sensors. Aerospace Conference Proceedings, IEEE, Mar 18-25, Big Sky, MT, USA.
Hess, A., Frith, P., & Suarez E. (2006). Challenges, issues, and lessons learned implementing prognostics for propulsion systems. Proceedings of ASME Turbo Expo 2006, May 8-11, Barcelona, Spain.
Hess, A. (2007). Prognostics and health management: The cornerstone of autonomic logistics. (Downloaded from http://www.acq.osd.mil/log/mpp/senior_steering/condition/Hess%20PHM%20Brief.ppt)
Holst, T. A. (2005). Analysis of spatial filtering in phase-based microwave measurements of turbine blade tips. Master's thesis, Georgia Institute of Technology, Atlanta, GA, USA.
Kwapisz, D., Hafner, M., & Queloz, S. (2010). Calibration and characterization of a CW radar for blade tip clearance measurement. Proceedings of the 7th European Radar Conference, September 30 - October 1, Paris, France.
Kwapisz, D., Hafner, M., Spitsyn, V., Mykhaylov, A., Berezhnoy, V. (2011). Test and validation of a microwave tip clearance sensor on a 25MW gas turbine engine. Proceedings of the XVI International Congress of Propulsion Engineering, September 14-19, Rybache, Ukraine.
Martin R., Forry, D., Maier, S., & Hansen, C. (2011). GE's Next 7FA Gas Turbine “Test and Validation” (Downloaded from http://www.ge-energy.com/content/multimedia/_files/downloads/GEA18457A_7FA_GI_7-27-11_r1.pdf)
SAE (2012). Airfoil diagnostics with blade tip sensors for operating turbomachinery, SAE Aerospace Information Report, AIR5136, Sep 2012.
Woike, M. R., Abdul-Aziz, A., Bencic, T. J. (2010). A microwave blade tip clearance sensor for propulsion health monitoring, AIAA-2010-3308. (Downloaded from http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20100025863_2010028113.pdf)
Zielinski, M., Ziller, G. (2005). Noncontact blade vibration measurement system for aero engine application; International Symposium of Air Breathing Engines, September 4-9, Munich, Germany.
BIOGRAPHIES
David Kwapisz has been a research engineer at Meggitt Sensing Systems since 2008. He is responsible for the technology and testing aspects of microwave sensing. He received an M.Sc. degree in 2005 from the Ecole Supérieure des Techniques Aéronautiques et de Construction Automobile (Paris) and a Ph.D. degree in Automatic Control in 2008 from the Université de Limoges.
Michaël Hafner has been a product manager at Meggitt Sensing Systems since 2010. He is in charge of the microwave tip-clearance and tip-timing products for the energy market. He received a Mechatronics M.Sc. degree in 2006 from the Swiss Federal Institute of Technology.
Ravi Rajamani joined Meggitt PLC in 2011 as an Engineering Director, responsible, in part, for Integrated Vehicle Health Management (IVHM) strategy. Ravi has a BTech from IIT Delhi, an MS from IISc, Bangalore, and a PhD (EE) from the University of Minnesota. Before his current position, Ravi worked at General Electric and at United Technologies primarily in the area of gas turbine controls and diagnostics. He is active within SAE’s Engine Health Management (E-32) and Integrated Vehicle Health Management (HM-1) committees.
Assessment of Remaining Useful Life of Power Plant Steam Generators – a Standardized Industrial Application
Ulrich Kunze1 and Stefan Raab2
1,2Siemens AG – Energy Sector, Erlangen, 91050, Germany [email protected] [email protected]
ABSTRACT
The Web-based condition monitoring and diagnostic system “Boiler Fatigue Monitoring” enables on-line assessment of cumulative boiler creep and low cycle fatigue according to the European standard EN 12952-3/4 issued in 2001.
The application is employed as an autonomous module as well as fully integrated into the Siemens process control system SPPA-T3000, and is increasingly becoming a standard part of the power plant instrumentation and control (I&C).
The Fatigue Monitoring System (FMS) is a standard industrial application for both newly built power plants and retrofits of existing units of any kind. The system is not limited to Siemens I&C systems; FMS can also be integrated into power plants with I&C systems from other suppliers. FMS is also capable of calculating the remaining lifetime for boilers designed according to the American standard ASME VIII-2.
1. INTRODUCTION
Steam generator lifetime monitoring and assessment of the remaining useful life (RUL) is a standard application in power plants. Since market requirements have shifted towards increased flexibility, power plants are operated more cyclically than in the past. This makes fatigue monitoring systems even more important, as they immediately reveal the impact of start-ups and shut-downs on the remaining lifetime of the boiler components.
Therefore, fatigue monitoring systems for boilers are today increasingly standard installations in new power plants and are often the subject of upgrade activities.
The basis for the assessment of the RUL is the European standard EN 12952-3/4 issued in 2001, which contains simplified rules to calculate creep and low cycle fatigue.
These simplified rules are conservative but have the advantage of being easy to use.
Only 3 years after the release of EN 12952-3/4 the first boiler fatigue monitoring system (FMS) was installed for continuous operation in a new combined cycle power plant in Germany.
FMS is a module of web4diagnostics, the Web-based diagnostics system for power plants. FMS is used both in power plants as an on-line diagnostics system – it has since been installed around the world in more than 50 power plant units, in part with integration into the power plant's office network – and in the Siemens Intranet as a Web-based data archive and as a tool for data analysis and evaluation.
Previously, the system was a supplement to the operational instrumentation and control (I&C): data acquisition used a link to the I&C, but there were otherwise no further connections. This has changed dramatically.
Today, the "boiler fatigue monitoring" module is fully integrated into the SPPA-T3000 process I&C system and is a standard industrial application. However, it can be combined with any other I&C system via OPC data connection.
2. FUNDAMENTAL PRINCIPLES OF BOILER FATIGUE MONITORING
Power plant boilers contain many highly loaded components of the water and steam piping systems with limited service life. In particular, these are the feedwater heater, superheaters, attemperators, headers, piping and internal boiler lines.
The theoretical service life of a component is precalculated for a specific design loading. Operating conditions outside of the design conditions can result in premature failure of the component.
_____________________ Ulrich Kunze and Stefan Raab: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The actual anticipated time until failure of the component at the current operating time is known as the remaining useful life (RUL). The sum of prior operating time and RUL may be greater or less than the theoretical service life due to past operating conditions outside of the design conditions. The residual life is calculated as the difference between theoretical service life and (material) fatigue.
Fatigue results from
• Creep fatigue and/or
• Low-cycle fatigue.
2.1. Creep Fatigue
Creep fatigue designates the fatigue of a component as a consequence of creep damage. Creep damage always occurs when the component is operated above the grain recovery temperature characterizing the material. Creep fatigue results at the most heavily loaded area of the component, generally the area of a cutout. Peak stresses occur here which can result in plastic deformation of the material. The allowable service life is dependent on the component temperature such that the service life is limited at a constant load and decreases with increasing temperature.
2.2. Low-Cycle Fatigue
Low-cycle fatigue is the fatigue of a component as a result of cyclic strain loading. Cyclic strain loading occurs when the part is subjected to pressure changes and/or fluctuating fluid temperature distributions. Thermal stresses resulting from locally transient temperature distributions are superimposed on the compressive loads. Each cycle in the resulting stress (load cycle) leads to utilization of the low-cycle fatigue resistance (low-cycle fatigue) and thus finally to stress cracking at the most highly-loaded point.
3. CODES AND REGULATIONS
Since 2001, EN 12952 has applied for the design and monitoring of boilers in Germany and many other (European) countries (for design: Part 3 and for continuous monitoring: Part 4; cf. Fig. 1).
EN 12952 supersedes the Technical Rules for Steam Boilers (TRD), which served for many years as the basis for design and monitoring and to which it is closely related.
Fig. 1 Fatigue calculation in accordance with EN 12952 (Water-tube boilers and auxiliary installations): Part 3 covers design and calculation for pressure parts; Part 4 covers in-service boiler life expectancy calculations, with creep fatigue in Appendix A and low-cycle fatigue in Appendix B
4. DESIGN LOADING FOR STEAM BOILERS
During the design of a steam boiler, it is checked whether the selected design including the intended materials will withstand the loading warranted by the manufacturer.
This verification is performed by the boiler manufacturer.
The manufacturer assumes a service loading combination for subsequent operation for this purpose, comprising, for example, the following typical parameters:
• Service life: 25 years or 200,000 h
• Cold starts (120-h outage): 50
• Warm starts (weekend outage): 1250
• Hot starts (overnight outage): 5000
as well as further possible operating cases.
The anticipated fatigue for the critical areas (components) of the steam boiler is calculated on the basis of EN 12952-3, accounting for the design service loading combination. This fatigue must always be less than 100%. The boiler manufacturer will generally design the boiler so that there is some reserve with regard to the design service loading combination.
However, these design conditions will be deviated from during operation. It is frequently the case that the power plant is initially in base load operation due to its favorable efficiency compared with the other available power plants. With increasing age, it will be deployed more and more in cycling duty or as a peaking plant.
This different operating mode compared with the design of course results in a different anticipated service life of the boiler – for which reason the boiler must be continuously monitored.
Following the service loading combination, which includes service time as well as starts and load changes, the actual fatigue is expressed in percent rather than in hours. The same applies to the RUL.
5. CALCULATION METHOD
5.1. Creep Fatigue
Calculation of creep fatigue $D_C$ is based on a comparison of the exposure time $T_{op}$ of a component at specific levels of pressure and temperature with the theoretical service life $T_{al}$ of the component at these conditions:

$$D_C = \sum_i \sum_k \frac{T_{op,i,k}}{T_{al,i,k}}$$

where $i$ and $k$ index the temperature and pressure classes.
The theoretical service life is calculated from the creep resistance (material property), the operating temperature and the membrane stress (or pressure).
The procedure is as follows: From the inside pressure, the circumferential stress at the inner surface of the most heavily loaded nozzle bore is calculated. This stress is compared with the temperature-dependent stress-rupture strength given for 10,000, 100,000 and 200,000 h, taking into account a 20% safety margin. The result is the theoretical service life for the given inside pressure and temperature (see EN 12952-4 for calculation details).
Fig. 2 Determination of creep fatigue from exposure time, for the example class 560...570 °C / 115...120 bar (theoretical service life in class: 4,060,000 h)
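The double sum for $D_C$ can be sketched in code. The class labels, exposure hours and theoretical lives below are invented for illustration, except for the 4,060,000 h value taken from the Fig. 2 example; they are not values prescribed by EN 12952-4.

```python
# Sketch of the creep-fatigue double sum D_C = sum_i sum_k T_op[i,k] / T_al[i,k],
# where i indexes temperature classes and k pressure classes.

def creep_fatigue(exposure_hours, service_life_hours):
    """Accumulate D_C over all occupied (temperature, pressure) classes.

    exposure_hours[(i, k)]     -- hours T_op spent in class (i, k)
    service_life_hours[(i, k)] -- theoretical service life T_al for class (i, k)
    """
    return sum(
        t_op / service_life_hours[cls]
        for cls, t_op in exposure_hours.items()
    )

# Example with two occupied classes (illustrative numbers)
exposure = {("560-570C", "115-120bar"): 12000.0,
            ("570-580C", "115-120bar"): 500.0}
t_al = {("560-570C", "115-120bar"): 4_060_000.0,
        ("570-580C", "115-120bar"): 1_500_000.0}

d_c = creep_fatigue(exposure, t_al)  # fraction of the creep life consumed
```

In an FMS-style system the exposure table would grow by one acquisition interval per pass, so $D_C$ stays current without re-scanning history.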
For a later quick overview of the operating mode of the power plant, it is expedient to categorize pressure and temperature into classes and to perform the fatigue calculation per class. The classification is defined by experts; the idea is to use small intervals for normal-operation values and wider intervals for low temperature and pressure values.
It can then be easily seen during the analysis how long the component has been operated within specific temperature/pressure ranges (see Fig. 2).
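Such a class lookup can be sketched with bisection. The class edges below are invented for this sketch; in practice they would be the expert-defined intervals described above, narrow around normal operation and wide elsewhere.

```python
import bisect

# Illustrative temperature class edges (°C): wide classes at low values,
# narrow 10 K classes around normal operation. These edges are invented,
# not the expert-defined classes of an actual FMS configuration.
TEMP_EDGES = [200, 400, 500, 540, 550, 560, 570, 580]

def temp_class(t_celsius):
    """Return the index of the temperature class containing t_celsius."""
    return bisect.bisect_right(TEMP_EDGES, t_celsius)

idx = temp_class(565)  # 565 °C falls into the narrow 560...570 °C class
```

The same binning would be applied to pressure, giving the two-dimensional class grid of Fig. 2.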
5.2. Low-Cycle Fatigue
Low-cycle fatigue $D_F$ is determined by counting the number of load cycles $n$ and comparing these with the number of cycles to crack initiation $N$ of the component for the specific values of the stress range $2f$ and temperature $t$ on which the load cycle is based:

$$D_F = \sum_i \sum_k \frac{n_{i,k}}{N_{i,k}}$$

where $i$ and $k$ index the stress-range and temperature classes.
A load cycle is defined by EN 12952-4 as a closed hysteresis loop in the stress/strain diagram. The stress in the material is calculated from the pressure and temperature gradient, while the numbers of cycles to crack initiation are material properties.
To simplify future analysis, it is expedient to categorize the stress range and temperature in classes. This makes it easy to assign load cycles to specific operating modes.
Stresses (including extremes) which cannot yet be combined in load cycles are maintained on the "list of residual extremes" until a "partner" is found for them.
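The closed-hysteresis-loop counting behaves much like rainflow counting. The following simplified three-point sketch is not the EN 12952-4 procedure itself (the stress values are invented); it only illustrates how closed cycles are extracted while unmatched extremes remain on a residue list awaiting a "partner".

```python
def count_cycles(extremes):
    """Simplified three-point rainflow count of closed stress cycles.

    extremes -- sequence of alternating stress extremes (e.g. MPa).
    Returns (cycles, residue): the stress ranges of closed load cycles,
    and the list of extremes still awaiting a partner.
    """
    residue, cycles = [], []
    for s in extremes:
        residue.append(s)
        # A loop closes when the inner range is enclosed by the
        # range formed with the newest extreme.
        while len(residue) >= 3:
            r_inner = abs(residue[-2] - residue[-3])
            r_outer = abs(residue[-1] - residue[-2])
            if r_inner > r_outer:
                break
            cycles.append(r_inner)  # one full closed cycle counted
            del residue[-3:-1]      # remove the matched pair of extremes
    return cycles, residue
```

The counted ranges would then be assigned to stress-range/temperature classes and accumulated against the material's cycles-to-crack-initiation curve.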
5.3. Total Fatigue
The total fatigue of a component is determined as the sum of:
• Creep fatigue,
• Low-cycle fatigue,
• Fatigue from the current list of remaining extremes,
• Fatigue from prior history of the component and
• Correction of fatigue.
Fatigue from the current list of remaining extremes is an estimate of the fatigue component of the stress values which do not yet represent load cycles.
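As a minimal sketch of this bookkeeping (the field names are invented; the breakdown follows the five contributions listed above):

```python
from dataclasses import dataclass

@dataclass
class ComponentFatigue:
    """Illustrative container for the fatigue contributions of one
    component; all fractions are dimensionless shares of 100 %."""
    creep: float              # D_C from exposure times
    low_cycle: float          # D_F from closed load cycles
    residual_extremes: float  # estimate for not-yet-closed cycles
    prior_history: float      # fatigue carried over from earlier records
    correction: float         # manual correction term

    def total(self):
        return (self.creep + self.low_cycle + self.residual_extremes
                + self.prior_history + self.correction)

f = ComponentFatigue(0.12, 0.08, 0.01, 0.05, -0.02)
total = f.total()  # summed fatigue fraction of the component
```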
6. CONTINUOUS FATIGUE MONITORING
Continuous fatigue monitoring yields information on the actual service life utilization based on the actual (measured) design of the boiler components and the current operating mode. In practical terms, this constitutes verification of the design analysis.
Continuous fatigue monitoring is the responsibility of the power plant operator (not the manufacturer) and shall be performed for the most highly loaded components.
The manufacturer, the subsequent power plant operator and the licensing authority jointly define which components of the boiler are to be continuously monitored, usually during the construction phase of the power plant.
6.1. Monitored Components
Heavily loaded components which are continuously monitored with regard to creep fatigue and low-cycle fatigue are as follows:
• Headers
• Drums
• Separators
• Spray attemperators
• Piping (pipe bends)
Drums and separators are generally only monitored for low-cycle fatigue (not creep fatigue), as these components are operated in temperature ranges for which no creep of the material occurs (below the grain recovery temperature).
6.2. Requisite Measuring Points
Operating parameters (measured values) are required for each component to be monitored for calculation of the service life:
• Creep fatigue:
  - Mean wall temperature tmw
  - Internal pressure p
• Low-cycle fatigue:
  - Inner wall temperature tim
  - Mean wall temperature tmw
  - Internal pressure p
As a general rule, the drums and headers to be monitored are already equipped with temperature measurements in the component wall by the manufacturer.
If measurement of the inner wall temperature and the mean wall temperature is not possible, these temperatures can be calculated from the time behavior of the medium temperature.
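One common way to derive such a temperature from the time behavior of the medium temperature is a first-order lag of the measured fluid temperature. The sketch below is illustrative only; EN 12952-4 defines the actual procedure, and the time constant is invented.

```python
def lagged_wall_temp(t_fluid_series, dt_s, tau_s=600.0):
    """Approximate a wall temperature as a first-order lag of the fluid
    temperature. Illustrative only: tau_s = 600 s is an invented time
    constant, not a value from EN 12952-4.

    t_fluid_series -- sampled fluid temperatures (°C)
    dt_s           -- sampling interval (s)
    """
    t_wall = t_fluid_series[0]     # assume thermal equilibrium at start
    out = [t_wall]
    alpha = dt_s / (tau_s + dt_s)  # discrete first-order filter gain
    for t_fluid in t_fluid_series[1:]:
        t_wall += alpha * (t_fluid - t_wall)
        out.append(t_wall)
    return out
```

With a 30 s acquisition interval, the derived wall temperature lags a step in the fluid temperature, which is exactly the behavior that drives the thermal stress term.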
6.3. Preparatory Calculations
Before the start of the on-line calculation, all of the parameters to be determined once are specified or calculated. These are as follows:
• Specification of the classification for the creep and low-cycle fatigue calculations
• Calculation of the theoretical service life for each pressure/temperature class
• Calculation of the numbers of cycles to crack initiation for each stress range/temperature class
6.4. On-Line Calculations
Continuous fatigue calculation is performed online. The following steps are processed sequentially:
• Acquisition of the requisite values (a typical acquisition interval is 30 s)
• Calculation of the inner wall temperature and mean wall temperature from the fluid temperature if these are not directly measurable
• Determination of exposure times and calculation of the current creep fatigue
• Calculation of the component stress, check whether new load cycles have taken place, assignment of the load cycles to the defined classes, and determination of the current low-cycle fatigue and of the fatigue component of the list of residual extremes
• Calculation of total fatigue
The creep, low-cycle and resulting total fatigue are recalculated for each data acquisition interval, so that the residual service life of a component is always up to date.
7. INTERNAL STRUCTURE OF CONTINUOUS FATIGUE MONITORING
The boiler fatigue monitoring module FMS (Fatigue Monitoring System) was developed for calculation of the creep fatigue and low-cycle fatigue. FMS requires a standard PC.
The FMS software is implemented as a Web application. This system concept enables operation and calling up of information both directly in the system and from any PC in the office network (provided that a connection to the office network is implemented).
All data (measurement, configuration and results data) are stored in a database. Background processes activated by time control ensure that data acquisition and the on-line fatigue calculation are performed continuously and independently of the user interface.
Display of the information in the form of logs, tables or trend plots is updated and compiled based on the latest database values each time the user interface is called up (cf. Fig. 3).
Fig. 3 Structure of FMS
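The sequential on-line steps can be sketched as a single monitoring pass. Every name here is hypothetical and stands in for the plant I/O and database layers of an actual FMS installation.

```python
def online_fatigue_step(acquire, wall_temps, creep, low_cycle, store):
    """One pass of the sequential on-line calculation for one component.

    Each stage is injected as a callable so the sketch stays independent
    of any real plant I/O; none of these names come from FMS itself.
    """
    p, t_fluid = acquire()                 # 1. acquisition of requisite values
    t_inner, t_mean = wall_temps(t_fluid)  # 2. wall temperatures (if unmeasured)
    d_c = creep(p, t_mean)                 # 3. exposure times -> creep fatigue
    d_f = low_cycle(p, t_inner, t_mean)    # 4. load cycles -> low-cycle fatigue
    total = d_c + d_f                      # 5. total fatigue
    store(total)                           # persist results in the database layer
    return total
```

A background scheduler would invoke such a step for every monitored component once per acquisition interval (typically 30 s).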
8. INTEGRATION OF FMS IN THE SPPA-T3000 PROCESS I&C SYSTEM
As an independent module, FMS obtains data from the process I&C, but the results obtained are not written back.
The configuration of the module is more complex – for FMS this includes the entry of material and component data. Previously, this had to be performed completely separately from configuration of the process I&C system with separate tools. Measuring points and their designations in the process I&C had to be coordinated through parameter lists and entered in the FMS.
Many diagnostics modules which do not yet provide integrated functions in the I&C are in a similar situation.
It is therefore often desirable – as was also the case for the FMS – to be able to perform the configuration with the tools of the process I&C and to display the results from the diagnostic module in the I&C and to be able to use the infrastructure available there (display in process displays, trend plots, automatic report generation).
The solution lies in embedding the modules in a runtime container (see Fig. 4).
A runtime container is a component of the SPPA-T3000 process I&C system with strictly defined interfaces and functions. In addition to simple measurement and calculation results, further, more complex structured data can be exchanged with other SPPA-T3000 components through these interfaces. In addition, SPPA-T3000 can control and diagnose the module through these interfaces.
The program code of FMS is not changed by embedding. It is the original code of the independent module, which ensures that errors within the module can be ruled out on integration and that existing certifications remain effective.
Embedding enables reuse of the results from the module in the process I&C. For example, they can be
• Displayed together with measured values in process and curve displays
• Stored in the process data archive and used together with stored measured values for later evaluations
• Input in controls or other automatic control functions
• Used for generating alarms which are annunciated together with process alarms in the alarm display.
However, some results (the classified fatigue data) must be presented in very specific displays of the boiler fatigue monitoring module which exceed the possibilities of the standard tools in SPPA-T3000.
It proved to be an advantage here that both the SPPA-T3000 process I&C system and the FMS module use Web technology for the graphical user interface. It was thus possible to insert special logs and displays from FMS as a separate (browser) window in the user interface of the process I&C. The authorizations, and thus also the access restrictions, of the T3000 operator are inherited from SPPA-T3000 to FMS.
Fig. 4 Embedding the FMS module (code and data) in an SPPA-T3000 runtime container
9. EXAMPLES
The relevant information for continuous fatigue monitoring is provided in the form of logs on the user interface (and also in parallel as a PDF file for downloading):
• Overview (summarizing tabular presentation of fatigue values for all monitored components, cf. Fig. 5)
• Theoretical service life (component-specific) for each defined pressure/temperature class
• Exposure time log (component-specific) – operating time for each defined pressure/temperature class including the resulting creep fatigue
• Numbers of cycles to crack initiation (component-specific) for each defined stress range/temperature class
• Load cycles (component-specific) for each defined stress range/temperature class including the resulting low-cycle fatigue (cf. Fig. 6)
• Configuration data for the components to be monitored
• Configuration data for materials (material database)
In addition to the output of logs, the results can also be displayed graphically – fatigue values together with operating parameters – enabling direct comparison of the operating mode of the plant with the resulting utilization values (cf. Fig. 7).
Fig. 5 FMS overview display (all monitored components and the current results of the fatigue calculation)
Fig. 6 FMS display output (detail protocol for HP drum 10HAD10BB001W – matrix of completed cycles as a function of temperature and stress)
Fig. 7 Trend of active power (blue) and low-cycle fatigue (red) over time from Oct. 10 to Nov. 03, with a noticeable start-up on Oct. 15
Low-cycle fatigue (DWE) depicts the influence of load cycles, in particular start-ups and shut-downs. Fig. 6 shows the detail protocol of low-cycle fatigue, with cycles as a function of temperature and stress. Start-up and shut-down operations that contribute particularly to lifetime consumption can easily be detected from a simple trend representation; Fig. 7 shows an example, plotting the active power of the power plant and the low-cycle fatigue over time.
In the period under review, several start-up and shut-down operations took place. While the low-cycle fatigue increased only slightly for most of them, it rose significantly during the start-up on 15 October.
The analysis of the start-up showed that cold fluid was injected into the boiler, which resulted in high stresses in the boiler component wall. It was recommended to prevent such operation in the future.
10. SUMMARY
Assessment of fatigue and remaining useful life for boilers according to EN 12952, issued in 2001, uses simplified methods for the evaluation of creep and low-cycle fatigue. The consequence of this simplification is some conservatism in the estimated damage fraction.
The remaining useful life is expressed as a fraction of 100%, taking into account the different approaches for creep fatigue and low-cycle fatigue; in particular, cyclic operation of power plants produces low-cycle fatigue, which is not related to operating hours.
The main advantage of the assessment procedures from the standard is that they are easy to apply. In particular, they are qualified for temperatures above 600 °C and for service times of boiler components of 200,000 h and more.
The boiler fatigue monitoring module FMS is based on EN 12952. FMS has been in use in power plants since mid-2004. It has been included in the scope of supply for many new combined-cycle power plants from Siemens Energy or backfitted in existing plants.
The FMS module is certified by the German technical inspectorate TÜV Süd.
For the power plant operator, the implementation of FMS provides a continuous overview of the service life utilization of the boiler, so that
• The time for a necessary inspection can be selected optimally and thus the operating time between two inspections maximized
• Power plant safety can be increased
• Operating modes causing heavy wear can be detected and, if possible, prevented
• Components can be operated close to the material limits, so that the operating time of the plant can be maximized and operating costs minimized
The assessment of fatigue and remaining useful life for boilers according to EN 12952 is accepted as a standardized industrial application.
The Fatigue Monitoring System (FMS) is a standard industrial application for both newly built power plants and retrofits of existing units of any kind. The system is not limited to Siemens I&C systems; FMS can also be integrated into power plants with I&C systems from other suppliers. FMS is also capable of calculating the remaining lifetime for boilers designed according to the American standard ASME VIII-2. Since 2004 it has been successfully implemented for more than 50 boilers.
REFERENCES
European Committee for Standardization (CEN) (2001). EN 12952-3, Water-tube boilers and auxiliary installations - Part 3: Design and calculation for pressure parts, Brussels, Belgium
European Committee for Standardization (CEN) (2001). EN 12952-4, Water-tube boilers and auxiliary installations - Part 4: In-service boiler life expectancy calculations, Brussels, Belgium
The American Society of Mechanical Engineers (ASME) (2001), ASME Boiler and Pressure Vessel Code Section VIII Division 2 (ASME VIII-2), Rules of Construction of Pressure Vessels – Alternative Rules, New York
American Boiler Manufacturers Association (ABMA), Task Group On Cyclic Service (2003), Comparison of fatigue assessment techniques for heat recovery steam generators, Version 1-1
Kunze, U., Walz, H. (2007), Integration of Web based Diagnostic Systems into Power Plant I&C with Boiler Fatigue Monitoring as an Example (in German), Proceedings of International ETG Congress, October 23/24, Karlsruhe, Germany
Kunze, U., Pels Leusden, C., Spinner, R., Hackstein, H., Walz, H. (2008), Integration der Lebensdauerüberwachung von Dampferzeugern in die Kraftwerksleittechnik (in German), 40. Kraftwerkstechnisches Kolloquium 2008, October 14/15, Dresden, Germany
BIOGRAPHIES
Ulrich Kunze is a physicist and received a doctorate (Dr.-Ing. habil.) in mechanical engineering. He works as a senior expert for diagnostics of fossil-fired power plants at Siemens AG in Erlangen (Germany). Previously he was head of the diagnostics department in a German nuclear power plant and then a project manager at Siemens responsible for the installation of diagnostic systems in nuclear power plants worldwide. Currently he is a member of the DIN and ISO specialist groups for Condition Monitoring and Diagnostics of Machines (TC 108 SC 5).
Stefan Raab received a doctorate (Dr.-Ing.) in mechanical engineering. He currently leads the diagnostics group for plant performance at Siemens AG in Erlangen (Germany). He is a member of a VGB PowerTech specialist group for power plant performance diagnostics.
Autonomous Prognostics and Health Management (APHM)
Jacek Stecki1, Joshua Cross2, Chris Stecki3, and Andrew Lucas4
1,3PHM Technology Pty Ltd, 9/16 Queens Pde, VIC 3068, Australia
2,4Agent Oriented Software Pty. Ltd., 580 Elizabeth Street, Melbourne, VIC 3000, Australia
ABSTRACT
The objective of this paper is to show how PHM concepts can be included in the design of an autonomous Unmanned Air Vehicle (UAV) and, in doing so, provide effective diagnostic/prognostic capabilities during system operation. The authors propose a PHM Cycle that is divided into two parts, covering the design of the Autonomous PHM system and the operation of the PHM system in real-time application. The paper presents the steps in the design of Autonomous Prognostics and Health Management (APHM) developed using this approach, to provide contingency management integrated with autonomous decision-making for power management on a UAV. APHM was developed using commercial software tools such as the JACK® autonomous software platform, to provide real-time intelligent decision making, and MADe®, the Maintenance Aware Design environment, to identify risks due to equipment failures and to select appropriate sensor coverage. The PHM Cycle methodology is demonstrated in an application to autonomous, real-time engine health and power management on an Unmanned Air Vehicle (UAV).
1. INTRODUCTION
Prognostics and Health Management (PHM) is a new approach to enhancing system sustainability which redefines and extends Condition-Based Maintenance (CBM) on the basis of current advances in failure analysis, sensor technology and AI-based prognostics (Scheuren, Caldwell, Goodman, & Wegman, 1998). The two basic tenets of Prognostics and Health Management are:
• Prognostics - predictive diagnostics, which includes determining the remaining life or time span for the proper operation of a component
• Health Management - the capability to make appropriate decisions about maintenance actions based on diagnostics/prognostics information, available resources and operational demand.
The paper discusses the methodology for integrating PHM concepts into system design to provide autonomous diagnostic and prognostic capabilities during system operation. The Autonomous PHM system proposed in this paper is designed on the basis of correct risk assessment and a reasoning capability that is able to assess the sensor readings and determine the state of the system and the appropriate action. The effectiveness of a PHM system depends on comprehensive and correct identification of the risks due to system failures and of the system responses to those failures. Knowing the failures, the optimum combination of sensors must be identified and any ambiguities in the detection of failure modes resolved. Sensor coverage can be augmented by BITs (built-in tests) and component-specific sensors to increase the reliability of diagnostics and to eliminate ambiguities in the detection of failure modes. The resulting sensor set provides sensing patterns which are syndromes of particular failures of the system, and can be expressed as diagnostic rules. Diagnostic coverage may be further enhanced by the application of probabilistic methods. Having identified the functional failure modes and determined their criticality, reasoning techniques based on artificial agent technology can be applied to determine the set of actions that is most appropriate for the given situation.
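Such syndrome-based rules can be expressed as a simple lookup from sensing pattern to failure mode. The sensor patterns and failure labels below are invented for illustration and are not drawn from the UAV case study.

```python
# Hypothetical diagnostic rules: each syndrome is a tuple of binary
# sensor states (e.g. threshold exceedances) identifying one failure.
RULES = {
    (1, 0, 1): "fuel pump cavitation",
    (0, 1, 1): "injector blockage",
}

def diagnose(sensor_pattern):
    """Return the failure mode matching the observed syndrome,
    or None if the pattern is unknown or ambiguous."""
    return RULES.get(tuple(sensor_pattern))
```

Probabilistic methods, as mentioned above, would replace this exact-match lookup with a likelihood ranking over candidate failure modes.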
A reasoning system improves the diagnosis by maximizing the likelihood of determining the failure mode correctly. It is also able to determine the most appropriate course of corrective action, taking into account current circumstances such as the flight mode, the power requirement and the state of both engines.
_____________________ Jacek Stecki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This provides a greater level of awareness than a warning light. Normally, a human would have to determine the appropriate action on their own, based on the information available (warning lights, error codes, vibrations, etc.). However, if the human (or a decision-making system) receives incorrect or incomplete information, they may take an unnecessarily cautious approach and, for example, shut the engine down, or they may continue the current operation, failing to take any remedial action. Both of these circumstances can lead to catastrophic consequences.
Often an overly sensitive failure detection system can cause "false positive" warnings, i.e., generate an alert for a non-existent fault. This problem is highlighted in a recent Flight International magazine article on the introduction of a new-generation airliner with a sophisticated fault detection and alert system (Anon, 2010). One airline experienced a plethora of nuisance system warnings, which "are driving down technical dispatch (reliability)". Another operator reported: "What we are grappling with are algorithms for failure detection, which not only detect a failure but also act upon it. Unfortunately this can lead to a perfectly healthy system being shut down or [a no-go fault warning] for a problem that was minor enough to have been deferred."
The Autonomous PHM system discussed in this paper aims to apply reasoning equivalent to that of a human crew and thus act like an artificial assistant. Such a system could greatly reduce the crew or operator workload in high-stress situations, leading to improved levels of safety. This paper uses results of work on the development of PHM and contingency management integrated with autonomous decision-making, carried out as a core part of the UK National ASTRAEA Unmanned Air System (UAS) program. This program is paving the way for commercial UAS to operate autonomously in non-segregated airspace within the next decade.
It proposes the integration of PHM into a system at the design stage, based on a PHM Cycle that combines the Design and Operational perspectives. Combining the capabilities of current commercial software tools, such as JACK, the autonomous software platform, and MADe, the Maintenance Aware Design environment, a PHM system is designed offering greater accuracy in the detection of faults and providing selection of the best response actions (Glover, Cross, Lucas, Stecki, & Stecki, 2010).
2. THE PHM CYCLE
The proposed PHM Cycle is divided into two parts, covering the design and operation of the system, as shown in Fig. 1.
Figure 1. PHM Cycle
The Design Cycle applies multiple iterations of risk analysis techniques, failure mode prediction, and identification of responding actions to achieve an appropriate level of functional failure coverage. The outcome of this is a knowledge base which can then be applied to a system in operation.
The Operational Cycle describes the PHM process when the
system is put into operation. It describes how information
about faults is gathered, assessed and presented to the end
user, or addressed by the autonomous system.
By structuring the PHM design process appropriately, data
from the Operational Cycle can be fed back and incorpo-
rated into the Design Cycle, yielding continuous improve-
ment in future upgrades or revisions.
2.1 The PHM Design Cycle
The objective of the PHM Design Cycle is to develop an advisory system which will assess, in real time, the health of the system and recommend corrective actions to a higher-level decision maker that has to deal with a number of potentially conflicting goals, hostile situations and opportunities apart from the input from the PHM. The decision maker, either a human or a fully autonomous decision system, will have the situational awareness to apply the recommended actions appropriately.
The Design Cycle begins with the specification of the sys-
tem to be built, which is modeled as a functional block dia-
gram.
Risk Analysis and Determination of Functional Failure
Modes. The first requirement of the risk analysis is to iden-
tify the possible Functional Failure Modes (FFMs) for the
system and to understand their failure dependencies
throughout the system. FFMs are the result of specific un-
derlying physical failures triggered by design, manufactur-
ing, environmental, operational and maintenance causes.
Such causes (e.g. vibration) can initiate failure mechanisms
(e.g. high cycle fatigue) that lead to a fault (e.g. fracture).
First European Conference of the Prognostics and Health Management Society, 2012
The second requirement is to determine how the failures
propagate through the system (known as the propagation
path) and how this impacts the system functionality. The
availability of such information is a key requirement for designing, developing, verifying and validating the PHM system.
Figure 2. Design of APHM (MADe: causes of failures, identification of faults and failure modes, criticality of each failure, interaction between failures, expected functional/hardware reliability, diagnostic coverage, sensor coverage, predictive failure model; JACK: set of beliefs, set of events, set of goals, set of plans)
The outputs of the risk analysis process are usually captured
in a Failure Modes and Effects Analysis (FMEA). Once the FMEA is available, the criticality of each FFM is established, taking into consideration each specific failure and its propagation paths. The output of this process is the Failure Modes, Effects and Criticality Analysis (FMECA) report.
Further assessment of the risk is obtained by carrying out
reliability analysis using Reliability Block Diagrams and
Fault Trees. Extensive evaluation of system sustainability is
conducted using a Reliability Centered Maintenance (RCM)
methodology. Reliability analysis is usually performed on
the basis of the expected Mean Time Between Failure
(MTBF) of hardware components as provided by manufac-
turers or on the basis of published MTBF standards. In addi-
tion to this information PHM requires an assessment of the
reliability of specific functional outputs in the system –
‘functional reliability’.
At the conclusion of the risk assessment process, the user
can expect to know:
• how the system elements can fail (failure modes)
• the criticality of each failure
• the likely causes of functional failures
• the interactions between functional failures
• which physical failures are linked to functional failures
• the expected functional and hardware reliability of the system.
The information obtained during the development of the FMECA and reliability studies is the basis for selecting sensor sets able to detect the identified failures and for formulating diagnostic rules. This process is discussed in the following section.
There are two types of approaches to failure risk analysis. The first is a “committee approach”, where a team of subject matter experts determines failures and their dependencies and subsequently lists them in spreadsheet-type software. The quality of the analysis depends on the knowledge and experience of the team members. Reliability studies are usually carried out by a different team of people using specialized reliability software. Sensor selection and the development of diagnostic rules cannot directly use the results of the FMECA analysis.
The second, model-based risk assessment approach uses
existing failure databases and expert knowledge captured in
the form of Failure Diagrams and Functional Block diagrams of a system. A standardized functional and failure taxonomy ensures consistency in the interpretation of failure analysis results (Rudov-Clark & Stecki, 2009).
Reliability models are automatically generated from the
functional model of the system. Sensor selection and diag-
nostic rules are also determined based on automated analy-
sis of the functional model.
Risk assessment as briefly described above forms the basis for any further work on the development of a PHM system. Some common problems causing sub-optimal operation of PHM systems can be traced to the following risk assessment deficiencies:
• dependencies of failures are not identified
• inadequate identification of risks
• incomplete database of failures
• inconsistent language used to define functions and failure concepts
• confusing hardware reliability with functional reliability
• different models for Criticality and Reliability Assessments.
To overcome these deficiencies, MADe (the Maintenance Aware Design environment) was used as a risk assessment tool facilitating failure modes analysis and reliability assessment.
Sensor and Diagnostic coverage. Detection of a failure
mode is the first and most important step in the PHM process. After all, if we cannot identify a failure mode, we cannot propose a corrective action. When a failure mode is isolated, the reasoning system will attempt to identify the causes of the failure mode.
Sensors are usually selected to detect specific identified failures (e.g. a temperature sensor detects a temperature change indicating a failure of the heater); thus they are selected to detect symptoms of failures. The sensors are usually selected by personnel responsible for individual components or subsystems, who may have only limited knowledge of the impact of their failures on system failures. The final composition of the sensor set is decided upon by the system integrator using criteria such as cost, weight, reliability and computing requirements. The overall coverage of system failures is determined using testability analysis software. The diagnostic rules are developed on the basis of symptoms.
This methodology has the following weaknesses:
• sensor fusion is not based on failure dependencies (fallback – testability)
• diagnostic rules are not based on failure dependencies
• failure coverage is often incomplete and cannot be assessed
• sensor selection does not consider the criticality of failures, or the functional and hardware reliability
• sensor fusion is difficult to implement without failure dependency information.
A model-based approach to sensor selection disposes of some of these weaknesses. The MADe/PHM module uses the model of the system and the failure dependency data obtained in the risk analysis phase and provides the user with an automated ‘sensor set design’ function (Rudov-Clark, Ryan, Stecki, & Stecki, 2009). Each potential sensor set provides a logical cover of the identified failures. In contrast to the above-mentioned ‘symptom of failure’ methodology, the sensor set fuses sensor readings to provide a syndrome of failure. The selection of component/subsystem sensors solely on the basis of failure symptoms can also be carried out and fused with sensor sets based on identification of the syndrome of failure.
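The symptom/syndrome distinction can be made concrete with a small sketch. This is our illustration, not code from the paper or from MADe, and all sensor names, thresholds and failure modes are hypothetical:

```python
# Symptom-based detection: each sensor is checked in isolation, so every
# out-of-range reading is flagged as an independent problem.
def symptom_diagnosis(readings, limits):
    """Return the set of sensors whose reading is outside its limits."""
    return {s for s, v in readings.items() if not limits[s][0] <= v <= limits[s][1]}

# Syndrome-based detection: a failure mode is identified by the joint
# signature it produces across the fused sensor set.
SYNDROMES = {
    "pump_bearing_wear": {"oil_pressure": "low", "vibration": "high", "oil_temp": "high"},
    "oil_leak":          {"oil_pressure": "low", "vibration": "ok",   "oil_temp": "ok"},
}

def qualify(readings, limits):
    """Map raw readings to qualitative states (low / ok / high)."""
    out = {}
    for s, v in readings.items():
        lo, hi = limits[s]
        out[s] = "low" if v < lo else "high" if v > hi else "ok"
    return out

def syndrome_diagnosis(readings, limits):
    """Return the failure modes whose full signature matches the readings."""
    states = qualify(readings, limits)
    return [fm for fm, sig in SYNDROMES.items()
            if all(states.get(s) == st for s, st in sig.items())]
```

With these toy rules, low oil pressure alone points to a leak, while low pressure combined with high vibration and high temperature identifies bearing wear; the symptom-based check would simply flag three separate anomalies.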
By applying this automated approach, with associated ca-
pability to conduct trade-off studies of sensor properties
such as cost, weight, coverage and reliability, the engineer
can select the best possible arrangement of sensors for the
given constraints, providing the highest practical level of
fault coverage achievable.
Although full coverage of faults is always preferable, it is
not necessarily achievable due to system constraints. Also,
some failure modes may have degrees of criticality that are
below the level of concern and thus they can be excluded
from further analysis.
If full failure coverage is not achieved by the set of diagnos-
tic sensors then ambiguity groups exist, i.e. a number of
different failure modes have the same system functional
responses. These ambiguity groups can be resolved by iden-
tifying the most likely fault based on the probability of fail-
ure and information about the physical processes and symp-
toms for each failure provided in the failures database.
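The probability-based resolution of an ambiguity group can be sketched as follows. This is an illustrative reduction of the idea, with hypothetical failure modes and probabilities; a real failures database would also carry physical-process and symptom information:

```python
# Sketch: rank the members of an ambiguity group (failure modes sharing
# the same functional response) by their prior probability of failure.
def resolve_ambiguity(ambiguity_group, failure_db):
    """Return the group's failure modes ordered most probable first."""
    return sorted(ambiguity_group, key=lambda fm: failure_db[fm]["prob"], reverse=True)

# Hypothetical entries from a failures database.
failure_db = {
    "pump_bearing_wear": {"prob": 0.010},
    "pump_seal_leak":    {"prob": 0.004},
    "sensor_drift":      {"prob": 0.002},
}
group = ["pump_seal_leak", "pump_bearing_wear", "sensor_drift"]
```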
The system designer must be aware of the potential implica-
tions of any unresolved ambiguities. These ambiguities will
directly impact upon the ability of the PHM function to take
the best remedial action – if it is unable to identify the cor-
rect failure mode then it is unlikely to respond correctly. As
such, the designer should, possibly during subsequent de-
sign iterations, attempt to remove these ambiguities wherev-
er possible or have contingencies built into the responses to
handle their occurrence, for example by integrating BITs or
other sensors associated with components.
It is important to remember that the above sensor requirements analyses are based on a functional model that is qualitative in nature. Thus further quantitative analysis of the sensor set should be considered to validate the results. The selected sensor set and the results of the failure modes and effects analysis provide the basis for the design of the diagnostic rules needed to identify each failure mode.
Detection and Diagnosis. In on-line, real-time operation
inaccurate sensor readings may introduce response patterns
which do not correspond to any of the diagnostic rules. One
potential solution is the use of multiple redundant sensors
that provide a means for resolving differences (e.g. by “vot-
ing”). Another solution is the application of reasoning tech-
niques that look for the probable cause of any undefined
sensor readings.
Figure 3. Operational cycle
Theoretically, a sensor set which provides the required diagnostic coverage of failure modes will identify all the failure modes. In practice this is not always so. In practical terms, a diagnostic sensor set has a certain Probability of Detection
(POD) which is a function of reliability of detection, pro-
cessing, and interpretation of information provided by indi-
vidual sensors in a diagnostic set. Each failure mode may
have different POD. Thus to diagnose a failure mode the
reasoning system must not only identify the appropriate
sensor responses but also consider Probability of Detection
of this sensor and that of a whole sensor set. For example, a
pressure sensor may have much higher POD than a vibra-
tion sensor, mainly due to the low reliability of vibration
signal interpretation.
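One simple way to combine per-sensor PODs into a set-level POD is sketched below. It assumes the diagnostic rule needs every sensor in the set to respond and that detection failures are independent; both are our simplifications, not assumptions stated in the paper, and the POD values are illustrative:

```python
from math import prod

def set_pod(sensor_pods):
    """POD of a sensor set whose diagnostic rule requires all member sensors,
    assuming independent detection (product of the individual PODs)."""
    return prod(sensor_pods)

# The pressure sensor is credited with a higher POD than the vibration
# sensor, reflecting the lower reliability of vibration interpretation.
pods = {"pressure": 0.99, "vibration": 0.70}
```

Under these assumptions a rule that fuses both sensors detects the failure mode with probability 0.99 × 0.70 ≈ 0.69, i.e. the weakest sensor dominates the set.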
As the PHM system should provide predictive capability,
the failure models should be extended to include infor-
mation such as historic data of previous failures, results of
tests, physics of failure, and length of time for a failure to
develop. The length of time it takes for a failure mode to
develop, from the initiation of a failure mechanism to the
development of a fault and propagation of the subsequent
functional failure, is important information for choosing the
best actions to mitigate the failure. If a failure is instantane-
ous, for example, fan blade failure due to catastrophic For-
eign Object Damage (FOD), then immediate action will be
required. If a failure is gradual there could be some time to
perform other actions to slow down the progression of the
fault or mitigate its consequences.
For example, compressor blade damage from a bird strike
that leads to high-cycle fatigue failure can be addressed by
reducing the engine speed thus reducing the rate of crack
propagation.
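The dependence of the response on failure-development time can be sketched as a simple decision rule. The categories and thresholds here are hypothetical, chosen only to mirror the FOD and bird-strike examples above:

```python
# Illustrative mapping from failure-development time to mitigation.
def select_response(time_to_failure_hours):
    """Choose a response class based on how fast the failure develops."""
    if time_to_failure_hours == 0:                 # instantaneous, e.g. catastrophic FOD
        return "shut_down_immediately"
    if time_to_failure_hours < 1:                  # fast-developing fault
        return "land_as_soon_as_possible"
    return "reduce_speed_to_slow_progression"      # gradual, e.g. crack propagation
```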
Different faults and failure modes may occur in rapid suc-
cession, leading to multiple simultaneous responses being
detected.
Developing the Knowledge Base. The knowledge base
developed during the PHM Design Cycle includes:
1. a rule base for performing diagnostics and identifying
each FFM along with its underlying causes
2. a predicted failure model
3. a set of actions corresponding to each failure.
The knowledge base is designed in such a way that a deci-
sion-making system such as an artificial agent can reason
about it. If possible, the actions should provide complete
coverage of all identifiable failures, and give all possible
responses (or actions to be taken) for the identified failure.
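The three-part knowledge base described above can be sketched as a simple data structure. Field names and the single entry are ours, not the paper's, and the scalar time-to-failure is only a stand-in for a full failure model:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FailureEntry:
    diagnostic_rule: Dict[str, str]   # 1. sensor -> expected qualitative state
    time_to_failure_hours: float      # 2. stand-in for the predicted failure model
    actions: List[str]                # 3. candidate responses, best first

# Hypothetical entry for one FFM.
knowledge_base: Dict[str, FailureEntry] = {
    "oil_pump_bearing_wear": FailureEntry(
        diagnostic_rule={"oil_pressure": "low", "vibration": "high"},
        time_to_failure_hours=2.0,
        actions=["reduce_thrust_60pct", "reduce_thrust_30pct", "shutdown_engine"],
    ),
}
```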
With the possible FFMs identified, the sensors chosen and the rules for identifying these failures deduced, the actions required for each failure are determined. On-line PHM
systems reason about actions in often rapidly changing envi-
ronments, and operate autonomously. Architectures such as
the Beliefs, Desires, Intentions (BDI) model have been de-
veloped to deal with these kinds of situations and are im-
plemented in the JACK autonomous software platform.
A JACK agent is a software component that can exhibit
reasoning behaviour under both pro-active (goal directed)
and reactive (event driven) stimuli. Each agent has:
• a set of beliefs about the world (its data set)
• a set of events that it will respond to
• a set of goals that it may desire to achieve (either at the request of an external agent, as a consequence of an event, or when one or more of its beliefs change), and
• a set of plans that describe how it can handle the goals or events that may arise.
In particular, each agent is able to exhibit the following
properties associated with rational behaviour:
• Goal-directed focus – the agent focuses on the objective and not the method chosen to achieve it
• Real-time context sensitivity – the agent will keep track of which options are applicable at each given moment, and make decisions about what to try and retry based on present conditions
• Real-time validation of approach – the agent will ensure that a chosen course of action is pursued only for as long as certain maintenance conditions continue to be true
• Concurrency – the agent system is multi-threaded. If new goals and events arise, the agent will be able to prioritise between them, resolve potential conflicts (e.g. by deliberately rejecting or ignoring certain goals or delaying their resolution to a later time), and multi-task as required.
When an agent is instantiated in a system, it will wait until it
is given a goal to achieve or experiences an event that it
must respond to. When such a goal or event arises, it deter-
mines what course of action it will take. If the agent already
believes that the goal or event has been handled (as may
happen when it is asked to do something that it believes has
already been achieved), it does nothing. Otherwise, it looks
through its plans to find those that are relevant to the request
and applicable to the situation. If it has any problems exe-
cuting this plan, it looks for others that might apply and
keeps cycling through its alternatives until it succeeds or all
alternatives are exhausted. The BDI agent can be programmed to execute these plans just as a rational person would.
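The plan-selection behaviour described in this paragraph can be mirrored in a short sketch. JACK itself is a Java-based platform, so this Python loop only illustrates the control flow; all names are ours:

```python
# Hedged sketch of BDI-style plan selection: skip goals already believed
# handled, then cycle through relevant, applicable plans until one succeeds.
def handle_goal(goal, beliefs, plans):
    """Try relevant, applicable plans in turn until one succeeds."""
    if beliefs.get(goal) == "achieved":            # already handled: do nothing
        return True
    for plan in plans:
        if plan["handles"] != goal or not plan["applicable"](beliefs):
            continue                                # not relevant / not applicable
        if plan["body"](beliefs):                   # execute the plan; it may fail
            beliefs[goal] = "achieved"
            return True                             # success: stop cycling
    return False                                    # all alternatives exhausted

# Two alternative plans for the same goal; the first fails at execution,
# so the agent cycles on to the second.
plans = [
    {"handles": "restore_oil_pressure", "applicable": lambda b: True,
     "body": lambda b: False},                      # e.g. reduce thrust: no effect
    {"handles": "restore_oil_pressure", "applicable": lambda b: True,
     "body": lambda b: True},                       # e.g. shut engine down: succeeds
]
```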
2.2 The PHM Operational Cycle
Once the Design Cycle has been completed and the Autonomous PHM system provides a sufficient level of coverage, the system, along with the knowledge base developed, can be put into use on board the host system.
The Operational Cycle consists of the following activities (Fig. 3):
Real-time Monitoring. In operation, the PHM function will
receive signals from each of the sensors located in the sys-
tem or its sub-components. These signals will be constantly
monitored, as in conventional systems, so that signal levels
that are outside the normal range are detected as anomalies.
This differs from conventional approaches in that, instead of giving a simple warning, the anomalies are passed to an on-board diagnostic unit that can provide a response appropriate to the current circumstances, and also show how to reduce or mitigate the identified fault’s effects.
On-board Diagnostics. The on-board diagnostic unit will
make use of the knowledge base developed in the Design
Cycle to associate the anomaly or anomalies with a particu-
lar FFM. The knowledge base can also provide enough in-
formation to identify or predict which physical parts or fail-
ure mechanisms are responsible for the failure. If the sensor
readings are not sufficient, the diagnostic unit should once
again examine reliability data, criticality, and dependencies
to determine the FFM. Context-specific confirmation rules can also be applied to help resolve ambiguities or probe further.
Failure Prediction. Once the particular FFM has been iden-
tified, the PHM system must predict the remaining life asso-
ciated with that failure. The failure models (contained in the
knowledge base) for the sub-components or parts identified to have failed will be analyzed in order to determine what time constraints are involved and how the failure will develop.
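A minimal remaining-life estimate can be sketched as a linear extrapolation of a degradation index to a failure threshold. This is our simplification; as the text notes, a real failure model would also draw on historic failures, test results and physics of failure:

```python
# Illustrative remaining-useful-life estimate from a linear degradation model.
def remaining_life(current_level, rate_per_hour, failure_threshold):
    """Hours until the degradation index reaches the failure threshold."""
    if rate_per_hour <= 0:
        return float("inf")                 # not degrading: no predicted failure
    return max(0.0, (failure_threshold - current_level) / rate_per_hour)
```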
Action Determination. The PHM system now has all the
information it needs to make an informed decision about
which actions it should take (in the case of an autonomous
system), or recommend. It now has at its disposal:
the sensor readings perceived to be anomalous
the functional fault this corresponds to
the physical defect or failure likely to have caused this
fault
a model of how the system will continue to fail, includ-
ing the estimated time before further failures occur. Us-
ing the above information the PHM system will select
the actions that it perceives to be the best for the given
situation.
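The selection step can be sketched as a rule that combines the diagnosed fault with the predicted time margin. The thresholds and action names below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Illustrative action determination: no fault means normal operation; an
# imminent failure forces shutdown; a gradual failure yields graded
# mitigations, best first, for the decision maker to choose from.
def determine_actions(fault, time_to_further_failure_hours):
    if fault is None:
        return ["continue_normal_operation"]
    if time_to_further_failure_hours < 0.1:
        return ["shutdown_engine"]          # no time left for mitigation
    return ["reduce_thrust", "monitor_and_reassess", "shutdown_engine"]
```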
Depending on the application, PHM capability can be designed into autonomous or semi-autonomous systems to diagnose faults, predict remaining functional life and suggest reasonable actions to deal with these events, if (or when) they occur. When deployed, depending on the application, the action determined by the Autonomous PHM would not necessarily be the final action to be performed. This is due to the Autonomous PHM not necessarily having complete knowledge of the situational context surrounding the system’s operation. In such an application it would pass the appropriate action alternatives to a higher-level decision-making system or human user who, in turn, would make this selection and initiate the associated action.
3. EXAMPLE: POWER MANAGEMENT ON A UAV
A typical example is an autonomous, real-time engine health and power management function on an Unmanned Air Vehicle (UAV), where it might manage the specific subsystems (i.e., the engine, drive trains, etc.) of the overall vehicle.
The PHM and Power Management function (Fig. 4) forms part of a delegated autonomy architecture in an autonomous system, with the human overseer always remaining in the position of ultimate management responsibility. The PHM function will not know how critical these requirements are with respect to the overall task being performed by the vehicle it is attached to. It is the responsibility of the high-level decision maker to evaluate the mission or task, as it is in the best position to make such a decision. It can then feed new requirements to the Autonomous PHM.
Consider a UAV in flight: the autonomous software must be
able to handle faults when they occur with equivalent or
better levels of competence than a human pilot if the UAV
is to achieve civil certification. The faults identified may
require actions to be taken to avert danger and could cause
the mission to be altered or abandoned.
Design. The example being used is the lubrication system
on the Rolls-Royce 250 engine, and how failures can occur,
e.g. of bearings. The FMECA analysis was completed in
MADe. The autonomous PHM capability is being imple-
mented in AOS’s C-BDI, and the operational scenario is
based upon a twin-engine UAS operating at high power in a
hot and high altitude environment. It is expected that this
demonstrator will be completed in 2012 and the results pub-
lished at that time.
Figure 4. Delegated Autonomy Architecture
The development of the PHM system followed the above Design Cycle methodology:
1. a functional model was created of the engines, including the interactions between the critical internal components (over 12000 functional connections);
2. a risk analysis was performed determining the various ways the engine can fail;
3. the sensor types and locations are chosen and rules identified that associate the various sensor readings to FFMs; data would be included from previous applications of that engine type or similar engines, such as maintenance logs, failure rates, and results of examinations performed on previously failed engines;
4. the reliability data, when available, will be used to aid in the creation of the failure models;
5. the agent actions are under construction, taking into account all of the possible actions that can be done to the engine; these may include increasing or decreasing the thrust or shutting down the engine completely;
6. the knowledge base is being created to be inserted into the PHM function on the UAV.
Operation. Consider the scenario of the UAV performing a
search and rescue mission. During the operation a bearing
within an oil pump on one of the two engines begins to suf-
fer from wear.
The PHM system would monitor the engine sensors, detect
any anomalies, and determine if these are significant (e.g.,
not just a spike due to a power on/off transition). The FFM
would be detected by sensors as a loss of oil pressure within
that engine which, when compared to the diagnostic rules contained within the knowledge base, would indicate a pump failure. By examining the failure probability of each com-
ponent within the pump, the level of functionality lost, and
the rate at which functionality is decreasing, the power
management system would recognize that the cause is likely
to be bearing failure.
Analysis of the failure model for bearing wear failure will
give the probable lead-on effects of this failure mechanism.
The system would then examine the possible actions to
overcome this failure, which may include:
• shutting the engine down immediately;
• reducing thrust to 60% before continuing operation for up to 2 hours;
• reducing thrust to 30% for 4 hours; and
• other combinations.
The PHM system capability would then assess these actions, based upon the following situational information:
• the current power requirement is that both engines need to operate at 30% thrust for 2 hours;
• due to a fault that occurred earlier, the second engine has already been shut down; and
• the remaining engine is currently running at 80% thrust to compensate.
For the given situation the PHM would recommend the following actions:
• turn on the second engine, and operate both engines at 30%, possibly damaging the second engine further;
• leave the second engine shut down and reduce thrust as much as possible; however, it must be at least 60% to meet the power requirements;
• abort or alter the mission since the power requirements cannot be met; or
• reduce thrust to 70% and see if the oil pressure returns to its nominal level. If it does, continue with the engine power at that level, otherwise reduce further.
An example of a JACK graphical plan that implements this
is shown in Fig. 5. This shows how after reducing thrust the
oil pressure will be monitored for some time to see if the
problem is mitigated (the wait_for block). If it is not then
the thrust is reduced further. If the problem gets worse, then
the engine is shut down. If the problem is mitigated, the
maintain block will keep monitoring the problem to make
sure it doesn’t get worse in the future.
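The wait_for/maintain control flow described for this plan can be rendered as a short loop. The actual plan is a JACK graphical plan; this Python sketch only mirrors the described behaviour, and the sensor interface, nominal pressure and thrust steps are our assumptions:

```python
# Illustrative loop: reduce thrust, wait and observe oil pressure
# (wait_for); if mitigated, keep monitoring (maintain); otherwise reduce
# further, and shut the engine down once thrust cannot be reduced more.
def handle_low_oil_pressure(read_oil_pressure, set_thrust,
                            nominal=40.0, thrust=0.7, min_thrust=0.3):
    """Reduce thrust stepwise; shut down if pressure keeps falling."""
    while thrust >= min_thrust:
        set_thrust(thrust)
        pressure = read_oil_pressure()        # wait_for: observe after the change
        if pressure >= nominal:
            return "maintain_monitoring"      # mitigated: keep watching the problem
        thrust = round(thrust - 0.1, 2)       # not mitigated: reduce further
    return "shutdown_engine"                  # problem got worse: shut down
```

For example, if pressure recovers after the second reduction the plan settles into monitoring; if it never recovers, the plan ends in an engine shutdown.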
Upon receiving these possible actions, the higher-level deci-
sion-making software can determine if the mission is im-
portant enough to continue (at the risk of further failure) or
if it can be altered. Instead of being overloaded with multi-
ple options, or receiving insufficient information from mul-
tiple simple warnings, the autonomous system will receive a
set of possible actions that are succinct and meaningful.
From this set it can choose the best action for the given situ-
ation.
Figure 5. JACK Plan to Handle an Engine Fault.
4. DISCUSSION AND CONCLUSIONS
This functional failure mode approach, based on using reasoning to improve the diagnosis, will maximize the likelihood of determining the failure mode correctly and determine the most appropriate course of action, taking into
account current circumstances (e.g., flight mode, power
requirement and the state of both engines). Autonomous
systems must have this capability to operate successfully.
Manned systems will also benefit by improving the accura-
cy of failure mode identification and recommending the best
action to take. By acting like an artificial assistant, such a
system could greatly reduce the crew or operator workload
in high stress situations, leading to improved levels of safe-
ty.
By structuring the PHM design process appropriately, data
from the Operational Cycle can be fed back and incorpo-
rated into the Design Cycle, yielding continuous improve-
ment in future upgrades or revisions of the UAV.
The novelty of the system presented here derives from the
combination of a risk assessment tool with the high-level
representation and flexibility offered by a decision-support
tool, making the resulting system appropriate for integration
into a complex architecture for autonomous vehicles where
multiple levels of delegation and decisions (possibly includ-
ing the human) interact to determine and adapt the course of
actions during a mission.
ACKNOWLEDGEMENT
The authors would like to thank the Technology Strategy
Board (TSB) for funding the ASTRAEA and ASTRAEA II
programs which make work like this possible.
REFERENCES
Anon (2010). A380 In-service report. http://www.flightglobal.com/page/A380-In-Service-Report/Airbus-A380-In-Service-Technical-issues.
Glover, W., Cross, J., Lucas, A., Stecki, C., & Stecki, J. (2010). The Use of Prognostic Health Management for Autonomous Unmanned Air Systems. Proceedings of the International Conference on Prognostics and Health Management, October 10-16, Portland, Oregon, USA.
Kurtoglu, T., Johnson, S. B., Barszcz, E., Johnson, J. R., & Robinson, P. I. (2008). Integrating System Health Management into the Early Design of Aerospace Systems Using Functional Fault Analysis. Proceedings of the International Conference on Prognostics and Health Management, Oct 6-9, Denver, Colorado, USA.
Rudov-Clark, S. J., & Stecki, J. (2009). The language of
FMEA: on the effective use and reuse of FMEA data.
Sixth DSTO International Conference on Health & Us-
age Monitoring. March 9-12, Melbourne, Australia.
Rudov-Clark, S. D., Ryan, A. J., Stecki, C. M., & Stecki, J. S. (2009). Automated design and optimisation of sensor sets for Condition-Based Monitoring. Sixth DSTO International Conference on Health & Usage Monitoring. March 9-12, Melbourne, Australia.
Scheuren, W. J., Caldwell, K. A., Goodman, G. A. and
Wegman, A. K. (1998). Joint Strike Fighter Prognostics
and Health Management. Proceedings of the 34th
AIAA/ASME/SAE/ASEE Joint Propulsion Conference
and Exhibit. July 13-15 1998, Arlington
Characterization of prognosis methods: an industrial approach
Jayant Sen Gupta1, Christian Trinquier2, Ariane Lorton3, and Vincent Feuillard4
1,3 EADS Innovation Works, Toulouse, [email protected]@eads.net
2,4 EADS Innovation Works, Suresnes, [email protected]
ABSTRACT
This article presents prognosis implementation from an industrial perspective. From the description of a use-case (available information, data, expertise, objective, expected performance indicators, etc.), an engineer should be able to select easily, among the large variety of prognosis methods, the ones that are compatible with his objectives and means. Many classifications of prognosis methods have already been published, but they focus more on the techniques that are involved (physical model, statistical model, data-based model, ...) than on the necessary inputs to build/learn the model and/or run it and the expected outputs.
This paper presents the different strategies of maintenance and the place of prognostics in these strategies. The life cycle of a prognosis function is described, which helps to define relevant, yet certainly not complete, characteristics of prognosis problems and methods. Depending on the maintenance strategy, the prognosis function will be used at different steps and with different objectives. Two different steps of use are defined when using the prognosis function: evaluation of the current state and prediction of the prognosis output.
This paper also gives some elements of classification that will help an engineer choose the appropriate class of methods to use to solve a prognosis problem.
The paper also illustrates with one example the fact that, depending on the information at hand, the prognosis method chosen is different.
1. INTRODUCTION
Condition-Based Maintenance (CBM) and Predictive Maintenance seem attractive for the civil aeronautical industry, which bases its maintenance strategies mainly on Predetermined Maintenance (see (ISO 13306, 2010) for definitions). The possible outcomes of CBM in comparison to the existing maintenance strategies are:
1. increasing of the availability:
• avoid Operational Interruptions (OI) thanks to early detection capabilities;
• reduce maintenance times by better scheduling, with less (or no) unscheduled maintenance;
2. reduction of Direct Maintenance Costs (DMC):
• optimization of the use of each component, replacing it when it has reached almost its full potential;
• better control of the maintenance scheduling: aircraft (A/C) at the right place, at the right moment, with the associated resources to conduct the maintenance actions.
Jayant Sen Gupta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Of course, all these potential benefits must come with the same level of safety, or a better level if possible.
In this context, the implementation of a prognosis function on a component or system becomes a subject of high interest for an engineer, as an important brick to build a better maintenance strategy. The implementation process of the maintenance strategy is composed of two main phases: the set-up of the maintenance strategy (choice of the maintenance strategy and associated parameters) and the application of this maintenance strategy on the component or system of an A/C. The question of when the implementation of the prognosis function is done is not as simple as it seems, and we will show in a first part the link between maintenance strategy and prognosis implementation.
The main question, from the engineer point of view, remainsthe choice of an approach to implement the prognosis func-tion of a component or system.
First European Conference of the Prognostics and Health Management Society, 2012
The literature offers a very large variety of methods, using very different techniques from knowledge-based to physical degradation models ((Vachtsevanos, Lewis, Roemer, Hess, & Wu, 2006), (Jardine, Lin, & Banjevic, 2006), (Schwabacher & Goebel, 2007) or (Sikorska, Hodkiewicz, & Ma, 2011)). But practice does not show that a single method will be the optimal solution for all systems/components in an A/C. The question becomes, for the industry: how to choose the "best" approach to solve one prognosis problem?

To answer this question, the engineer designing the prognosis function has mainly two elements:
1. the available information (knowledge, expertise, data, ...); among this information, the knowledge of the failure modes and associated degradation modes is essential, yet not always available;
2. the expected performance of the prognosis (prognosis horizon, precision, maintenance cost reduction, ...); it is related to the use of the prognosis output (dispatch, maintenance optimization, spare management, ...).

On the method side, the type, quantity and quality of the information required to build/identify/learn the model is not always clearly defined, and most of the time it is assumed to be available both in quantity and quality. It is quite the same with the observations, the data measured on the component or system, required for the on-line stage. Depending on the inputs, a certain level of performance (prognosis horizon, precision, access to a confidence in the results, ...) can be defined for each method.
Our aim, which goes far beyond this article, is thus to describe the classes of methods proposed in the literature from the point of view of the design engineer, in order to help him understand which methods are usable with the available information and performance objectives, and when to use them in the prognosis life cycle. As most classification attempts were made with another goal, we expect to get a slightly different result. Sikorska et al. (2011) and Vachtsevanos et al. (2006) are the sources closest to what is expected, but the main driver of their classification remains the mathematical techniques used by the methods.

This paper is divided into four parts. First, the different maintenance strategies are briefly presented in relation with the modelling assumptions that are hidden behind them. The place of the prognosis function in the life cycle of the maintenance is also discussed. Then, a first draft of a classification of prognosis methods is proposed, whose aim is to ease the choice of the design engineer depending on the available information. Then, a simple functional description of the prognosis implementation is given. Each method is to be described in that context, stating how it is built, used and updated with in-service data. Finally, on the same component, a valve, three configurations are described:
• one with only reliability-type information;
• one with access to a physical model and measures of different stresses;
• one with access to measures of a performance indicator of the valve.

The aim of these three examples is to roughly show how the available information and performance objectives drive the choice of possible methods. Needless to say, this paper is only a first step towards a more general approach.
2. PROGNOSIS USED IN DIFFERENT MAINTENANCE PHASES
This section explains when prognosis is used for differentmaintenance strategies. Moreover, it highlights the modelingassumptions of the system for these strategies.
2.1. Prognosis usage depends on maintenance types
The different maintenance types are defined in the ISO norm13306 on maintenance terminology (ISO 13306, 2010).
However, in the aeronautical context, the Maintenance Review Board (MRB) process, supported by the Maintenance Steering Group-3 (MSG-3) methodology, provides the reference maintenance overview.

Two maintenance types are mainly used in the aeronautical industry. The first one is corrective maintenance: maintenance is done or scheduled once an item failure has been detected. The second one is predetermined (or planned) maintenance: maintenance tasks are planned during design (possibly adapted during operations). The maintenance tasks and intervals are defined using the MSG-3 methodology.

Predetermined maintenance is non-specific, i.e. it is adapted to a population of items: decisions are based on statistical considerations and do not take into account the specific use of each item.

When relevant, a more specific maintenance type, called Condition-Based Maintenance (CBM), is introduced in the norm ISO 13306 (2010). Although this distinction is not considered there, we propose to consider two kinds of CBM:
• one based on the current state of the item, called current-state CBM,
• one based on some specific forecast on the item, called predictive maintenance.
This addition to the norm is described in figure 1.
Figure 1. Different strategies of maintenance in industry

The maintenance decision in current-state CBM is based on the estimation of the current state of the item (a degradation indicator for instance), the current state being assessed to be in a maintenance region (a scalar threshold, or something more complex for a state vector). This threshold is defined during design, taking into account characteristics of the maintenance (time to detect, plan and operate maintenance), future conditions and the prognosis function. On the other hand, a predictive maintenance decision requires the computation of a future characteristic of the item at a certain time horizon, using future conditions that could be specific to the item under study.
All preventive maintenance strategies require the prediction of a remaining time before failure and thus require a prognosis function.
2.2. Prognostics for maintenance
The main concept used in prognosis is the Remaining Useful Life (RUL), which is the remaining time before a failure occurs, also denoted estimated time to failure (see (Vachtsevanos et al., 2006) or (ISO 13381, 2004)). Prognosis is often defined as the estimation of the RUL (see (Sikorska et al., 2011) for an overview of prognosis definitions in the literature), or more generally of a quantity of interest based on the RUL. Because of the multiple uncertainty sources (unknown degradation process, future conditions, etc.), the RUL is fundamentally a random variable. As this concept is not easily usable to make decisions, the output of the prognosis should be a quantity based on this random variable:
• the estimation of the mean of the RUL with confidence bounds;
• the estimation of the operational reliability at a given time horizon;
• a quantile of the RUL for a given risk (the RUL value for which the probability to exceed this value is equal to the risk);
• the probability density function of the RUL, etc.
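As a concrete sketch, the decision quantities listed above can be derived from a sample of RUL values, for instance Monte Carlo draws from a prognosis model; the exponential sample below is purely illustrative and all numbers are invented:

```python
import random
import statistics

def rul_quantities(rul_samples, horizon, risk):
    """Decision quantities derived from a sample of RUL values."""
    n = len(rul_samples)
    mean = statistics.mean(rul_samples)
    # 95% confidence bounds on the mean (normal approximation)
    half = 1.96 * statistics.stdev(rul_samples) / n ** 0.5
    # operational reliability at the horizon: P(RUL > horizon)
    reliability = sum(1 for r in rul_samples if r > horizon) / n
    # RUL at risk: the value exceeded with probability equal to the risk
    q = sorted(rul_samples)[min(n - 1, int((1 - risk) * n))]
    return {"mean": mean, "bounds": (mean - half, mean + half),
            "reliability": reliability, "rul_at_risk": q}

random.seed(0)
# purely illustrative: exponential RUL draws with a 1000-hour mean
samples = [random.expovariate(1 / 1000.0) for _ in range(10000)]
out = rul_quantities(samples, horizon=500.0, risk=0.05)
```

The same sample thus yields all the outputs in the list above; which one is reported depends on the downstream use (dispatch, maintenance optimization, spare management, ...).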
A maintenance decision in current-state CBM is made by comparing an output with some thresholds, defined taking into account maintenance constraints, knowledge on the degradation, risk analysis and/or cost criteria.

In predictive maintenance, the output is the prognosis output (the quantity of interest based on the RUL), computed using future assumptions on the item. A prognosis function is thus required for the on-line phase.

In current-state CBM, the degradation indicator is the estimation of the current state of the item. The maintenance thresholds on the state of the item are determined during the off-line stage by aggregating all the possible futures and consequences of such a state. A prognosis result is needed in the design of the maintenance strategy to set the maintenance thresholds.

This argument is also true for predetermined maintenance. The maintenance tasks are scheduled according to risk and cost criteria, which requires a prognosis function during the design of the maintenance strategy. The prognosis function is not required for the on-line stage.

Eventually, a prognosis is required for every preventive maintenance strategy. However, the prognosis is not used in the same phase: it is done on-line only for predictive maintenance, as it uses specific future assumptions that cannot be pre-processed. This difference can also be explained by the different levels of modeling behind each maintenance type.
2.3. Associated modeling assumptions
This section focuses on preventive maintenance, the associated information used to build the different preventive strategies, and the modeling assumptions that are made.

The modeling assumptions concern the evaluation of the present state (pres. in table 1) and the prediction of the future (fut. in table 1). For each of these steps, the item can be considered as unique (spec. in table 1) or as part of a population of similar items (glob. in table 1).
One can distinguish:
• predetermined maintenance: the associated models are built using only information, knowledge and/or data of similar items, called historical information. It can be previous run-to-failures, expert or engineering knowledge, historical data, etc. For this maintenance strategy, no specific evaluation of the current state is done and no specific prediction is made on the item. The item is considered as one item among a population of similar items.

• current-state CBM: compared to the previous one, this maintenance also requires a modeling of the specific present condition of the item. This is done using specific data: on-line monitoring, inspections, built-in tests directly made on the item. For this strategy, the present state of the item is estimated individually. The same component in another A/C would not have endured the same conditions and its present state would be different. However, the future of the item is not studied specifically: a treatment has been done during design to select thresholds that account for all the possible futures, missions, that the item might endure.

• predictive maintenance: this last maintenance type implies a modeling of the specific future of the item, using specific future conditions. For this strategy, both the present state and the future of the item are specific. The same item would have different RULs if different future conditions were met.

This comparison is summarized in table 1.

Maintenance type           Data used                Modeling
                                                    Pres.   Fut.
Predetermined Maintenance  Historical information   glob.   glob.
Current-State CBM          Historical information,  spec.   glob.
                           specific data
Predictive Maintenance     Historical information,  spec.   spec.
                           specific data,
                           future conditions

Table 1. Different levels of modeling associated to maintenance types
3. FIRST ELEMENTS OF A CLASSIFICATION OF PROGNOSIS METHODS FOR A DESIGN ENGINEER
The choice of a prognosis method is not an easy task. Each method has its advantages and drawbacks, and its performance depends strongly on the quality of the inputs used. The available information being different for each case, the best methods will potentially differ from one case to another. How can a design engineer find his way through the large diversity of methods proposed in the literature?

The approach presented in this section is still in development and will continue to be refined in the future. The starting point is the available information. Different situations are described depending on the level of insight into the degradation process. A class of methods that can be used is associated with each situation.
Figure 2 describes the different situations.
The different cases are detailed in the following. No individual methods are detailed here, but families of methods are given for each case.
Figure 2. First elements of classification
Case 1: no specific data. In this situation, the design engineer has no access to specific data and works only with historical data, when available, and reliability studies. This constraint makes it impossible to implement CBM. In this case, the methods that can be used are reliability-based methods, with constant or variable failure rates.
Case 2: for a system, access to the fault state of the components. The failure information of a component is useful only for a system. When available, it makes it possible to update the failure rate of the system (through the reliability diagram) and thus update the RUL of the system. In this case, the methods that can be used are conditional reliability-based methods, with constant or variable failure rates.
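As a toy sketch of this conditional update, consider a hypothetical redundant block of two components (the system works as long as one of them does) with constant, memoryless failure rates; knowing which components are currently faulty changes the expected system RUL:

```python
# Hypothetical constant failure rates (per hour) for the two components
LAM1, LAM2 = 1e-3, 1e-3

def mean_system_rul(fault1, fault2):
    """Mean remaining time to system failure, conditional on fault status.
    Exponential lifetimes are memoryless, so only the current fault states
    matter, not the elapsed operating time."""
    if fault1 and fault2:
        return 0.0              # system already failed
    if fault1:
        return 1 / LAM2         # only component 2 keeps the system up
    if fault2:
        return 1 / LAM1
    # both up: E[max(T1, T2)] for independent exponential lifetimes
    return 1 / LAM1 + 1 / LAM2 - 1 / (LAM1 + LAM2)
```

Learning that one component has failed shrinks the expected system RUL, which is exactly the update the reliability diagram permits.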
Case 3: no degradation indicator. In this situation, specific data is collected on the item but no degradation indicator has been identified. Prognosis requires learning a model that links the observables to the time of failure, for instance using a database of histories of observables and the associated times of failure. In this case, the methods that can be used are data-based techniques, to identify the features and learn the link between the features and the time of failure.

Case 4: direct access to the indicator. Building a degradation indicator requires a lot of knowledge of the degradation process or, at least, of its consequences in terms of performance. The simplest situation is when the degradation indicator is directly observable. In this case, the methods that can be used are methods to predict the evolution of the indicator under future mission assumptions.

Case 5: indirect access to the indicator. In this case, the degradation indicator is not directly observable but has to be computed from other specific data. Two models are to be built and validated. The first model links the specific data with the degradation indicator. This model can be built using, for instance:

• stress models based on the physics of degradation (environmental and operational conditions are monitored and a physics-based model computes the damage increment);
• a deviation from a nominal behaviour (both inputs and outputs of the item are monitored, and the deviation between the monitored output and the nominal output computed from the monitored inputs is evaluated); etc.

For the prediction of the future of the indicator, two choices are possible. The first is to use the previously computed values of the degradation indicator, as can be done in case 4 with the monitored degradation indicator. The second is to build a model of the monitored parameters (with ARMA models for instance), simulate them in the future, and use the first model to compute the future values of the degradation indicator.
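The second choice, modelling the monitored parameters and simulating them forward, can be sketched with a simple AR(1) model standing in for the ARMA model mentioned above; all signals and coefficients here are synthetic:

```python
import random

def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    xs, ys = x[:-1], x[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return my - phi * mx, phi

def simulate_future(c, phi, x0, steps, rng):
    """Simulate the monitored parameter forward; these simulated values can
    then be fed to the first model to compute future degradation values."""
    x, path = x0, []
    for _ in range(steps):
        x = c + phi * x + rng.gauss(0.0, 0.1)
        path.append(x)
    return path

rng = random.Random(1)
# Synthetic history of a monitored stress parameter (AR(1) around a mean of 5)
hist = [5.0]
for _ in range(200):
    hist.append(2.0 + 0.6 * hist[-1] + rng.gauss(0.0, 0.1))
c, phi = fit_ar1(hist)
future = simulate_future(c, phi, hist[-1], 50, rng)
```

The simulated trajectory `future` would then drive the stress or deviation model to obtain future values of the degradation indicator.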
Each case needs to be described in much more detail. The next section gives a way to describe the implementation of the prognosis function that could be used to refine the description of each of the previous cases.

4. PROCESS OF A PROGNOSIS FUNCTION IMPLEMENTATION

This section focuses on the description of the life cycle of a prognosis function implementation. As already mentioned, this implementation will be used during different phases (design or in service). We will highlight in particular the type of information used at each step. This description is dedicated to a basic prognosis function, where there is no fusion between different prognosis function implementations. This is the case for components, or for systems where the prognosis function is not modeled as a logical aggregation of the prognosis functions at component level.
We assume that the analysis of the component or system hasalready been done. Thus, we are in the situation where:
• the item is selected based on economical and risk criteria;
• its failure modes are selected using safety analysis and MSG-3 analysis (occurrence, criticality and cost criteria);
• the associated degradation processes of the item are identified;
• the parameters to monitor in order to define the health status of the item have been defined (called observables in the following).
4.1. Phase 1: Design of the prognosis function implementation

This phase corresponds to the design of the prognosis implementation. In this step, the aim is to build models that represent both the current state of the component or system and its evolution. It means choosing, developing and tuning the models from the available information.

The only information that can be used at that stage is historical knowledge. This consists of domain expertise, historical data (A/C, fleet), run-to-failures on test benches, feedback from previous programs, etc.

The evaluation of the current state of the component or system can be direct or indirect. It is called direct if the current state is computed from the observables only by a data treatment, like filtering for instance. It is called indirect if it is computed through a model with the observables as inputs. The characterization of the current state could be as different as a scalar health indicator, the performance of a function, or the complete history of the observables since the last maintenance action.
The evolution of the component/system can be modeled either by:

• a state model: the evolution of the state of the item results from an evolution of the observables, characterizing the future conditions undergone by the item;
• an incremental model: at each time step or cycle, an increment is computed and added to the current state.
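As a toy illustration of the incremental model (the quadratic stress-to-damage law and all numbers are invented): each cycle's observed stress yields a damage increment that is added to the current state until a failure threshold is crossed:

```python
def damage_increment(stress):
    """Hypothetical stress model: the increment grows with the observed stress."""
    return 1e-4 * stress ** 2

def propagate(state, future_stresses, threshold=1.0):
    """Add one increment per cycle; return (final state, cycles to threshold)."""
    for cycle, s in enumerate(future_stresses, start=1):
        state += damage_increment(s)
        if state >= threshold:
            return state, cycle  # threshold reached: the cycle count is the RUL
    return state, None           # threshold not reached within the horizon

final_state, rul_cycles = propagate(0.0, [50.0] * 10)
```

A state model would instead propagate the full state vector through a dynamic model driven by the observables; the incremental form above only tracks the accumulated damage.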
During this phase, the different models are trained, selectedor identified. A way to validate them during the operations ofthe A/C has to be defined.
Another element that has to be defined during this step is amodel for the different mission conditions.
Finally, the Verification and Validation (V&V) process hasto be done. A first validation with historical data has to be
performed. The performance of the prognosis (see (Saxena, Celaya, Saha, Saha, & Goebel, 2010) for examples of performance indicators) has to be compatible with the usage of the prognosis outputs.
4.2. Phase 2: on-line execution
The on-line execution is the execution of the previous modelsduring the A/C usage.
4.2.1. Step A: Evaluation of the current state
The current state of the item can be defined as the minimal information that characterizes the state of the item. It can take many different forms: from a simple scalar health indicator, through a state vector that characterizes the state of the component (including internal variables of a physical model for instance), to the complete history of the observables since the last maintenance action (if no other knowledge is available).

The evaluation of the current state of the item is direct if it is monitored, and indirect if a model is used to compute it from the monitored parameters.
4.2.2. Step B: Prediction of the prognosis result
This step consists in the computation of the quantity of interest based on the RUL (quantile of the RUL, reliability over a time interval, etc.). As already stated, the modeling of the future missions has to be introduced. Different cases are possible; the following gives some examples:

• if the state of the item is computed by a model, building a model of the future inputs is a way to define the future conditions;
• if the conditions in the future are the same as they were in the past and the evolution of the current state of the item is regular, the previous evaluations of the current state can be used to build a trend that can be post-processed to compute a prognosis result.
4.3. Phase 3: Update and V&V
4.3.1. Update of historical data
The first element of this step is the update of the historical data, done by collecting the run-to-maintenance of each item and adding it to the historical data.

This update of the historical data might lead to an update of the different models used in the prognosis implementation.
4.3.2. Validation all along the life cycle of the A/C
The different models used in the prognosis implementation have been validated using test-bench results, historical data, and scenarios of use that are a model of the reality the item will have to face after Entry Into Service (EIS).

Right after EIS, the priority is to validate the implementation with in-service data, to measure the effect of the error made when modeling the real conditions in the first validation of section 4.1.

All along the life cycle of the A/C, which could last forty to fifty years, the validation has to continue, maybe on a different time scale, to detect potential drift due to an evolution of the use of the A/C.

This simple description of the implementation process gives an idea of how the methods can be used and how they can collaborate. Moreover, the same methods can be used at different steps with different objectives.
5. EXAMPLE OF DIFFERENT PROGNOSIS FUNCTIONS ON THE BLEED SYSTEM

This section aims at describing an industrial prognostics case and at illustrating the process described in section 4. Three prognostics cases are considered. In each case, the component under study and the expected prognosis output remain identical, but the available inputs differ, and so does the prognosis performance. Thus, different prognosis methods must be implemented, and the expected prognosis performance may not be reachable. As this paper focuses on the definition of the prognosis process and its characteristics, the prognosis results are not provided here. Moreover, the validation phase is not described in the following.
5.1. Description of the initial example
The component under study here is a pneumatic valve within the bleed air system. This system is part of ATA-36. It provides air to the cabin at an admissible pressure. Basically, it takes air at high pressure from the engines or the auxiliary power unit (APU), then regulates its pressure and provides this regulated air to the rest of the bleed system. Figure 3 illustrates a bleed system on a CFM56-5B.

The component under study participates in the pressure regulation. During this process, the air pressure needs to be reduced, which is done by the Pressure Regulating Valve (PRV). There are different kinds of PRV; we consider here a pneumatic valve (see figure 4). This particular example was previously studied in (Daigle & Goebel, 2010).

Due to different kinds of constraints, a performance objective is set. For instance, the prognosis horizon must be at least five hundred flight-hours.
5.2. At system level: reliability type information
In this example, the bleed system is represented in a very simplified way as a set of valves, one per engine, and a component representing the pipes. In this case, the available information is the constant failure rate of each component of the system.

Figure 3. Scheme of the bleed system on a CFM56-5B

Figure 4. Scheme of the pneumatic valve, from (Daigle & Goebel, 2010)
The only online information is the fault status of each com-ponent. This case corresponds to Case 2 in figure 2.
The best use of the available information to compute a quantity of interest based on the RUL is to use the same model as in classic reliability, where the bleed system is considered in its logical view, as shown in Figure 5.

Figure 5. (Very) simplified view of a bleed system

The improvement here is to take the current state of each component into account, here the fault status. The difference between the failure rates when one PRV is in fault is due to the change of operational conditions, the remaining valves being overstressed to maintain the bleed performance.

Using a pure jump Markovian process, the RUL conditional on the current state of the system can be computed, as well as all quantities based on the RUL. Although the variance of this conditional RUL will be smaller than that of a RUL computed without any information, the added information is rather poor and the added value may not be sufficient to meet the objectives of the prognosis function.
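A minimal sketch of such a pure jump Markov computation, for a hypothetical three-state chain matching the layout above (state 0: both PRVs healthy; state 1: one PRV faulty, the remaining valve overstressed; state 2: bleed failure, absorbing); all rates are invented for illustration:

```python
import math

LAM = 1e-4       # hypothetical failure rate of a healthy valve (per flight-hour)
LAM_OVER = 5e-4  # hypothetical, higher rate of the overstressed remaining valve

def mean_rul(state):
    """Mean time to absorption, conditional on the current fault status."""
    if state == 2:
        return 0.0
    if state == 1:
        return 1 / LAM_OVER
    # from state 0: exponential sojourn at rate 2*LAM, then pass through state 1
    return 1 / (2 * LAM) + 1 / LAM_OVER

def survival(state, t):
    """P(RUL > t) conditional on the current state (hypoexponential from 0)."""
    if state == 2:
        return 0.0
    if state == 1:
        return math.exp(-LAM_OVER * t)
    a, b = 2 * LAM, LAM_OVER
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)
```

With these invented rates, observing that one valve is already faulty shrinks the mean RUL from 7000 to 2000 flight-hours, which is exactly the kind of conditional update the fault status makes possible.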
Concerning the update phase, the constant failure rate of each component could be updated using the real failure rates rebuilt from the in-service data.
5.3. At component level: using physics based model
For this case, physical knowledge of the degradation behavior of the valve is available, along with experiments to identify and validate the parameters of the physical model. The scheme of the valve on which the model is built is shown in Figure 4 and is taken from (Daigle & Goebel, 2010).

The only monitored parameter is the pneumatic pressure command.

The current-state evaluation is done by incrementing the physical degradation caused by the variation of the pneumatic pressure command. This corresponds to Case 5 in figure 2.

The computation of the quantity of interest based on the RUL is done by computing the future state of the valve and post-processing it to compute the RUL. This can be done in at least two different ways:
• model the future conditions that the valve will undergo by modeling the future pneumatic pressure command; use this command as input to the physical model, initialized with the current state, to compute future states of the valve;
• assume that future conditions will be the same as previous conditions and build a statistical model of the degradation indicator from its past values, for instance a linear regression over use time or cycles; use this model to compute future states of the valve.
The future degradation state is then post-processed to compute the quantity of interest based on the RUL.

For the second prediction approach, the update phase could be done by capitalizing the models built by the linear regression and studying whether they are always the same, or vary strongly from one component to another or from one mission to another. The history of the degradation indicator for one component could also be added to the historical knowledge as a run-to-maintenance test.
5.4. At component level: using a performance indicator
For this case, the available information is that the degradation of the valve can be characterized by its opening and closing times. Historical knowledge also shows that this degradation is relatively smooth and progressive. The valve is considered useful as long as the opening and closing times are smaller than a threshold value.

The available on-line information consists in the measure of the position of the valve, from which one can derive the opening and closing times.

The current state of the valve is characterized by the history of the closing and opening times monitored since the last replacement of the valve.

For the prediction step, the opening and closing time data is used to build a data model, a regression model for instance, which is used to predict the future performance of the valve. The prediction of performance is then post-processed to compute the quantity of interest based on the RUL.
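This prediction step can be sketched as a least-squares trend on the monitored closing times, extrapolated to the threshold beyond which the valve is no longer considered useful (all numbers are hypothetical):

```python
def fit_line(cycles, times):
    """Ordinary least squares: closing time = a + b * cycle."""
    n = len(cycles)
    mx, my = sum(cycles) / n, sum(times) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(cycles, times))
         / sum((x - mx) ** 2 for x in cycles))
    return my - b * mx, b

def predicted_rul(cycles, times, threshold):
    """Cycles remaining until the fitted trend crosses the threshold."""
    a, b = fit_line(cycles, times)
    if b <= 0:
        return None              # no degradation trend observed
    return (threshold - a) / b - cycles[-1]

cycles = list(range(10))
times = [2.0 + 0.1 * c for c in cycles]  # slowly increasing closing time (s)
rul = predicted_rul(cycles, times, threshold=4.0)
```

The extrapolated crossing time minus the current cycle gives the RUL in cycles; on real data the regression residuals would additionally provide confidence bounds.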
As in the previous case, the update phase consists in the capitalization of runs-to-maintenance once the component is replaced, and in the capitalization of the different models built from the monitored data.
6. CONCLUSION
In this paper, the implementation of prognostics has been presented from a design engineer's point of view. The questions to be addressed are:

• What information is available?
• What method or set of methods can be used to compute the prognosis output?
• If the prognosis built does not reach the expected performance, what information should be added to reach it, with the same method or with a different one?

In the literature, the classifications of prognosis methods are mostly driven by the mathematical techniques used. In this paper, a simple classification is presented. This classification is based on the available knowledge (historical knowledge, expertise, run-to-failures, already existing on-line monitoring, future mission profiles, etc.) and defines different situations. It has been illustrated by describing different ways to build a prognosis on a bleed valve, relating each example to one of the situations previously described.

Methods have been associated to each of these situations, but this work will continue in the future. The proposed process of prognosis implementation is the way to describe in more detail the use of the methods in the different cases. It should highlight:
• the type of information and data needed to build the different models used by each method, both for the evaluation of the current state and for the prediction of the RUL;
• the verification and validation process, both during design and after the EIS.
A lot of work is still to be done.
NOMENCLATURE
RUL  Remaining Useful Life
CBM  Condition-Based Maintenance
PRV  Pressure Regulating Valve
DMC  Direct Maintenance Cost
V&V  Verification and Validation
EIS  Entry Into Service
APU  Auxiliary Power Unit
REFERENCES
Daigle, M., & Goebel, K. (2010). Model-based prognostics under limited sensing. In Aerospace Conference, 2010 IEEE (pp. 1-12).
ISO 13306. (2010). Maintenance Terminology (Tech. Rep. No. EN 13306:2010). International Organization for Standardization.
ISO 13381. (2004). Condition Monitoring and Diagnostics of Machines, Prognostics part 1: General Guidelines (Tech. Rep. No. ISO 13381-1). International Organization for Standardization.
Jardine, A., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 1483-1510.
Saxena, A., Celaya, J., Saha, B., Saha, S., & Goebel, K. (2010). Metrics for Offline Evaluation of Prognostics Performance. International Journal of Prognostics and Health Management (IJPHM), 1(1).
Schwabacher, M., & Goebel, K. (2007). A Survey of Artificial Intelligence for Prognostics. In Proceedings of AAAI Fall Symposium.
Sikorska, J., Hodkiewicz, M., & Ma, L. (2011, July). Prognostic modelling options for remaining useful life estimation by industry. Mechanical Systems and Signal Processing, 25(5), 1803-1836.
Vachtsevanos, G., Lewis, F. L., Roemer, M., Hess, A., & Wu, B. (2006). Intelligent Fault Diagnosis and Prognosis for Engineering Systems (1st ed.). Hoboken.
Damage identification and external effects removal for roller bearing diagnostics
Pirra M.1, Fasana A.2, Garibaldi L.3, and Marchesiello S.4
1,2,3,4 Dynamics & Identification Research Group, DIMEAS, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
[email protected]@polito.it
ABSTRACT
In this paper we introduce a method to identify whether a bearing is damaged, by removing the effects of speed and load. Such conditions influence vibration data acquired in rotating machinery and may lead to biased results when diagnostic techniques are applied. The method combines Empirical Mode Decomposition (EMD) and the Support Vector Machine classification method. The acquired vibration signal is decomposed into a finite number of Intrinsic Mode Functions (IMFs) and their energy is evaluated. These features are then used to train a particular type of SVM, namely the One-Class Support Vector Machine (OCSVM), where only one class of data is known. Data acquisition is done both for a healthy bearing and for one whose rolling element presents a 450 µm damage. We consider three speeds and three different radial loads for both bearings, so nine conditions overall are acquired for each type of bearing. Feature evaluation is done using EMD, and healthy data belonging to the various conditions are then used to train the OCSVM. The remaining data are analysed by the classifier as test objects. The real class each element belongs to is known, so the efficiency of the method can be measured by counting the errors made by the labelling procedure. These evaluations are performed with different kinds of SVM kernel.
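The energy-feature step described in the abstract can be sketched as follows; the IMFs below are toy vectors standing in for real EMD output, and in practice an OCSVM from a machine-learning library would then be trained on these feature vectors:

```python
def imf_energies(imfs):
    """Energy of each IMF: the sum of its squared samples."""
    return [sum(s * s for s in imf) for imf in imfs]

def energy_features(imfs):
    """Normalised energy feature vector, one entry per IMF."""
    e = imf_energies(imfs)
    total = sum(e)
    return [x / total for x in e]

# Toy IMFs standing in for the output of EMD on a vibration signal
imfs = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]
feats = energy_features(imfs)
```

Normalising by the total energy makes the feature vector less sensitive to overall signal amplitude, which is in the spirit of removing operating-condition effects.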
1. INTRODUCTION
Rolling bearings are among the most widely used components in machinery. Their condition monitoring and fault diagnosis are therefore very important in order to prevent the occurrence of breakdowns. A wide range of methods has been proposed since the Seventies to obtain proper fault diagnosis techniques. Signal analysis is an important topic in mechanical fault diagnosis research and applications thanks to its ability to extract fault features and identify fault
Pirra M. et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
patterns. Methods such as Fourier analysis and time-domain analysis take the acquired signal into account and are based on the assumption that the process generating the signal is stationary and linear. Unfortunately, faults are time-localised transient events, so these techniques may provide misleading information.
Some possible ways to overcome these limitations are presented in Randall and Antoni (2011). They develop an interesting review of diagnostic analysis of acceleration signals from rolling element bearings, especially when strong masking noise is present due to other machine components such as gears, and they show industrial applications that confirm the reliability of their methods. Another interesting method that can be efficiently used in the vibration-based condition monitoring of rotating machines is presented in Antoni (2006). He shows how the Spectral Kurtosis (SK), in contrast to classical kurtosis analysis, provides a robust way of detecting incipient faults even in the presence of strong masking noise. The other appealing aspect is that it allows optimal filters to be designed efficiently to filter out the mechanical signature of faults.
A useful tool to analyse non-stationary signals such as those related to bearing vibrations is the wavelet transform. Its strength comes from the simultaneous interpretation of the signal in both the time and frequency domains, which allows local, transient or intermittent components to be exposed. Its drawback is the dependence on the choice of the wavelet basis function. An example of a wavelet-based analysis technique for the diagnosis of faults in rotating machinery from its vibration signature is Chebil, Noel, Mesbah, and Deriche (2009).
An innovative technique in the time-frequency domain is the Empirical Mode Decomposition (EMD) (Huang et al., 1998). It allows any complicated signal to be decomposed into a collection of Intrinsic Mode Functions (IMFs) based on the local characteristic time scale of the signal.
It is self-adaptive because the IMFs, working as the basis functions, are determined by the signal itself rather than being pre-determined. Hence, EMD is highly efficient in non-stationary data analysis. It has been applied to a wide variety of problems, from geophysics to structural health monitoring (Huang & Shen, 2005). Many authors apply EMD to rotating machines and bearings with diagnostic intent, usually in association with other techniques. Some examples are Gao, Duan, Fan, and Meng (2008), where combined mode functions are introduced, Junsheng, Dejie, and Yu (2006), who use EMD jointly with an AutoRegressive model, and Yu, Dejie, and Junsheng (2006), who train an Artificial Neural Network (ANN) classifier with the EMD energy entropies.
Another aspect of interest is the search for methods able to remove the effects produced in vibrations by external factors, such as environmental temperature or test rig assemblies. Some examples are presented in Pirra, Gandino, Torri, Garibaldi, and Machorro-Lopez (2011) and in Machorro-Lopez, Bellino, Garibaldi, and Adams (2011), where the multivariate statistical technique named Principal Component Analysis (PCA) is used successfully for fault detection in bearings and rotating shafts. Other factors influencing vibrations related to rotating elements are varying load and speed; a variation in these factors makes it harder to recognise the presence of a fault in a signal. Bartelmus and Zimroz (2009) show how important it is, in condition monitoring of planetary gearboxes, to identify the external varying load condition. In particular, they analyse in detail how many factors influence the vibration signals generated by a system including a planetary gearbox and show that the load has a consistent contribution. As far as bearings are concerned, some works are presented in Cocconcelli, Rubini, Zimroz, and Bartelmus (2011) and Cocconcelli and Rubini (2011). They inspect the continuous change of the rotational speed of the motor, which represents a substantial drawback in terms of diagnostics of the ball bearing.
In fact, most of the algorithms proposed in the literature need a constant rotation frequency of the motor to identify fault frequencies in the spectrum. They tackle the problem with encouraging results aided by ANN and Support Vector Machine (SVM) methods.
These last two techniques can be grouped under the term of soft or natural computing. They are well developed in Worden, Staszewski, and Hensman (2011), an exhaustive tutorial overview of their basic theory and their applications in the context of mechanical systems research. SVM, in particular, is widely used for condition monitoring and damage classification (Widodo & Yang, 2006), (Rojas & Nandi, 2006). It is based on the concept of separating data objects into different classes through a hyperplane. However, this method assumes that all types of instances are known before applying it. A particular case of SVM is the One-Class SVM (OCSVM), which is well suited for diagnostic purposes: it allows the creation of the separating hyperplane starting from the knowledge of only one class, which is what usually happens in damage detection. Shin, Eom, and Kim (2005) adopt this method for machine fault detection and classification in electro-mechanical machinery from vibration measurements.

Figure 1. Acceleration signal and its decomposition for a healthy bearing (left) and for a faulty rolling element one (right).

The intent of our work is to find a parameter able to remove the influence of various external conditions in order to properly detect damage in a roller bearing. This paper is organised as follows. In the next two sections the EMD method and the OCSVM are presented with some theoretical background. Our algorithm is explained in Section 4 and its application on a test rig is developed in the following section.
2. EMPIRICAL MODE DECOMPOSITION
Empirical Mode Decomposition is a method presented by Huang et al. (1998) and based on the local characteristic time scales of a signal. This approach can be seen as a self-adaptive signal processing method applicable to non-linear and non-stationary processes. In particular, it allows a complex signal to be decomposed into a number of intrinsic mode functions (IMFs). Each of these components contains frequencies changing with the signal itself, and it has to satisfy the following definition:
• In the entire data set, the number of extrema and the number of zero crossings must either be equal or differ at most by one.
• At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
Thanks to this definition, each IMF represents a simple oscillation mode involved in the signal. According to Huang et al. (1998), a sifting process is used in order to extract the IMFs from a given signal x(t). It consists of the following steps:
1. Identify all the extrema of the signal, and connect all the local maxima by a cubic spline line as the upper envelope. Repeat the same procedure on the local minima to produce the lower envelope.
2. Designate the mean of the two envelopes as m1, and the difference between the signal x(t) and m1 as the first component, h1, i.e.
x(t)−m1 = h1. (1)
Ideally, if h1 is an IMF, then take it as the first IMF component of x(t). Otherwise, consider h1 as the original signal and repeat the first two steps, obtaining
h1 −m11 = h11. (2)
Repeat the sifting process up to k times, until h1k becomes an IMF, that is
h1(k−1) −m1k = h1k. (3)
The first IMF component is then designated as
c1 = h1k. (4)
3. Separate c1 from the original signal x(t) to obtain the residue r1:
r1 = x(t)− c1. (5)
4. Consider r1 as the original signal and repeat the above process n times, obtaining the other IMFs c2, c3, . . . , cn satisfying
r1 − c2 = r2
. . .
rn−1 − cn = rn  (6)
5. Stop the decomposition process when rn becomes a monotonic function from which no more IMFs can be extracted. The sum of Eq. (5) and Eq. (6) gives
x(t) = ∑_{i=1}^{n} ci + rn. (7)
From Eq. (7) we can see how the signal x(t) can be decomposed into n empirical modes and a residue rn, which can be interpreted as the mean trend of the signal. Each IMF ci includes different frequency bands, ranging from high to low, and is stationary.
Figure 1 shows two signals, a healthy and a damaged one. The latter refers to a 450 µm fault on a rolling element. In both cases, the original signal and a 3-IMF decomposition are presented.
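The sifting procedure of steps 1-5 can be sketched in Python. This code is not part of the paper; it is a simplified illustration built on NumPy/SciPy, and the envelope-mean stopping rule used below is a stand-in for the standard-deviation criterion of Huang et al. (1998):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, t, max_iter=50):
    """Extract one IMF from x via the sifting process (steps 1-2)."""
    h = x.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            return None  # too few extrema: no further IMF can be extracted
        upper = CubicSpline(t[maxima], h[maxima])(t)  # upper envelope
        lower = CubicSpline(t[minima], h[minima])(t)  # lower envelope
        m = (upper + lower) / 2.0                     # mean of the envelopes
        h_new = h - m                                 # Eq. (1)/(2): h - m
        # simplified stop rule: envelope mean is negligible w.r.t. the signal
        if np.sum(m**2) / np.sum(h**2) < 1e-3:
            return h_new
        h = h_new
    return h

def emd(x, t, n_imfs=8):
    """Decompose x into IMFs plus a residue, as in Eq. (7)."""
    imfs, r = [], x.copy()
    for _ in range(n_imfs):
        c = sift(r, t)
        if c is None:
            break
        imfs.append(c)
        r = r - c  # Eq. (5): residue after removing the extracted IMF
    return imfs, r

# toy example: a two-tone signal, decomposed and then reconstructed
t = np.linspace(0.0, 1.0, 2048)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(x, t)
# Eq. (7): the IMFs plus the residue reconstruct the original signal
print(len(imfs) >= 1, np.allclose(x, sum(imfs) + residue))
```

By construction the residue plus the extracted IMFs sum back to the original signal, which is exactly the content of Eq. (7).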
3. ONE-CLASS SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) is a computational learning method based on statistical learning theory (Vapnik, 1982). It is well suited for classification because, given some data points which belong to certain classes, it is able to state the class a new data point would be in.

Figure 2. One-Class SVM classifier where the origin is the only member of one class.

If we consider n-dimensional input data made up of a number of samples belonging to one of two classes, namely positive or negative, SVM constructs a hyperplane that separates the two classes. Moreover, this boundary satisfies the condition that the distance from the nearest data points in each class is maximal. In this way an optimal separating hyperplane is created, namely the maximum margin. The points in both classes nearest to this margin are called support vectors and, once selected, they contain all the information necessary to define the classifier. Every time a new element appears, it can be classified according to where it lies with respect to the separating hyperplane.
SVM can also be applied to non-linear classification using a function φ(x) that maps the data onto a high-dimensional feature space, where linear classification is then possible. Furthermore, if a kernel function K(xi, xj) = φT(xi) · φ(xj) is applied, it is not necessary to evaluate φ(xi) explicitly in the feature space. Various kernel functions can be used, such as linear, polynomial or Gaussian RBF. This property enables SVM to be used with very large feature spaces, because the dimension of the classified vectors does not directly influence the SVM performance.
When more than two classes are present, a multi-class SVM can be adopted. Two different approaches are taken into account: one-against-all (OAA) and one-against-one (OAO). In the first, the i-th SVM is trained with all the examples in the i-th class with positive labels and all the other examples with negative labels, while in the latter each classifier is trained on data from two classes.
It is clear that in the previous cases, two or more classes of data are given from the beginning of the analysis. In more general diagnostic applications, instead, only one type of data object is usually acquired: the healthy one. This can be seen as the detection of patterns in data that do not conform to
a well defined notion of normal behaviour, so we may refer to anomaly detection. One-Class SVM is the application of the SVM approach to the general concept of anomaly detection, as presented by Schölkopf et al. in Schölkopf, Williamson, Smola, Taylor, and Platt (2000). In their method they construct a hyperplane around the data, such that it is maximally distant from the origin and separates the data from the region that contains no data. They propose a binary function that returns +1 in the region containing the data and -1 elsewhere. For a hyperplane w which separates the data xi from the origin with maximal margin ρ, the following quadratic program has to be solved:
min_{w∈F, ξ∈Rn, ρ∈R}  (1/2)||w||² + (1/(νn)) ∑i ξi − ρ  (8)
subject to (w · Φ(xi)) ≥ ρ− ξi, ξi ≥ 0 (9)
where ξi are the slack variables and ν is a parameter taking values between 0 and 1 that controls the effect of outliers (the hardness or softness of the boundary around the data). If w and ρ solve the minimisation problem presented in Eqs. (8)-(9), the decision function
f(x) = sign((w · Φ(x)) − ρ) (10)
is positive for most instances, representing the majority of the data.
Figure 2 shows graphically the idea presented here, with only a few points around the origin that are negatively labelled.
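As an illustration (not part of the original paper, and not necessarily the implementation the authors used), the ν-parameterised one-class machine of Eqs. (8)-(10) is available as `OneClassSVM` in scikit-learn; the data, cluster locations and parameter values below are invented for the sketch:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# "healthy" training data: the single known class, clustered around (2, 2)
X_train = rng.normal(loc=2.0, scale=0.3, size=(200, 2))

# test data: some healthy points plus clear outliers playing the "damaged" role
X_healthy = rng.normal(loc=2.0, scale=0.3, size=(20, 2))
X_faulty = rng.normal(loc=5.0, scale=0.3, size=(20, 2))

# nu corresponds to the ν of Eq. (8), controlling the softness of the boundary
clf = OneClassSVM(kernel="rbf", gamma=1.0, nu=0.05)
clf.fit(X_train)

# decision of Eq. (10): +1 inside the learned boundary, -1 outside
print(np.mean(clf.predict(X_healthy) == 1))   # most healthy points accepted
print(np.mean(clf.predict(X_faulty) == -1))   # most faulty points rejected
```

With these toy clusters the classifier, trained on healthy data only, accepts most healthy test points and rejects the outliers, which is exactly the behaviour exploited for damage detection.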
4. METHODOLOGY
The previous sections introduced the background and theoretical aspects of the two methods that we now want to use jointly. The goal of this study is a method able to identify damage in a rolling element of a roller bearing by removing the effect of the external conditions influencing vibrations. The diagnosis method consists of the following steps:
1. Collect vibration signals under various conditions of speed and applied radial load, both for a healthy and for a damaged bearing.
2. Apply EMD and decompose the original signal into IMFs; then choose the first n to extract the features used during the analysis.
3. Evaluate the total energy of each of the n selected IMFs:

Ej = ∫_{−∞}^{+∞} |cj(t)|² dt,  j = 1, . . . , n. (11)
4. Create a feature vector with the energies of the n selected IMFs:

F = [E1, . . . , En]. (12)

5. Normalise the feature vector, dividing F by the value
EN = √( ∑_{j=1}^{n} |Ej|² ). (13)
Figure 3. DIRG test rig (a) and roller bearing used during the tests, with the damaged roller in the white circle (b).
6. Obtain the n-dimensional normalised feature vector:
F′ = [E1/EN, . . . , En/EN]. (14)
7. Consider 75% of the healthy data as the training set and the remaining 25%, together with the damaged data, as the test set. All loads and speeds are analysed together.
8. Train the one-class SVM classifier on the training data and evaluate the label assigned by the classifier to the test data. The real class is known, so labelling mistakes can be counted.
9. Repeat steps 7 and 8 thirty times, permuting the order of the healthy data to give statistical significance to the analysis, and evaluate the error percentage in label assignment.
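Steps 3-6 can be sketched as follows (illustrative code, not the authors' implementation; the IMFs are assumed to be already available as rows of an array, and the integral of Eq. (11) is discretised as a sum):

```python
import numpy as np

def energy_features(imfs, dt):
    """Steps 3-6: IMF energies (Eq. 11) normalised to unit length (Eqs. 12-14).

    imfs: array of shape (n, N) holding the first n IMFs c1..cn sampled at
          N points; dt: sampling interval, turning Eq. (11) into a sum.
    """
    E = np.sum(np.abs(imfs) ** 2, axis=1) * dt      # Eq. (11), discretised
    F = E                                           # Eq. (12)
    EN = np.sqrt(np.sum(np.abs(E) ** 2))            # Eq. (13)
    return F / EN                                   # Eq. (14)

# toy check: the normalised feature vector always has unit Euclidean norm,
# which removes the overall energy contribution driven by speed and load
imfs = np.random.default_rng(1).normal(size=(8, 1024))
Fp = energy_features(imfs, dt=1.0 / 102_400)        # 102.4 kHz sampling
print(Fp.shape, round(float(np.linalg.norm(Fp)), 6))  # → (8,) 1.0
```

The unit-norm property is the reason the normalisation of steps 5-6 mitigates the influence of the operating conditions on the features.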
5. APPLICATION TO BEARING DATA
Several conditions can influence the data acquired on our test rig: speed, external load and temperature variations. Detecting and removing the effects of these factors is important to avoid any bias during the application of diagnostic techniques. In fact, a small variation in speed or in the temperature of the oil circulating in the system produces deviations that a diagnostic algorithm may erroneously detect as damage, thus providing a false alarm. In this paper we introduce a method able to identify damage in a rolling element of a roller bearing by removing the effect of speed and external load.
Accelerations are acquired on a test rig assembled by the Dynamics & Identification Research Group (DIRG) at the Department of Mechanical and Aerospace Engineering (Figure 3 a).
Figure 4. RMS value.
This bearing test rig is designed to perform accurate testing of bearings with different levels of damage under controlled laboratory conditions, especially regarding the minimisation of spurious signals coming from the mechanical sounds of other bearings, rotating shafts, meshing gear wheels and other vibrating elements. Hence, we are sure that the only variations in the accelerations are caused by speed and load, which can be properly changed and monitored.
We consider three different speed values (9000, 10500 and 12000 RPM) and three radial loads (1.4, 1.6 and 1.8 × 10³ N) and we acquire data for each combination. In particular, 10 acquisitions of 1 second of vibration signal at a sampling frequency of 102.4 kHz are collected for each of the nine cases. This is done both for a healthy bearing and for a damaged one; in the latter case, we analyse a bearing with a fault greater than 450 µm on a rolling element (Figure 3 b). Notice that the temperature of the circulating oil is almost constant between the different acquisitions, so we are certain that the only variations detected through vibrations are caused by load and speed changes.
Figure 4 shows the Root Mean Square (RMS) values evaluated for the 10 acquisitions in each condition. This plot shows how this parameter is influenced by the speed, in both the healthy and the damaged case, and increases at higher speeds. Moreover, it can be noticed that at low speed this parameter for the damaged bearing is close to that of the healthy bearing at the highest speed. For example, the RMS value for the damaged bearing at 9000 RPM for the three loads is around 30, while for the healthy case at the highest speed (12000 RPM) it is around 34: the undamaged bearing at higher speed has a parameter value greater than the faulty one at lower speed. This means that if we consider the RMS parameter taking all nine conditions together, the distinction between healthy and faulty
Figure 5. Error percentage for linear kernel.
Figure 6. Error percentage for polynomial kernel.
Figure 7. Error percentage for Gaussian kernel.
Figure 8. 2-dimensional feature vector F′ representation.
bearings may be strongly biased. This observation leads to the need for a parameter that avoids such problems.
According to the methodology presented in Section 4, we obtain a normalised feature vector F′. We decide to take into account the first 8 IMFs, which include the most dominant fault information, so this vector lies in an 8-dimensional space. The analysis through OCSVM starts from the first two dimensions of the feature vector; we then add a new dimension each time until the whole feature vector F′ is used. We choose to include the features from the beginning of the vector because EMD operates as a collection of filters organised in a filter bank structure: the first mode can be considered similar to a highpass filter, while the other modes are characterised by a set of overlapping bandpass filters (Flandrin & Rilling, 2004). In this way, taking the features from the beginning of the vector, we move from higher frequency content to lower.
As stated in Section 4, 75% of the healthy data are used to train the classifier, while the remaining 25% are added to the damaged data as testing instances. Since the exact class is known, it is interesting to evaluate the labelling errors made by the OCSVM classifier. In this way, the relation between the number of dimensions and a proper identification procedure can be evaluated. Moreover, three different SVM kernels are compared on the acquired data:
• linear: K(xi, xj) = (xi^T xj)^d
• polynomial: K(xi, xj) = (xi^T xj + 1)^d
• Gaussian: K(xi, xj) = exp(−γ||xi − xj||²)
For each kernel, the parameters d and γ take values from 1 to 4 and labelling mistakes are evaluated as percentages. Figures 5, 6 and 7 present the different behaviours of the three
Figure 9. 2-dimensional feature vector F′ representation after OCSVM.
Figure 10. 2-dimensional feature vector F′ representation under different conditions: the first number is the speed expressed in RPM, the second is the load expressed in kN.
kernels as the number of features and the parameter values increase. The error percentage for the linear kernel tends to decrease as the dimensions go from 1 to 8; hence, in order to provide good detection ability a larger number of features should be considered. The same behaviour is observed for the polynomial kernel when d = 1, while for the other values of the parameter fewer errors are present for 2, 6, 7 and 8 dimensions. The error trend in the case of the Gaussian kernel does not seem to be conditioned by the parameter γ, while the minimum number of labelling errors is found when the feature vector has 2 and 7 dimensions. On the whole, a Gaussian kernel or a polynomial one with parameter d > 1 gives successful results in detecting the damage regardless of speed
Figure 11. Normalised feature vector F′ values at three speeds for both the undamaged and the damaged case.
and load influence.
To emphasise this fact, we concentrate on the 2-dimensional feature vector F′, since it gives interesting results and is easier to visualise. If we consider all 180 values computed using our methodology for both the healthy and the damaged bearing, we obtain the plot in Figure 8. In this picture it is clear how the data divide into two groups according to their state rather than depending on their condition of load and speed. This explains the great efficiency of the classifier in damage identification, due to the clear distinction between the two classes of data. Figure 9 shows how the OCSVM with Gaussian kernel and γ = 1 works: the test data are well classified (green triangles) and only one point belonging to the faulty class is labelled as healthy, producing an error (red cross).
Furthermore, any dependence on the different loads and speeds seems to be removed, as pointed out in Figure 10. The nine symbols represent the various conditions for the undamaged and damaged bearing and, on the whole, no particular division based on the rotational speed or on the applied load is noticed.
Figure 11 helps to explain the ability of the method to remove the influence of speed and load. The values of the feature vector F′ for one acquisition are plotted for each of the speeds considered, both for the healthy and for the damaged bearing. Firstly, the vector normalisation presented at steps 5 and 6 of the Methodology section helps to remove the contribution of the highest energies and thus to mitigate the influence of the various conditions on the features. Moreover, as can be noticed in the figure, this aspect is particularly observable for the 'frequency content' represented by c2: the normalised values of the energies tend to be very similar independently of the speed considered, contributing greatly to the removal of this parameter's influence.
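The kernel comparison of steps 7-9 can be sketched as follows. This is again illustrative code, not the authors' implementation: synthetic Gaussian clusters stand in for the measured energy feature vectors, and the ν value is an arbitrary choice.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# synthetic stand-ins for the normalised 8-dimensional energy vectors F'
# (the real features come from the EMD energies; these values are invented)
healthy = rng.normal(loc=0.35, scale=0.05, size=(90, 8))
faulty = rng.normal(loc=0.60, scale=0.05, size=(90, 8))

def error_rate(kernel, **params):
    """One pass of steps 7-8: 75/25 split, train OCSVM, count label errors."""
    idx = rng.permutation(len(healthy))          # step 9: permute healthy data
    n_train = int(0.75 * len(healthy))
    train = healthy[idx[:n_train]]
    test = np.vstack([healthy[idx[n_train:]], faulty])
    truth = np.r_[np.ones(len(healthy) - n_train), -np.ones(len(faulty))]
    clf = OneClassSVM(kernel=kernel, nu=0.05, **params).fit(train)
    return 100.0 * np.mean(clf.predict(test) != truth)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2}),
                       ("rbf", {"gamma": 4.0})]:
    print(kernel, round(error_rate(kernel, **params), 1), "% errors")
```

With these synthetic clusters the RBF boundary encloses only the healthy class, while a linear hyperplane separating the data from the origin cannot reject faulty points lying farther out along the same direction; this loosely mirrors the observation above that the Gaussian kernel outperforms the linear one.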
6. CONCLUSION
In this paper we proposed a method for the detection of damage in roller bearings that removes the dependence on speed and load. The methodology combines Empirical Mode Decomposition, used to produce a proper feature vector, with the One-Class Support Vector Machine technique, exploited to classify the data. Since the original class membership was known, different SVM kernels have been tested in order to find those with the lowest error rate. Encouraging results have been obtained regarding the ability of this feature to remove the speed and load dependence, avoiding bias in data interpretation and identification. Further applications could deal with comparisons of various damage severities and with other damage types, such as a sandblasted inner ring. Moreover, the removal of other influencing factors, such as temperature, and the comparison of this method with other techniques used to obtain the feature vector, such as wavelet decomposition, could be developed.
REFERENCES
Antoni, J. (2006). The spectral kurtosis: a useful tool for characterising non-stationary signals. Mechanical Systems and Signal Processing, 20, 282-307.
Bartelmus, W., & Zimroz, R. (2009). Vibration condition monitoring of planetary gearbox under varying external load. Mechanical Systems and Signal Processing, 23, 246-257.
Chebil, J., Noel, G., Mesbah, M., & Deriche, M. (2009). Wavelet decomposition for the detection and diagnosis of faults in rolling element bearings. Jordan Journal of Mechanical and Industrial Engineering, 3, 260-267.
Cocconcelli, M., & Rubini, R. (2011). Support Vector Machines for condition monitoring of bearings in a varying-speed machinery. In Proceedings of the International Conference on Condition Monitoring, Cardiff, UK.
Cocconcelli, M., Rubini, R., Zimroz, R., & Bartelmus, W. (2011). Diagnostics of ball bearings in varying-speed motors by means of Artificial Neural Networks. In Proceedings of the International Conference on Condition Monitoring, Cardiff, UK.
Flandrin, P., & Rilling, G. (2004). Empirical Mode Decomposition as a filter bank. IEEE Signal Processing Letters, 11(2), 112-114.
Gao, Q., Duan, C., Fan, H., & Meng, Q. (2008). Rotating machine fault diagnosis using empirical mode decomposition. Mechanical Systems and Signal Processing, 22, 1072-1081.
Huang, N. E., & Shen, S. (Eds.). (2005). Hilbert-Huang Transform and Its Applications. World Scientific, Singapore.
Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., et al. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London, Series A, 454, 903-995.
Junsheng, C., Dejie, Y., & Yu, Y. (2006). A fault diagnosis approach for roller bearings based on EMD method and AR model. Mechanical Systems and Signal Processing, 20, 350-362.
Machorro-Lopez, J., Bellino, A., Garibaldi, L., & Adams, D. (2011). PCA-based techniques for detecting cracked rotating shafts including the effects of temperature variations. In Proceedings of the 6th International Conference on Acoustical and Vibratory Surveillance Methods and Diagnostic Techniques, Compiègne, France.
Pirra, M., Gandino, E., Torri, A., Garibaldi, L., & Machorro-Lopez, J. M. (2011). PCA algorithm for detection, localisation and evolution of damages in gearbox bearings. Journal of Physics: Conference Series, 305(1).
Randall, R. B., & Antoni, J. (2011). Rolling element bearing diagnostics - A tutorial. Mechanical Systems and Signal Processing, 25, 485-520.
Rojas, A., & Nandi, A. B. (2006). Practical scheme for fast detection and classification of rolling-element bearing faults using support vector machines. Mechanical Systems and Signal Processing, 20, 1523-1536.
Schölkopf, B., Williamson, R. C., Smola, A. J., Taylor, J. S., & Platt, J. C. (2000). Support vector method for novelty detection. Advances in Neural Information Processing Systems, 12, 582-586.
Shin, H. J., Eom, D.-H., & Kim, S.-S. (2005). One-class support vector machines - an application in machine fault detection and classification. Computers & Industrial Engineering, 48, 395-408.
Vapnik, V. N. (1982). Estimation of dependences based on empirical data. Springer-Verlag, New York.
Widodo, A., & Yang, B. (2006). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21, 2560-2574.
Worden, K., Staszewski, W. J., & Hensman, J. J. (2011). Natural computing for mechanical systems research: A tutorial overview. Mechanical Systems and Signal Processing, 25, 4-111.
Yu, Y., Dejie, Y., & Junsheng, C. (2006). A roller bearing fault diagnosis method based on EMD energy entropy and ANN. Journal of Sound and Vibration, 294, 269-277.
Data Management Backbone for Embedded and PC-based Systems
Using OSA-CBM and OSA-EAI
Andreas Löhr1, Conor Haines2, and Matthias Buderath3
1,2Linova Software GmbH, Garching b. München, 85748, Germany
3 Cassidian, Manching, 85077, Germany
ABSTRACT
Cassidian is in the process of developing a comprehensive
simulation framework for integrated system health
monitoring and management research and development.
One significant building block is to invite first-class
technology providers, e.g. universities and SMIs, to provide
innovative technologies and support their integration into
the simulation framework. This paper is a joint presentation
of Cassidian and Linova Software GmbH, a Cassidian
preferred software provider.
Prognostic Health Management (PHM) systems are
commonly composed of disparate and distributed hard- and
software components. Further, these components exchange
vast amounts of data over a heterogeneous collection of
communication channels. Any such system’s success
depends upon an open, uniform, and performance-optimized
solution for data management, one that includes data
definition, data communication, and data storage. The Open
System Architecture for Condition-based Maintenance
(OSA-CBM) and Open System Architecture for Enterprise
Application Integration (OSA-EAI) are complementary
reference architectures and represent an emerging standard
for application domain-independent asset and condition data
management. Herein, we will report on our experiences
while implementing a data management backbone based on
OSA-CBM and OSA-EAI for a simulation environment
supporting PHM systems in the aerospace domain. Our
work encompasses both airborne embedded systems and
ground-based PC systems. While we can generally confirm
the feasibility of OSA-CBM and OSA-EAI, we found
several implementation recommendations unsuited to real-
time operating conditions. To address these issues, we
propose work towards standardizing non-XML-based
transportation formats for OSA-CBM data packets. Further,
we discovered issues specific to implementing the OSA-EAI
data model in the aerospace domain. These issues drove our
proposal to extend the OSA-EAI database model, where we
seek to optimize its usability for analytical tasks. To
underline the feasibility of our solutions, we provide
empirical evidence drawn from our work. The conclusion is
a summary of our experience and the direction of future
work in the area of PHM system design for aircraft
maintenance. In total, our contribution to the community is
best seen from a practitioner’s perspective. We aim to
establish best practices for and contribute to the evolution of
OSA-CBM and OSA-EAI.
1. SIMULATION ENVIRONMENT
The aerospace industry is a core application domain and
development driver for PHM systems. The paradigm shift
towards predictive maintenance, which PHM systems
impose on maintenance and overhaul processes, promises
higher aircraft availability coupled with lower overall
maintenance costs. As in any other domain, challenges in
introducing PHM systems to the aerospace domain are
twofold. On the one hand, there are individual challenges in
developing sensor technology, state detection, and health
assessment methodologies/models for determining the
future life span of a (possibly deteriorated) component. On
the other hand, there are distinct challenges when
integrating heterogeneous data from disparate and
distributed sources into consolidated information and
dependable decision support. This applies at both the
aircraft and fleet level. It has therefore been recognized in
the community that standardized and open data management
solutions are crucial to the success of PHM. Such a standard
should introduce a commonly accepted framework for data
representation, data communication, and data storage.
_____________________
Andreas Loehr et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
EADS Deutschland GmbH, Cassidian, is developing a
comprehensive simulation framework for research in the
areas of condition monitoring and prognostic health
management. The framework includes airborne functions
hosted on embedded systems, as well as ground-based
functions hosted on PC-based systems. The primary
objective is to interconnect both airborne and ground-based
systems using a uniform data management philosophy and,
as far as possible, uniform communication protocols. In this
paper, we report on experience from our task to define and
implement the data management backbone for such a
simulation framework. The backbone is based on the Open
System Architecture for Condition-based Maintenance
(OSA-CBM) and the Open System Architecture for
Enterprise Application Integration (OSA-EAI).
1.1. OSA-CBM
The OSA-CBM reference architecture has become the de
facto standard for exchanging data in a condition monitoring
system. Being an implementation of the ISO-13374
functional specification, the architecture defines six
functional layers. Each layer is allocated different and
unique functions of the data processing chain in a condition
monitoring system (see Figure 1).
Figure 1. OSA-CBM Reference Architecture
This architecture focuses on the definition and
communication of data: specifically, on which data entities
and events can be exchanged between the layers during
operation and which communication interfaces are used for
this purpose. The format by which the
data is exchanged between the layers remains unspecified;
however, the usage of XML messages, which are
transported over HTTP, is recommended. For this purpose,
the standard provides a thorough collection of specifications
for XML messages.
1.2. OSA-EAI
The reference architecture OSA-EAI is complementary to
OSA-CBM. It specifies a comprehensive data storage
architecture for asset management systems. This
architecture consists of: a physical relational data model
(Common Relational Information Schema, CRIS), a
corresponding logical object model (Common Conceptual
Object Model), and CRUD interfaces (Create, Retrieve,
Update, Delete) for all defined entities in the data model, as
depicted in Figure 2. In the course of harmonizing OSA-
EAI with OSA-CBM, the data model defines entities that
are capable of storing data originating from all six OSA-
CBM layers. Analogously to OSA-CBM, it is recommended
that clients interact with an OSA-EAI database via XML
messages transported via HTTP. For this purpose, the
authors of the OSA-EAI standard provide a multitude of
CRUD XML message specifications. These specifications
define how to manage data contained in the database and
how to make the data available to any other stakeholder or
application within a PHM system.
Figure 2. OSA-EAI Reference Architecture
A link to the MIMOSA organization, which maintains the
reference architectures, can be found in the references
section.
2. SIMULATION ENVIRONMENT
The simulation environment consists of an air segment and a
ground segment, (inter-)connected by a data management
backbone that relies on OSA-CBM and OSA-EAI. In the
following section, we introduce the high level architecture
of our simulation framework.
2.1. Air Segment
The air segment of the simulation framework models those
systems and associated sensors for which we intend to
develop IVHM capabilities. At the core of the framework is
a central IVHM data processor. Sensors push their data to
this IVHM data processor via an OSA-CBM compliant
implementation. As a reflection of the working
environment, the underlying message protocol is optimized
for embedded systems (detailed in section 3). The IVHM
data processor calculates IVHM information according to
the OSA-CBM layer specifications, up to the health
assessment layer (refer to Figure 3).
Figure 3. Air Segment of Simulation Framework
2.2. Ground Segment
The central data processor supports the downloading of
data, which has been collected and calculated on board the
aircraft, to the ground-based environment for further
processing (e.g. during the aircraft’s turnaround). Once
downloaded, the data is stored in a central data management
component, which we call the CBM data warehouse (refer
to Figure 4).
Figure 4. CBM Data Warehouse
The CBM data warehouse is based on the OSA-CBM/OSA-
EAI reference architectures and it serves two major
purposes: first, it hosts all current (i.e. short timeframe) and
historical (i.e. long timeframe) condition data. Second, it
provides services to distributed client applications that are
involved in the PHM process. Such services include the
CRUD interfaces as defined by OSA-EAI (e.g. for asset
configuration management), high layer functions as defined
by OSA-CBM (prognostic assessment and advisory
generation), and other services relevant for a PHM system.
In our context, data management includes the entire data set
life cycle: from initial instantiation of a sensor value,
transportation to the IVHM data processor, downloading to
the ground-based environment, on through to storage and
further processing. In section 3 we discuss aspects of OSA-
CBM-based data management in an embedded system.
Section 4 draws on experience gained while realizing the
CBM data warehouse.
3. OSA-CBM IN AN EMBEDDED SYSTEM
Following an initial implementation of OSA-CBM using
XML messages transported via HTTP/TCP, we decided to
use binary messages transported via a UDP/IP stack. This
significant departure from the MIMOSA recommendations
was driven by requirements that arose from our intended use
of OSA-CBM in the context of embedded systems certified
for in-flight usage. Our on-board implementation covers
the layers from data acquisition up to health assessment;
the following sections report on our experience in
implementing the corresponding classes in the C
programming language.
3.1. Environment
When fielding OSA-CBM compliant applications on
embedded systems certified for in-flight usage, several
issues are brought to the fore. Ultimately, two aspects
defined the unique structure of our solution: resource
limitation and non-dynamism. Due to qualification
requirements, computing hardware for avionics is
generations behind present off-the-shelf computing
hardware.
Implementation rules for applications hosted on real-time
operating systems (such as VxWorks) typically forbid
dynamically allocating memory resources, as these
operations are potentially non-deterministic and lead to
memory leaks if not used carefully. This environment
imposes further constraints on the solution space: due to
qualification or certification requirements (depending on the
risk class of the final system) all embedded code must be
written in the C programming language. Furthermore, UDP
must be used as the sole protocol for network
communication.
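A common way to satisfy such a no-dynamic-allocation rule
is to reserve all buffers statically at compile time and hand
them out from a fixed pool. The following is an illustrative
sketch of that pattern, not code from the framework; all
names and sizes are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size buffer pool: all memory is reserved at
 * compile time, so malloc()/free() is never called at runtime. */
#define POOL_SLOTS 8
#define SLOT_BYTES 1024

static uint8_t pool[POOL_SLOTS][SLOT_BYTES];
static uint8_t slot_used[POOL_SLOTS];

/* Returns a free slot or NULL; deterministic, no heap involved. */
static void *pool_acquire(void)
{
    for (size_t i = 0; i < POOL_SLOTS; ++i) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted: a bounded, testable failure mode */
}

static void pool_release(void *p)
{
    for (size_t i = 0; i < POOL_SLOTS; ++i) {
        if (pool[i] == (uint8_t *)p) {
            slot_used[i] = 0;
            return;
        }
    }
}
```

Exhaustion of the pool is an explicit, bounded failure mode
that can be analyzed at design time, which is precisely what
real-time qualification requires.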
3.2. Use Case and Design Considerations
We want to transmit a heavy load data event set which
contains four heterogeneous OSA-CBM DMDataSeq
events at individual sample rates of 160 Hz, 360 Hz, and
1 kHz. Additionally, we want to transmit a light load data
event set, containing a single DMDataSeq event recorded
at 20 Hz; both data event sets will be transmitted at a
frequency of 1 Hz.
Generating OSA-CBM compliant XML representing our
two event sets and packaging the XML into UDP packages
as ASCII code was a straightforward implementation
approach, as has been demonstrated by others (Swearingen,
Kajkowski, Bruggeman, Gilbertson & Dunsdon, 2007).
Generally, it involves the following three steps:
1. Sender: assemble the XML from an internal data
representation in memory
2. Sender: marshal the XML into a UDP package and
send
3. Receiver: Unmarshal and parse the received XML
and populate an internal data representation in
memory
As we will show later on, in Table 1, using XML generates
a structure in which 75% of the transmitted data is
apportioned to meta-data defining the XML structure.
Additionally, due to its absolute size, the heavy load data
event set exceeds the maximum size of a UDP packet.
While it would have been possible to split up its data into
several UDP packages, we consider the ratio between meta-
data and payload to be unsuited to the constrained allocation
of computing resources. We acknowledge that if we assume
our heavy and light load data event sets would be the only
loads on the communication channel (e.g., Ethernet), there is
no risk of exceeding transmission capacity; but this
assumption may not hold in a real aircraft design where
communication is channeled and, due to the availability of
qualified or certified hardware, the transmission capacity
might be drastically limited. We also researched XML
parsers that are written in C, and therefore compile for
embedded environments (e.g., Mini-XML, Expat, RXP), but
we found them incompatible with internal programming
policies (static memory allocation). Additionally, the high
risk involved in the certification or qualification of an XML
parser for an embedded system finally drove our decision
towards a non-XML-based binary solution for marshalling
and unmarshalling OSA-CBM data.
3.3. Design and Implementation
OSA-CBM is an object-oriented specification and therefore
makes use of polymorphism, which is the ability to create
object attributes, object functions or even an entire object
that has more than one form. Our implementation of OSA-
CBM is based upon the representation of OSA-CBM classes
by a set of C structures. The C programming language is
procedural and does not offer native polymorphism. After
analyzing data manipulation through health assessment
layer communication classes of the OSA-CBM object
model, we concluded that a mapping of OSA-CBM classes
to C structures is possible. We will next explain our
rationale in supporting this approach.
The C programming language decouples data from
functionality, therefore we did not have to map
polymorphism of functions (OSA-CBM does not define
behavior of the classes, anyway). We also could not identify
polymorphism of attributes for the classes of our interest.
However, there is polymorphism of objects, i.e., specific
derived classes inherit part of their structure from one or
more base classes. We mapped this kind of polymorphism
by initially modeling C structures for each root class (i.e.,
classes that do not have a base class in the OSA-CBM
model). For all non-root classes we modeled a member in
the derived class which is of the type of the respective super
class. As an example, the structure for the data sequence
event of the DM layer (DMDataSeq) is shown in Figure
5(c). The corresponding base class structures are shown in
parts (b) and (a), respectively.
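In reduced form, the mapping can be sketched as follows.
The attribute selection is illustrative and abbreviated, not
the complete OSA-CBM class definition; the essential point
is that the base class becomes the first member of the
derived structure:

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced sketch of a root class: every OSA-CBM data event
 * carries at least an id and a time stamp (the real class has
 * further attributes, omitted here). */
typedef struct {
    uint32_t id;
    double   time;   /* simplified stand-in for OsacbmTime */
} DataEvent;

/* Derived DM-layer sequence event (cf. Figure 5(c)): the base
 * class appears as the first member, emulating single
 * inheritance in plain C. */
typedef struct {
    DataEvent base;        /* inherited part */
    uint32_t  dataSize;    /* number of valid entries */
    double   *values;      /* dynamic parts, bounded at runtime */
    double   *xAxisDeltas;
} DMDataSeq;
```

Because base sits at offset zero, a pointer to a DMDataSeq
can be treated as a pointer to its DataEvent part, which is
what allows generic handling of events.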
Within specific limits our approach is also able to emulate
multiple inheritance by including more than one base class
member; however, the part of the OSA-CBM data model
that we focused on does not involve multiple inheritance.
For transmission, multiple data event instances are bound
together into a data event set. Regarding a single instance of
an OSA-CBM base class, its actual subtype at runtime can
be anything. This is critical to the C implementation as the
DataEventSet class acts as a transportation container for
any DataEvent instances. We solved this problem by
introducing a constraint: an OSA-CBM data event set may
only include data events of the same type. This allowed us
to introduce a non-standard member on the
DataEventSet class which is of enumerated type
OsacbmDataType and which indicates the type of
included events.
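In reduced form, such a constrained container can be
sketched as follows; the enumerator list and member names
are illustrative, not the full OSA-CBM type catalogue:

```c
#include <stdint.h>

/* Illustrative subset of OSA-CBM data types; the real
 * OsacbmDataType enumeration is far larger. */
typedef enum {
    OSACBM_DM_REAL,
    OSACBM_DM_DATA_SEQ
} OsacbmDataType;

#define MAX_EVENTS 16

/* Transportation container: by our constraint all contained
 * events share one type, announced once by the non-standard
 * dataType member, so the receiver can interpret the byte
 * stream without per-event type information. */
typedef struct {
    OsacbmDataType dataType;           /* non-standard type tag */
    uint32_t       numEvents;
    const void    *events[MAX_EVENTS]; /* all of type dataType */
} DataEventSet;
```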
Figure 5. Exemplary Payload OSA-CBM Structures
The received byte stream can therefore be interpreted
correctly on the receiver side. For the transmission itself, we
copy a structure’s memory image into a temporary buffer.
Additionally, as required by the event type, the buffer
memory is appended with a data block for each reference
from a structure’s pointer members (here: values and
xAxisDeltas). Finally, the buffer is sent as a UDP
packet to the receiver, where it is reconstructed into a set of
OSA-CBM compliant data. Consequently, we support both
static data types (such as DMReal) and dynamic types (such
as DMDataSeq). However, as a necessary overhead,
complex data sequences require recipient side remapping of
pointers at run time and a maximum payload size must be
defined for real time operation.
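The copy-and-append scheme can be sketched as follows for
a single pointer member. The structure and names are
simplified and hypothetical; alignment and byte-order
handling, discussed in section 3.4, are omitted here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced event with one dynamic part referenced by pointer. */
typedef struct {
    uint32_t  id;
    uint32_t  dataSize; /* number of valid entries in values */
    double   *values;   /* data block appended after the image */
} Seq;

/* Sender: copy the raw memory image first, then append the
 * referenced data block. Returns bytes written into buf. */
static size_t seq_marshal(const Seq *s, uint8_t *buf)
{
    size_t n = sizeof(*s);
    memcpy(buf, s, n); /* structure memory image */
    memcpy(buf + n, s->values, s->dataSize * sizeof(double));
    return n + s->dataSize * sizeof(double);
}

/* Receiver: reinterpret the image, then remap the pointer
 * member so it refers to the appended block in the buffer. */
static void seq_unmarshal(uint8_t *buf, Seq *out)
{
    memcpy(out, buf, sizeof(*out));
    out->values = (double *)(buf + sizeof(*out)); /* remapping */
}
```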
3.4. Evaluation
We evaluate quantitatively by comparing the data volume
required for an ASCII XML transmission with that of our
custom binary transmission protocol. We used Ubuntu 10.04
(32-bit) as sender and VxWorks on PowerPC (32-bit) as
recipient.
Table 1 outlines the data characteristics of two
representative communication samples.
Figure 6. Data Event Set as C Structure
The first sample is a heavy load data event set. It contains
four heterogeneous OSA-CBM DMDataSeq events at
individual sample rates of 160 Hz, 360 Hz, and 1 kHz. The
overall data event set has a frequency of 1 Hz. The resulting
data push represents 2,520 individual measurements being
sent across the system every second. The second sample is a
light load data event set, containing a single DMDataSeq
event recorded at 20 Hz; the corresponding overall data
event set has a frequency of 1 Hz.
              XML            Binary        Ratio
Heavy Load    165,345 bytes  40,792 bytes  4.1
Light Load    1,827 bytes    576 bytes     3.2

Table 1. Data Transmission Size Comparison
As seen in Table 1, there is a significant reduction in the
volume of data transmissions achieved by our approach,
ranging up to a factor of four. An additional effect of our
approach, as compared to sending XML messages via UDP
instead of via HTTP/TCP (Swearingen, Kajkowski,
Bruggeman, Gilbertson & Dunsdon, 2007), is a significant
reduction in the processing overhead required by XML
structural parsing; this reduction is beyond the scope of our
present analysis.
However, there are drawbacks to our approach. Since UDP
is a connectionless datagram protocol, the amount of data
that can be transmitted per event set is capped by the
maximum size of a UDP datagram (65,507 payload bytes
over IPv4); depending on platform-specific settings, the
available size can be significantly less.
We believe that this size limitation is best addressed by
splitting the data set into a series of discrete packets, as
opposed to introducing additional limitations and overheads
on the binary transmission format. Data management within
a closed on-board real-time environment a priori requires
that the overall data communication be well designed
regarding timing and loads. In such a closed and well
controlled environment the likelihood of UDP packet loss is
minimized; nevertheless, it may still happen. Therefore, we propose
the usage of UDP-based transmission only for functions
which can cope with temporary gaps in their data input,
such as our diagnostics algorithm. For functions which are
not robust to data losses, a confirmation-and-resend
protocol could be devised, but that would negate the
benefits of UDP, and TCP would be the transmission
protocol of choice.
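A packet-splitting scheme of the kind proposed above can be
sketched as follows. The fragment header layout is
hypothetical: each fragment announces its index and the
total count, so a receiver can detect a gap and discard the
incomplete set rather than stall:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_UDP_PAYLOAD 1400u /* conservative bound below typical MTU */

/* Hypothetical per-fragment header. */
typedef struct {
    uint16_t setId;     /* identifies the data event set */
    uint16_t fragNo;    /* 0-based fragment index */
    uint16_t fragCount; /* total fragments of this set */
} FragHeader;

#define FRAG_DATA (MAX_UDP_PAYLOAD - sizeof(FragHeader))

/* Number of UDP packets needed for len payload bytes. */
static uint16_t frag_count(size_t len)
{
    return (uint16_t)((len + FRAG_DATA - 1) / FRAG_DATA);
}

/* Fill packet 'no' (header plus data slice) into out; the caller
 * guarantees no < frag_count(len). Returns bytes written. */
static size_t frag_fill(uint16_t setId, uint16_t no,
                        const uint8_t *data, size_t len, uint8_t *out)
{
    FragHeader h = { setId, no, frag_count(len) };
    size_t off = (size_t)no * FRAG_DATA;
    size_t chunk = (len - off < FRAG_DATA) ? len - off : FRAG_DATA;
    memcpy(out, &h, sizeof h);
    memcpy(out + sizeof h, data + off, chunk);
    return sizeof h + chunk;
}
```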
Our current implementation is highly platform dependent,
as it is tailored to the characteristics of our environment
(sender: 32-bit Ubuntu; recipient: 32-bit VxWorks). To
overcome platform differences, we introduced artificial
padding bytes (see C structure members in Figure 5) so that
the internal in-memory arrangement is equal on both
platforms, and performed byte-swapping on the receiving
platform. This allowed us to easily cast the UDP package
payload into the required structures (including pointer
remapping).
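In reduced form, the combination of an explicit padding
member and receiver-side byte swapping looks as follows.
This is a sketch with hypothetical names; the actual
structures follow Figure 5:

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced event structure with an explicit padding member so
 * that 'value' lands at the same offset on both platforms,
 * regardless of the compiler's default alignment rules. */
typedef struct {
    uint16_t id;
    uint8_t  pad[2]; /* artificial padding: 'value' at offset 4 */
    uint32_t value;
} WireEvent;

/* Receiver-side byte swap for one 32-bit member (sender and
 * recipient differ in endianness). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}
```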
Finally, XML messages can be read by humans more easily
than binary messages. This may impose complications to
the debugging cycles during software development;
however, from our experience, software developers tend to
develop the ability to “read” binary content over time, in
particular if sophisticated hex editor tools are used. A
steeper learning curve certainly is worth the performance
gains. As for the generation of test data for certification or
qualification, binary protocols do not impose significant
overhead, since a generative approach would have to be
used with XML as well to handle the large number of test
cases.
3.5. Outlook
Our initial implementation, transmitting the memory image
of structures, is not optimal when communication must take
place between heterogeneous platforms and only allows for
a homogeneous data event set payload. Yet, it yields
significant performance gains, reduces the consumption of
memory, and simplifies certification or qualification. As
shown above, issues related to padding and regarding the
arrangement of data in RAM may arise. While these issues
can be mitigated if the characteristics of the platforms are
known, the scalability in general remains limited. To
address these issues, we started the development of a
custom binary OSA-CBM protocol. The vision was to
evolve this protocol as a generic and platform-independent
means for transporting OSA-CBM events over the network
in a binary fashion. In Figure 7 we provide an excerpt from
our initial work to illustrate the proposed design approach.
Based on preliminary low-level definitions (such as big or
little endianness and the widths of primitive data types), all
OSA-CBM classes are modeled as sequences of 16-bit
words. In our example, an ID consists of two words, i.e., it
represents a 32-bit integer value.
Analogously, the OsacbmTime class is represented as a
sequence of five words (our customized implementation
only required the time_type and time_binary
attributes). With every class having such a specific
representation, data events and entire heterogeneous data
event sets can be assembled. For dynamic structures, upper
bounds for the allowed amount of dynamic data must be
defined (possibly implementation specific) in order to meet
the requirements of real-time operating systems. To avoid
sending spare data, the binary representation of such
dynamic portions requires including a member that
defines the actually allocated amount of data (up to a
maximum dictated by the data size allowed in a UDP
packet). An example is the member
DMDataSeq.dataSize, which is not part of the OSA-
CBM specification but which is required for correctly
interpreting the words. Checksums to detect transmission
failures were foreseen as well. By standardizing the binary
representation for the network format, senders as well as
recipients have to translate between their platform specific
representation and the network format. Although there is
marshalling and un-marshalling to be done, we hypothesize
that the CPU load for this process can be neglected
compared to XML parsing.
Figure 7. Exemplary binary representation of DataEventSet
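In reduced form, such a word-based encoding can be
sketched as follows. The helper names are hypothetical, and
the checksum shown is a simple additive example rather
than the exact scheme foreseen in our design:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 64 /* upper bound required for real-time operation */

/* Append a 32-bit ID as two 16-bit words, high word first. */
static size_t put_id(uint16_t *w, size_t n, uint32_t id)
{
    w[n]     = (uint16_t)(id >> 16);
    w[n + 1] = (uint16_t)(id & 0xFFFFu);
    return n + 2;
}

/* Append a dynamic sequence: the dataSize word comes first,
 * then the data, so the receiver knows how many of the
 * reserved words are actually valid. */
static size_t put_seq(uint16_t *w, size_t n,
                      const uint16_t *data, uint16_t dataSize)
{
    w[n++] = dataSize;
    for (uint16_t i = 0; i < dataSize; ++i)
        w[n++] = data[i];
    return n;
}

/* Simple additive checksum word over all preceding words. */
static uint16_t checksum(const uint16_t *w, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += w[i];
    return (uint16_t)(s & 0xFFFFu);
}
```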
Based upon results shown in the previous section, the size
of data structures in this new network format will be on the
order of 25% of a corresponding XML representation.
3.6. Binary Message Format in OSA-CBM 3.3.1
The most recent version of OSA-CBM, Version 3.3.1,
includes a specification for a binary transmission format for
OSA-CBM messages. We see our work confirmed by this
addition to the OSA-CBM standard. Following an initial
design and trade study, we decided to adopt MIMOSA’s
specification as the network layer format amongst our
subsystems. Though this choice rendered our custom
protocol design work moot, implementation has been, and
remains, the focus of our work. Furthermore, the
compatibility of our systems with the rest of the community
will be ensured by following a standard which is now part
of that community. That is to say, our optimizations in the
marshalling/un-marshalling of data within and amongst real
time embedded systems and in the creation of an API/library
for OSA-CBM transmission are just as critical when using
the MIMOSA standard as with our custom message format.
Our aim is to create a fully C coded, statically allocated
implementation of the OSA-CBM Binary message
specification for embedded systems.
4. CBM DATA WAREHOUSE
The ground segment of our simulation framework includes a
central repository for data and information, called the CBM
data warehouse.
4.1. High Level Requirements
Design of the CBM data warehouse was driven by the
following high-level requirements.
1. The CBM data warehouse shall act as a central
information system for all applications involved in
the PHM process.
2. The CBM data warehouse shall provide a uniform
and standardized interface for managing and
querying its data.
3. The CBM data warehouse shall maintain full
traceability for any in-service data item regarding
origin, allocation (to assets, aircraft and flights) and
changes.
Given the need to meet these requirements across a large
fleet of aircraft, the design of the CBM data warehouse
faces two core challenges. First, it must process a large
number of transactions originating from daily maintenance
tasks, such as asset installation/removal and storing newly
available IVHM-data from performed flights. Second, it
must process and store a large amount of historical data for
performing diagnostics and prognostics, as well as their
continual improvement as more in-service data becomes
available.
4.2. Realization
The OSA-EAI and OSA-CBM reference architectures
define a uniform data management philosophy that allows
for full traceability of virtually any sensor value and its
derived information. Earlier work (Gorinevsky, Smotrich,
Mah, Srivastava, Keller & Felke, 2010, and others)
demonstrated the feasibility of using these architectures as a
reference to build a comprehensive information system and
associated service interface across multiple domains,
including aerospace. We consequently considered the
selection of OSA-EAI and OSA-CBM as guidelines for the
design of our CBM data warehouse as a promising approach
to satisfy our high level requirements.
4.2.1. Scope
We have implemented a subset of the OSA-EAI standard for
our initial version of the CBM data warehouse. The subset
was derived with the aim of providing data management for
diagnostics and prognostics on our candidate systems.
Confirming reports from other researchers, we found the
documentation of OSA-EAI to be rather sparse, especially
when mapping its generic universe of entities to a specific
application domain. We concentrated on the ability to
express system breakdowns (Assets, Segments, and
Parent/Child relations) and the ability to associate data from
the data acquisition, data manipulation, and state detection
layers. Additionally, each asset was to have an active history
of health assessments and remaining useful life estimates.
We expected that this would lead to an implementation of
tables exclusively from the REG, DIAG, DYN and TREND
groups of entities; however, with the exception of the
TRACK group, we had to implement at least one table from
all other entity groups in order to satisfy mandatory
connections between tables. We consider this a symptom of
the complexity of the OSA-EAI standard, and strongly
encourage the maintainers of the standard to establish a
sample or reference application for OSA-EAI (and OSA-
CBM), similar to the SCOTT database example of Oracle.
4.2.2. Customization
We customized the remaining OSA-EAI tables in a way that
would simplify the generation of test and reference data, but
still allow for the drawing of general conclusions (congruent
customization) from our experience. We made further
customizations to map specific features of the aerospace
domain (domain customizations). Many tables of OSA-EAI
have a composite primary key (i.e., two or more columns)
because the database model is designed for data
exchange or integration amongst different database
instances. For this purpose OSA-EAI introduces the Site
concept, which uniquely identifies the stakeholder of a
specific dataset. In combination with the dataset ID, any
dataset can thus be uniquely identified. Since our simulation
framework is currently a closed system, the maintainer
remains constant. Therefore, we stripped the composite
primary keys of each entity down to a single dataset id,
allowing us to strip down foreign keys as well. This
approach was shown to be feasible by Mathew, Zhang,
Zhang and Ma Lin (2006).
We further recognized that OSA-EAI does not have the
specific notion of a flight or a mission. This was not
unexpected, as OSA-EAI is generic; however, analyses in
the aerospace domain are often flight/mission centered. By
definition, OSA-EAI measurements can only be related to
assets/agents and time. Additions were necessary to relate
measurements with a specific flight/mission entity under
which they occurred. These updates allow the system to
couple flight/mission characteristics and degradation. While
OSA-EAI foresees enough meta-data to perform a
chronological mapping to an external flight/mission
database, our experience from other projects shows that a
direct mapping of information to a flight (or at least a power
cycle) is indispensable.
In the aerospace domain, segments represent virtual
“placeholders” for assets and these placeholders have
unique logistic control numbers. Such features can be
represented by OSA-EAI using the attributive tables for
each segment (Segment Numeric Data or Segment
Character Data). However, when modeled as an explicit
attribute of a segment, logistic control numbers can be
evaluated more efficiently. We recognize that one could
come up with many such counter-arguments, as OSA-EAI is
a domain independent and generic standard.
4.2.3. Performance Considerations
Coping with a large number of transactions and handling
large volumes of data at the same time, the CBM data
warehouse has both the role of an Online Transaction
Processing (OLTP) system and that of an Online Analytical
Processing (OLAP) system. These two requirements seem
to contradict each other at first glance.
The database model of an OLTP system is normalized, that
is, it consists of many interconnected tables and each table
describes a fine-grained piece of the application domain. The
number of tables that contain redundant information
(possibly in different representations) is minimized so that
the risk of a transaction leaving the database in an
inconsistent state is low. Due to its appearance from a bird’s
eye view, a normalized schema is referred to as a snowflake
schema. For an OLTP system, normalization is a
prerequisite, as it supports CRUD operations with optimal
performance and data integrity. The downside of a
snowflake schema is that information retrieval and analysis
result in complex queries involving many tables, which
results in poor performance.
The database model of an OLAP system is de-normalized,
which means that it consists of few tables, which contain
redundant information for the sake of reduced query
complexity and minimal join operations. Due to its
appearance from a bird’s eye view, a de-normalized OLAP
schema is referred to as a star schema. Snowflake and star
schema are depicted in Figure 8. The information of interest
is marked as grey boxes. The OSA-EAI database model in
its current state is heavily normalized and therefore clearly
OLTP-centered. Others have confirmed this statement using
formal methods (Mathew and Ma, 2007). Although we
could confirm specific issues regarding modeling and
documentation (Mathew et al., 2006), we still consider
OSA-EAI as well defined for transactional tasks. In contrast
to criticism that has been raised by industry, we consider the
normalization of OSA-EAI as essential, whereas Mathew
and Ma (2007) argue that the normalized character of OSA-
EAI is one of its weaknesses.
Applying standard modeling techniques to selected subsets
of interconnected OSA-EAI tables, they propose OLAP-
centered alterations for OSA-EAI according to star schema
design. These show that, at least for selected subsets of
coherent CRIS tables (so called data marts), the OLAP-
centered model holds equivalent information. Not
surprisingly, Mathew and Ma (2007) acknowledged that
their redesign optimizes analytics, but has significant
drawbacks for transactional use. They conclude with a
discussion of their motivation for further work towards a
compromise.
Figure 8. Snowflake (OLTP) vs. Star Schema (OLAP)
We argue that such a compromise cannot manifest as a
single data model that features characteristics from both
OLTP and OLAP-centered models. Such an approach would
fit neither side. Instead, motivated from our findings during
the realization of the CBM data warehouse and the
experience from our other projects that deal with large data
volumes (which go beyond the scope of this document), we
propose an extension to OSA-EAI to specifically support
analytical tasks on large volumes of historical data.
4.3. “Common Relational Analytics Schema”
The characteristics of OLTP and OLAP are too distinct to
be merged into a single database model. The database model
that is defined by OSA-EAI is called Common Relational
Information Schema (CRIS). Instead of redesigning CRIS to
include OLAP-specific features, we propose a new
standardized database model named Common Relational
Analytics Schema (CRAS). Our proposed database model
lives under the umbrella of OSA-EAI and coexists with
CRIS. Since an OLAP-centered database is primarily
designed for reading (not writing), the CRAS portion of
OSA-EAI will be populated on a regular basis from the
content stored in the CRIS portion. Both portions hold an
equivalent informational content – however, CRIS is
optimized for transactional purposes while CRAS is
optimized for analytical purposes.
4.3.1. Motivation
For a PHM system, it is necessary that prognosis be
performed in a short timeframe, e.g. during the turnaround
phase of an aircraft. However, this is different from actually
performing analytics. At least the prognostics algorithms
that we were utilizing require neither the entirety of all
recorded historical data, nor any preprocessed results
requiring filtering or aggregation (which are typical tasks of
OLAP systems). A limited set of data, say from the last N
flights, was sufficient. We found that with the standard
CRIS queries these limited historical datasets could be
retrieved reasonably fast. We draw this conclusion from our
direct experience with the tools we created; however, our
sample database did not contain fleet condition data from
several aircraft over several years, and with such huge
amounts of data performance will degrade. We hypothesize,
however, that using table partitioning techniques, which
have become available with today’s relational database
management systems (such as Oracle's Enterprise Edition), it
is possible to set an upper limit for the amount of data that
has to be searched by a query to identify the prognostics raw
data from the last N flights. An apparent partition key is
time, but Site is also a promising candidate.
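As a rough illustration of the partition-pruning idea (the class and field names below are ours, for illustration only, and not part of any standard), a store keyed by time and site only has to scan the buckets that can contain the requested rows, which bounds the search space independently of total data volume:

```python
from collections import defaultdict
from datetime import date

class PartitionedStore:
    """Toy partitioned store: rows are bucketed by (year, month, site)."""

    def __init__(self):
        self._partitions = defaultdict(list)  # (year, month, site) -> rows

    def insert(self, row):
        d = row["recorded_on"]
        self._partitions[(d.year, d.month, row["site"])].append(row)

    def query(self, year, month, site):
        # Only one partition is scanned, regardless of total data volume.
        return self._partitions[(year, month, site)]

store = PartitionedStore()
store.insert({"recorded_on": date(2012, 7, 3), "site": "DRS", "flight": 101})
store.insert({"recorded_on": date(2012, 6, 1), "site": "DRS", "flight": 95})
print(len(store.query(2012, 7, "DRS")))  # 1
```

A relational database applies the same pruning transparently once a partition key is declared on the table.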
We further suggest that analysis tasks that would require an
OLAP-centered database model be conducted on a regular
basis, but decoupled from the daily operational (i.e.
transactional) business. We claim that it is therefore suitable
to populate the CRAS on demand (e.g. once a month) in
order to perform retrospective analyses (e.g. for the
continuous improvement of diagnosis and prognosis).
4.3.2. Architecture
A high level overview of our proposed architectural
extensions of OSA-EAI is given in Figure 9. The elements
drawn in grey represent the current state of the art of OSA-
EAI. The OLTP-centered database model, CRIS, stores the
operational data in a relational database (the corresponding
object model has been omitted). Furthermore, the OSA-EAI
standard defines a comprehensive service interface for
accessing and modifying the operational data. We propose
to extend OSA-EAI according to the following three aspects
(corresponding to the black-marked items in Figure 9):
1. A database model optimized for analytical
purposes (OLAP), able to store informational
content congruent with CRIS. We call
this database model the Common Relational
Analytics Schema (CRAS). It is organized
according to the star schema approach.
2. A standardized interface for issuing
multidimensional queries against CRAS.
3. A standardized Extraction, Transformation and
Loading (ETL) process populating tables in the
CRAS schema with operational data from CRIS.
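The three extensions above can be sketched with a small in-memory example: a star-schema fact table surrounded by dimension tables, populated by a minimal ETL step from CRIS-like operational rows, and queried with a typical filter-plus-aggregate (OLAP-style) statement. All table and column names are purely illustrative and are not taken from the MIMOSA specifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one fact table referencing dimension tables.
cur.executescript("""
CREATE TABLE dim_time  (time_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_asset (asset_id INTEGER PRIMARY KEY, tail_number TEXT);
CREATE TABLE fact_measurement (
    time_id  INTEGER REFERENCES dim_time(time_id),
    asset_id INTEGER REFERENCES dim_asset(asset_id),
    value    REAL
);
""")

# Minimal ETL: denormalize CRIS-like operational rows into the star schema.
cris_rows = [("2012-07-03", "D-1234", 0.5), ("2012-07-03", "D-1234", 1.0)]
cur.execute("INSERT INTO dim_time VALUES (1, '2012-07-03')")
cur.execute("INSERT INTO dim_asset VALUES (1, 'D-1234')")
for _day, _tail, value in cris_rows:
    cur.execute("INSERT INTO fact_measurement VALUES (1, 1, ?)", (value,))

# A multidimensional (filter + aggregate) query typical of OLAP workloads.
cur.execute("""
SELECT a.tail_number, AVG(f.value)
FROM fact_measurement f
JOIN dim_asset a ON a.asset_id = f.asset_id
JOIN dim_time  t ON t.time_id  = f.time_id
WHERE t.day = '2012-07-03'
GROUP BY a.tail_number
""")
row = cur.fetchone()
print(row)  # ('D-1234', 0.75)
```

The fact table stays narrow and append-only, which is what makes the star schema efficient for the read-mostly workload described above.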
4.3.3. Performance and Operational Considerations
Our work regarding CRAS suggests an a priori hybrid
approach for database modeling. We are currently refining
the concept and have just begun prototype implementations.
Therefore, we cannot yet provide empirical results; in
particular, when it comes to handling data volumes in the
magnitude of terabytes. For these volumes, the concept has
yet to be proven. While the idea of CRAS as a complement
to CRIS is clearly new, the methodology that it is based on,
i.e., the star schema, has been available for years and is well
understood. The star schema yields excellent performance
results even with large data volumes. We have gained
empirical knowledge from another work area which requires
queries that involve both filters and aggregation. Results
indicate that the star schema approach improves response
times by a factor of 10 to 100 when handling millions of
data sets.
Figure 9. CRAS Extension of OSA-EAI (shown in black)
with an optional data model which is optimized for analytics
To ensure scalability for the joint operation of CRIS and
CRAS, we propose the following methodology. It is known
that the performance of both the CRIS and CRAS schemas
degrades with a growing amount of data. However, we
believe the CRIS schema will degrade faster than the CRAS
schema. Once a fresh system has been set up, the CRIS
portion will be constantly populated with new data, and, in
reasonably short intervals, the CRAS schema will be
constantly recreated from the current data in CRIS by the
ETL process. The CRAS schema is stateless at this phase, as
it can always be recreated from CRIS. Operational tasks will
be carried out in the CRIS, while analytical tasks run on the
CRAS. Provided that suitable hardware segmentation is
available (e.g., dedicated CPUs, dedicated RAID volumes)
operations on both schemas should not influence each other.
Once specific hot spots of the CRIS schema have degraded
to a stage where performance is no longer acceptable, old
data must be archived in the CRAS schema. We assume that
one can define data as being old simply by its date of
creation or other criteria. We further assume that such old
data will not be altered due to operational processes; which
certainly applies to sensor data. Therefore the ETL can
move (instead of just transform) old data to CRAS where it
will then permanently reside – just not in the CRIS form.
Since there is no need to alter the old data, it can be
removed from CRIS completely, mitigating the performance
degradation. However, the old data is still available for
analysis in CRAS. From this point on, the CRAS schema
becomes stateful, as it cannot be entirely recreated from
CRIS.
From a high level point of view, the CRIS schema’s data
volume will grow up to a specific limit and then shrink
again, so there is a worst case performance for operational
tasks. In contrast, the CRAS schema will constantly grow
with each new archival process. However, the growth will
take place in a database schema that is designed for
performance and large volumes; nevertheless, without
suitable measures the CRAS cannot grow indefinitely.
There are scaling measures to ensure performance of
database schemas in general that can be applied to our
situation. For data archived in CRAS which still needs to be
considered during online analyses, so-called partitions
should be maintained. A partition influences the way a
database physically stores a database table on the storage
device but keeps this storage strategy transparent to the
application (programmer). Partitions can be created during
maintenance phases of the PHM system. Depending on
specific criteria of the data set, such as the date of creation
(the so-called partition key), it will be assigned to one
partition or the other. Partitions can be assigned a separate
storage device, i.e., one disk for each partition. Therefore,
even specific tables can be scaled independently from
others. While further discussion is beyond the scope of this
paper, the effect is that the search space for queries
can be significantly reduced. Operational data that the ETL
transforms from CRIS will have its own partition(s),
whereas all archived data will have separate partitions. We
believe therefore that the effects of a growing CRAS on the
continuous ETL transformation of operational data can be
mitigated. However, if the amount of data in CRAS
significantly degrades the online analysis performance, one
has to consider moving the oldest data from CRAS into
offline storage. Here, we assume that this data no longer
contributes to an operational PHM (e.g., data from assets
that have been moved out of service) and can be analyzed
offline (or e.g., in a separate database).
4.3.4. Challenges and Future Work
There are two core challenges involved in our work. First,
the concept of joint operations between CRIS and CRAS
needs to be proven. We have to derive enough sample data
and set a representative database configuration and
environment to prove our claim. In its current stage, this
approach is merely a concept. While the methodologies and
technology it is built upon have proven to be feasible in
other domains, the risk of not being able to implement it as
proposed is non-negligible. In the previous section we
mention the introduction of offline storage for the oldest
data in the system. We want to point out here a new aspect
of performance research for OSA-EAI by combining it with
Hadoop, an emerging technology for distributed storage and
query of huge volumes of data. Second will be the
derivation of a generic CRAS schema that fits the needs of
analytical tasks for PHM in a domain-independent manner.
This must be accomplished while maintaining the same
level of quality as CRIS does in fitting the needs of
transactional usage in a generic way. Mathew et al. (2007)
have applied a formal process for attempting to derive an
initial OLAP-centered database model from CRIS. They
identified so called data marts (fact tables and
corresponding dimensional tables) for the areas of
configuration data, measurements, health and alarms, events
and work management. However, they give no reason as to
why no data mart for remaining useful life was identified.
As such, the actual details of the generic ETL process are
left open for future work.
5. CONCLUSION
We presented our experience from the realization of a data
management backbone for a simulation framework for PHM
systems in the aerospace domain. For the airborne segment
OSA-CBM-based communication was chosen. We
encountered issues relating to the recommended
transportation protocol for OSA-CBM when implementing
the standard under the conditions of a real-time operating
system. From our findings, we are motivated to use a binary
transportation format for OSA-CBM data events that
address embedded systems. This standard is to be both
binary and lean. In the process, we hope to avoid the
inherent overhead in processing power and memory
consumption of an XML-based transportation over HTTP.
Our preliminary results are promising: the amount of raw
data needed to represent specific OSA-CBM messages could be
reduced to 25% of the XML-based size (overhead for HTTP
and TCP not included). As our approach lacks platform
independence we outline a path for future work towards a
platform-independent binary representation for OSA-CBM
messages. The ground-based part of our data management
backbone is centered on an information system, which we
call the CBM data warehouse. It is designed according to
the OSA-EAI reference architecture. Confirming the
feasibility of OSA-EAI in conjunction with OSA-CBM, we
encountered minor issues in mapping aerospace domain
concepts to the generic entities and could confirm issues
reported by others. To answer the necessity of a PHM
system to perform both transactional and analytical
interaction with the CBM data warehouse, we recommend
extensions to OSA-EAI. We propose an optional and
complementary database model called CRAS (in analogy to
CRIS) that is optimized for analytical queries and follows
OLAP principles. It coexists with CRIS and is populated, on
demand, by CRIS transactional data. We close by calling
for future work in this area in the form of field studies.
REFERENCES
Gorinevsky, D., Smotrich, A., Mah, R., Srivastava, A.,
Keller, K., & Felke, T. (2010). Open Architecture for
Integrated Vehicle Health Management. AIAA
Infotech@Aerospace Conference, April 20-22.
Mathew, A. D., & Ma, L. (2007). Multidimensional schemas
for engineering asset management. Proceedings World
Congress on Engineering Asset Management,
Harrogate, England.
Mathew, A. D., Zhang, L., Zhang, S., & Ma, L. (2006). A
review of the MIMOSA OSA-EAI database for
condition monitoring systems. Proceedings World
Congress on Engineering Asset Management, Gold
Coast, Australia.
MIMOSA. MIMOSA Organization Website.
http://www.mimosa.org
Swearingen, K., Kajkowski, W., Bruggeman, B., Gilbertson,
D., & Dunsdon, J. (2007). Multidimensional schemas
for engineering asset management. Proceedings IEEE
Aerospace Conference.
BIOGRAPHIES
Matthias Buderath Aeronautical Engineer with more than
25 years of experience in structural design, system
engineering and product- and service support. Main
expertise and competence is related to system integrity
management, service solution architecture and integrated
system health monitoring and management. Today he is
head of technology development at Cassidian. He is member
of international Working Groups covering Through Life
Cycle Management, Integrated System Health Management
and Structural Health Management. He has published more
than 50 papers in the field of Structural Health
Management, Integrated Health Monitoring and
Management, Structural Integrity Programme Management
and Maintenance- and Fleet Information Management
Systems.
Conor Haines received his B.Sc. degree in Aerospace
Engineering from Virginia Polytechnic Institute and State
University in 2003 and his M.Sc. degree in Computational
Science from the Technical University of Munich in 2011.
For 3 years Conor was a test engineer supporting the NASA
Near Earth Network, providing simulation support used to
guide system development. At his current post, he is
focused on developing IVHM and Computer Vision
technologies as a Software Engineer for Linova Software
GmbH.
Andreas Löhr received his M.Sc. degree in Computer
Science from the Technical University of Munich in 2001
(Informatics, Diplom) and earned his PhD degree in
Computer Science from Technical University of Munich in
2006. For 6 years he worked as a software engineer at
Inmedius Europa GmbH in the area of interactive technical
publications and researched in the field of wearable
computing. He founded Linova Software GmbH in 2008
and at his current post as managing director he focuses on
development of maintenance information systems and data
management architectures.
Designing Data-Driven Battery Prognostic Approaches for
Variable Loading Profiles: Some Lessons Learned
Abhinav Saxena1, José R. Celaya2, Indranil Roychoudhury3, Sankalita Saha4, Bhaskar Saha5, and Kai Goebel6
1, 2, 3 Stinger Ghaffarian Technologies Inc., NASA Ames Research Center, CA, 94035, USA
4,5Mission Critical Technologies Inc., NASA Ames Research Center, CA, 94035, USA
6NASA Ames Research Center, CA, 94035, USA
ABSTRACT
Among various approaches for implementing prognostic
algorithms data-driven algorithms are popular in the
industry due to their intuitive nature and relatively fast
developmental cycle. However, no matter how easy it may
seem, there are several pitfalls that one must watch out for
while developing a data-driven prognostic algorithm. One
such pitfall is the uncertainty inherent in the system. At each
processing step uncertainties get compounded and can grow
beyond control in predictions if not carefully managed
during the various steps of the algorithms. This paper
presents analysis from our preliminary development of a
data-driven algorithm for predicting end of discharge of Li-ion
batteries using constant load experiment data, and the challenges
faced when applying these algorithms to randomized
variable loading profiles, as is the case in realistic
applications. Lessons learned during the development phase
are presented.
1. INTRODUCTION
The field of prognostics is steadily maturing as an important
field under health management as newer algorithms are
constantly being developed. Among the two main categories
are data-driven and model-based algorithms with competing
advantages and limitations (Schwabacher, 2005). This paper
summarizes our experience from implementing a data-
driven approach for a variable load discharge scenario for
Lithium-ion (Li-ion) batteries using experimental data
collected in a controlled lab environment.
An intuitive observation-based approach was initially
implemented, which required considerable improvements as
we learned about various shortcomings during the
development process. In this paper we present our lessons
learned from the exercise, as well as an analysis of various
pitfalls that may be encountered in developing data-driven
methods that may seem intuitive and relatively
straightforward in the beginning but may not match up on
expectations when actually implemented. The paper also
presents a detailed description of our data-driven algorithm.
Corresponding results are also compared with a model
based algorithm using an empirical degradation model.
1.1. Motivation
The motivation for this work stems primarily from two
sources. First, it is of growing interest to develop
prognostic health management solutions for Li-ion batteries
as the use of power storage technologies is gaining
momentum in energy intensive industries. While several
efforts have focused on relevant topics, an accurate way of
estimating battery capacity during realistic load profiles
with variable and/or random operational loading still
deserves attention. This paper describes the results of our
efforts towards developing a generic data-driven approach
for developing prognostic algorithms for randomized
variable loading scenarios. It is generally assumed that data-
driven methods, while typically requiring large amounts of
training data in the initial development phase, allow much
more rapid, easier to implement, and computationally
inexpensive development compared to model-based
approaches. This, however, comes at the cost of a significant
upfront data processing effort and still does not guarantee a
successful implementation. More often than not it calls for
_____________________ Abhinav Saxena et al. This is an open-access article distributed under
the terms of the Creative Commons Attribution 3.0 United States
License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original author and source are credited.
re-evaluation of the initial hypothesis and may require
significant changes adding to complexity as problems
become more realistic. In this effort we exemplify a process
where data-driven algorithms that were once perfected for
constant loading profiles do not guarantee good
performance when tried on the variable loading case and
require rethinking of the strategy, which is in contrast to an
empirical model-based approach where the original
implementation still performs well. Contrary to our initial
beliefs that for systems like Li-ion batteries, where the
characteristics of charge-discharge processes show similar
qualitative trends, data-driven methods can be adapted fairly
quickly once a data processing methodology is in place, we
found that there are significant challenges in developing a
robust data-driven method.
The second source of motivation comes from our continuing
efforts towards facilitating a standardized platform for
comparison of various prognostic approaches. Assessing
algorithmic performance and drawing comparisons against
baselines is one of the enablers towards verification and
validation. As the field of prognostics matures as a research
area, it is important to create an infrastructure that facilitates
verification and validation activities towards certification of
prognostic health management systems. This has been
somewhat difficult because until recently there were no
standard methods to evaluate different algorithms in a
comparable manner due to lack of benchmark datasets or
performance metrics useful for prognostics. An extensive
survey of health management applications and other related
domains revealed that conventional metrics, borrowed ad
hoc from diagnostic domains, had been reused, but did
not serve the purpose well (Saxena et al., 2008). Therefore, a set of
prognostic performance metrics were developed with the
perspective of using prognostic information in health
management and decision making processes (Saxena,
Celaya, Saha, Saha, & Goebel, 2010). However, this process
could be further streamlined with the availability of
benchmark run-to-failure datasets that can be used for
prognostic algorithm development. With that intent several
accelerated aging testbeds were designed and developed at
NASA Ames Research Center and data were made available
to the PHM research community to take advantage of
through prognostics data repository (NASA, 2007). These
datasets have been downloaded more than 20,000 times
from all over the world and used for algorithm development
in the last four years. One of the popular datasets (over 6000
downloads) includes Li-Ion battery aging data that contain a
variety of operational conditions with several sensor
measurement data collected in-situ (B. Saha & K. Goebel,
2011). Despite a large number of downloads we were
unable to find more than just a few references reporting
successful prognostic implementation on battery data
(Orchard, Silva, & Tang, 2011; Orchard, Tang, &
Vachtsevanos, 2011). In this paper we report results from a
preliminary data-driven approach for a randomized variable
loading case. It is our hope that the community will take up
the problem and find other ways that can then be compared
with the ones reported here as initial baseline performance.
1.2. Paper Organization
The rest of the paper is divided into several sections. Section
2 presents a brief background of various efforts related to
prediction of battery life and battery discharge. Application
domain is described in Section 3, which explains the nature
of experiments conducted, lays out the problem of variable
loading, and presents some observations. Section 4 starts by
describing the overall approach taken and presents details of
feature extraction, learning procedure, and prediction
algorithms. Section 4 concludes with a brief discussion of
underlying learning algorithms that are used in our
prediction framework. Section 5 presents the results and
discussions, followed by conclusions in Section 6. More
details on the results are included in the appendix for the
reader's reference.
2. BACKGROUND
Predicting the End-of-Discharge (EoD) times for batteries
has been investigated in the recent years to predict the time
when (a predefined) cut-off threshold voltage is reached and
the power source is no longer available to continue the task
(Bhaskar Saha & Kai Goebel, 2011). Depending on the
application type and availability of data, there are many
other approaches that focus on state-of-charge (SOC)
estimation, current/voltage estimation, capacity and state-of-
health (SOH) estimation. SOC estimation is by far the most
popular approach where charge counting or current
integration is used in different ways to estimate battery
capacity. This approach suffers from various inaccuracies
resulting under realistic usage environments (Meissner &
Richter, 2003). Use of extensive lookup tables relating
open-circuit voltage (OCV) to SOC is popular in the
electronics industry, which requires extensive testing and
data collection to build such mappings (Lee, Kim, & Lee,
2007). For safety critical applications it is important to
determine when the system will lose power, and hence use
of voltage threshold for time to end of charge prediction is
preferred. This implicitly assumes a direct relationship
between available voltage and available charge from the
battery. An example of one such application is described in
(Bhaskar Saha & Kai Goebel, 2011) where EoD time is
predicted for an e-UAV (electric unmanned air vehicle). It is
also illustrated how variable the loading can be during
extreme maneuvers and a time to EoD prediction must
account for expected future loads and environmental
conditions. An EoD time prediction application using an
empirical model based Bayesian approach is discussed in
(Saha, Goebel, Poll, & Christophersen, 2009). Among data-
driven approaches, in (Rufus, Lee, & Thakker, 2008) a
virtual sensor is described based on a data-driven approach
but primarily for SOH estimation and RUL prediction based
on usage patterns and environmental factors such as
operational temperature. A statistical approach to battery
life prediction that builds parametric models of the battery
from collected data is described in (Jaworski, 1999).
Another data-driven effort extracts and tracks changes in the
internal impedances from voltage characteristics obtained
from battery cycling data (Luo, Lv, Wang, & Liu, 2011). All
changes are attributed to battery aging only, thereby not
considering load and temperature as influencing factors.
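The charge-counting (current-integration) SOC estimate mentioned above can be sketched in a few lines; the function name and values are ours, for illustration only, and a real estimator must correct for the inaccuracies the cited works discuss (temperature, aging, measurement bias):

```python
def soc_coulomb_counting(current_a, dt_s, capacity_ah, soc0=1.0):
    """Integrate a sampled current draw (amps) into an SOC trajectory:
    SOC(t) = SOC0 - (1/C) * integral of I dt."""
    soc = soc0
    trajectory = []
    for i in current_a:
        soc -= (i * dt_s) / (capacity_ah * 3600.0)  # convert Ah to A*s
        trajectory.append(soc)
    return trajectory

# 2 A constant draw from a 2.2 Ah cell, sampled once per second:
traj = soc_coulomb_counting([2.0] * 3600, dt_s=1.0, capacity_ah=2.2)
print(round(traj[-1], 3))  # ~0.091 of charge remaining after one hour
```

The weakness noted in the text is visible in the structure: any bias in the sampled current accumulates without bound, which is why OCV lookup or voltage-threshold methods are preferred in some applications.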
Recent years have seen a growing interest in the use of
machine learning techniques, e.g., Hamming networks (Lee,
Kim, Lee, & Cho, 2011), and stochastic filtering techniques
e.g., unscented filter (Santhanagopalan & White, 2010) and
extended Kalman filter (Hu, Youn, & Chung, 2012) to
estimate the state of charge and/or degradation parameter
(e.g., state of capacity) of a Li-ion battery cell under a
randomly varying loading condition. Most of these data-
driven approaches are shown to work on similar data to
what they were trained on. This requires availability of
operational data from real environment, which is not always
the case. In this work we take an alternative approach by
using data-driven models that are developed from a set of
controlled experiments. We investigate whether it is
possible to extract relevant features from current and
voltage measurements collected during battery usage
(discharge cycles) under controlled experiments in various
fixed loading conditions to learn data-driven models that
would then allow us to predict EoD for a variable loading
scenario. Furthermore, estimated capacity values are not
used in making the predictions, since it is generally very
difficult to accurately estimate battery capacity during
operation.
3. APPLICATION DOMAIN
The methods developed in this work are based on aging data
for 18650 Li-ion batteries available from prognostics data
repository hosted by NASA Ames (B. Saha & K. Goebel,
2011). The data used for algorithm development and testing
is generated in a battery testbed described in (Saha &
Goebel, 2009). This testbed allows charging and discharging
of batteries and collecting relevant information to estimate
the state of the battery. In-situ measurement of battery
current, voltage and temperature are available and these are
used for development (training) of data-driven algorithms.
Several charge/discharge cycles are typically applied to a set
of batteries. These batteries were charged to 4.2 volts using
an initial constant current (CC) profile of 1.5A until 4.2V is
reached, followed by a constant voltage (CV) mode until
current drops to 10mA. Since the main objective of these
algorithms is to estimate EoD, a subset of batteries is
discharged at constant current during discharge cycles, with
current levels of 1A, 2A, and 4A across different batteries.
Figure 1 shows representative discharge profiles for the
training cases discharged at three different constant current
values.
Figure 1. Constant load discharge profiles at 1, 2 and 4 A currents.
Batteries are considered fully discharged (100% depth of
discharge) when they have reached 2.7V. The higher the
discharge current, the less time it takes for the battery to
discharge. The increased voltage drop off rate towards the
end of the discharge cycle is typical for this type of
batteries. This is very relevant to the algorithm development
since it presents a challenge in implementing typical
regression-based data-driven methods when dealing with the
steep non-linearity towards the end of the discharge cycle.
While discharge profiles under fixed load conditions were
used for algorithm training, variable loading cases (to
represent realistic profiles) were generated for algorithm
validation. In the variable load discharge profile, the current
is varied randomly between 1A and 4A levels. The variable
load case provides additional challenges to the EoD time
estimation algorithm. It can be observed from Figure 2 that
each time the load changes from one discrete value to
another, there is a transient in the battery voltage value. In
addition, the time of steep drop in the voltage towards the
end of discharge is uncertain as it changes every time the
load current changes and not just with the state of voltage of
the battery.
Figure 2. Variable load discharge profile between load current
levels of 1A and 4A.
Battery performance degradation due to operational usage
also affects EoD time estimation for a particular usage
cycle. For instance, Figure 3 shows several discharge
profiles for a battery used under constant discharge loading.
It can be observed that the amount of time it takes for the
battery to discharge to the 2.7V threshold is reduced
considerably for later cycles during the battery life. In
addition, the rate of voltage decay in the pseudo-linear
region also changes with battery age. Finally, the knee
point, signaling the beginning of the exponential voltage
decay region towards the end of discharge cycle, also
changes location and becomes more difficult to identify
due to reduced curvature as the battery ages. These changes in
voltage profile characteristics form the basis of feature
extraction as described in the next section.
Figure 3. Constant load discharge profiles from different stages of
battery life from a single battery.
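The changing decay rate of the pseudo-linear region noted above is the kind of characteristic the next section extracts as a feature. A minimal ordinary least-squares sketch of slope extraction (pure Python; the sample values are illustrative, not from the dataset, and in practice the region boundaries come from the knee detection):

```python
def fit_slope(times, volts):
    """Least-squares slope of voltage vs. time over a pseudo-linear region."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(volts) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, volts))
    den = sum((t - mt) ** 2 for t in times)
    return num / den  # volts per second

# Illustrative samples from a pseudo-linear stretch of a discharge curve:
t = [100, 200, 300, 400]
v = [3.90, 3.85, 3.80, 3.75]
m = fit_slope(t, v)
print(round(m, 6))  # -0.0005 V/s
```

Tracking how this slope steepens across cycles (and with load current) is one way to expose the aging trend visible in Figure 3.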
4. PROGNOSTIC APPROACH
In this section, we present our approach to predict the end of
discharge (EoD) time of the battery, denoted as tEoD. This is
the time at which the battery voltage reduces to 2.7V. The
aim here is to predict tEoD for different discharge runs of
the battery, given (i) an incomplete discharge cycle data
until current time, and (ii) the complete (randomly
changing) future operating loading. It should be noted that
for this phase of algorithm development we assume a
perfect knowledge of future load profile for the current
discharge event. Furthermore, no partial charge and
discharge events are included in these scenarios, therefore a
charge cycle initiates only after the battery is fully
discharged. These assumptions will be relaxed in the next
phase of development as we learn more about these batteries
first in these simplistic scenarios.
4.1. Feature Extraction and Training
Recall that even though our eventual goal was to predict tEoD
for battery discharge cycles under random loading
conditions, we train our prognostic approach using battery
discharge cycle data collected under constant loading
conditions of 1A, 2A, and 4A. As a first step, training data
were prepared by carrying out denoising of the constant
loading cycle data. Some incomplete and corrupted runs
were also removed from the training data. Once the
denoised battery discharge cycles are obtained, we observe
that the voltage versus time plots (see Figure 4) for different
discharge cycles have the same trend, and each voltage
discharge plot consists of three different and distinct
regions. The first two regions can be approximated by linear
trends followed by a third region with a sharp drop-off
curve. The first pseudo-linear region is due to the instant drop in
voltage caused by the internal battery impedance upon application of
the load current. For simplicity, this impedance is approximated
by an aggregated internal resistance, which is estimated as
the ratio of the observed voltage drop and the applied load
current. It is understood that as the battery degrades its internal
resistance increases, and hence an estimate of
this internal resistance can be used as a proxy for battery
SOH. This estimate of internal resistance is used in creating
the maps of how the load and SOH affect voltage profiles.
Figure 4. Illustration of features extracted from training data.
The second pseudo-linear region spans the majority of the
discharge profile, where the voltage available across the battery
terminals decreases roughly in proportion to the charge
depleted. It can be observed from Figure 3 that as the
battery ages the slope of this pseudo-linear region changes,
and the corresponding voltage drop is higher for a given
amount of discharge time. Furthermore, this slope is also
affected by the load current level. Hence, a mapping (f1) is
created that relates this slope (m) to changes in battery
health (SOH_est) and load current. This second
pseudo-linear region is followed by a sharp drop-off in the
voltage. We term this point the knee point, and denote by
t_knee the knee point time, i.e., the time at which the
discharge curve enters the steep voltage-drop region. For
this work, we simplified t_knee to be the time at which the
discharge curve reaches a predetermined slope value. It was observed that, generally, at t_knee, the battery has
consumed approximately 90% of its available charge. The
identification of t_knee is crucial in predicting t_EoD, since it is
[Figure 4: measured discharge voltage (V) versus time (s) for cycles 2 through 588, annotated with the initial voltage drop ΔV (pseudo-linear region 1), pseudo-linear region 2 (y = mx + c), the knee point locus (t_knee, m_knee) and the voltage at the knee point (V_threshold), the voltage drop-off region, t_add, t_EoD, the discharge cut-off voltage of 2.7 V, and decreasing capacity with battery aging.]
used in determining the slope of the second pseudo-linear
region.
Given the trend of the voltage discharge plots, over the set
of all denoised battery discharge cycles under different
constant loading conditions, the following features are
extracted in order to compute the t_EoD:
1. The battery SOH, which is approximated by the internal
resistance R_meas, estimated by computing the
ratio of the voltage drop to the change in load current,
R_meas = ΔV/ΔI, as observed in the first pseudo-linear
region of the voltage discharge plot (see Figure 4). ΔI is
the change in current when the battery is loaded and
ΔV is the corresponding drop in battery
terminal voltage. R_meas is also used in proportionally
adjusting the voltage level whenever the load is switched
from one value to another.
2. The slope, m, of the second pseudo-linear region of the
voltage discharge plot.
3. The knee point time, t_knee, beyond which a battery is
observed to retain only about 10% of its total capacity
for a given SOH. This feature is based on empirical
observation and is found to be consistent across all
cycles at all SOH. For computational purposes this
point is identified by the time at which a corresponding
threshold voltage, V_th, is reached.
4. t_add, the additional time corresponding to the
remaining 10% of capacity discharge, which needs to be
added to t_knee in predicting t_EoD; therefore, t_EoD = t_knee + t_add. This allows us not to model the non-
linear behavior explicitly, and instead adjust the
estimates by additive offsets computed from the
mapping f3.
It is observed that each of the above features depends on the
state of health of the battery and the load level. R_meas
characterizes the internal impedance of the battery and
represents battery age. Hence, we use R_meas as an
approximation for SOH, denoted SOH_est. It is assumed that SOH does not
change within a given discharge cycle. Hence, given the
operational load and SOH_est, we learn the following
three multidimensional mappings:

m = f1(Operational Load, Battery SOH_est)
V_th = f2(Operational Load, Battery SOH_est)
t_add = f3(Operational Load, Battery SOH_est)
These mappings can be implemented using several different
techniques. In this work, we focus on learning these mappings
using least-squared polynomial regression and an artificial
neural network. Once these mappings are learned, the
t_EoD can be predicted using the future load profile
information.
4.2. Architecture
The data-driven prognostic approach adopted in this paper is
presented in Figure 5. The first step of this approach is the
estimation of the SOH of the battery. In our approach, we
estimate SOH_est by estimating the internal resistance R_meas of the battery
using the voltage and current measurements, V_meas and I_meas,
respectively, at the start of the discharge cycle. SOH_est and
the future operating loading profile of the battery are then
fed into the three mappings, which estimate m, V_th, and t_add;
these are then used for predicting the t_EoD.
Figure 5. Data-driven prognostics architecture.
4.3. Prediction
Recall that although the mappings are created using constant
load profile data to learn the various relationships, the algorithm's
performance is evaluated using data from random loading
profiles. Given discharge cycle data up to the current time, i.e.,
the time-stamped current and voltage measurements
recorded from the battery, and knowledge of the expected
(randomly changing) future loading profile, our goal is to
make a correct prediction of t_EoD. Since the training data do
not contain information about the transients that arise during
load switching, an adaptation parameter (a slope adjustment
multiplier) is incorporated into the prediction scheme; it is adapted based on
observed data and is used to adjust the values obtained from
the mappings. This allows us not to update the entire
mappings built offline in the training phase, while still
incorporating the differences seen in run-time data due
to various factors not considered in the learning step.
Algorithm 1 describes our steps for predicting the t_EoD for a
discharge cycle.
The algorithm takes as inputs the vector of prediction
time-points, the vector of time-intervals, and
a vector of future current loading values, each
element of which corresponds to one of the loading time-
intervals. First, we initialize the slope
adjustment multiplier. Then, we compute SOH_est as
explained above. Next, for each future load segment, we obtain the slope
m for the given load and SOH_est from the mapping f1 and
extrapolate from the battery voltage measured at prediction
time to the end of the current load-level segment, i.e.,
until the next load level is switched. The threshold voltage
V_th is also computed from the mapping f2. If, at the end of the loading cycle,
the extrapolated voltage has dropped below V_th, we determine the time at which
the voltage equals V_th (the knee point time t_knee), compute t_add from the
mapping f3 based on the load and SOH_est, determine
t_EoD = t_knee + t_add, and stop. Otherwise, from the last segment, we determine
the real slope, m_knee, and adapt the slope adjustment
multiplier to be used for the next load segment; the
loop is repeated until either all future load segments are
included in the prediction or a knee point is reached and a
final EoD prediction is made.
Algorithm 1: t_EoD Prediction
Input:
1. vector of prediction time-points
2. vector of time-intervals
3. vector of future current loading values
4. measured voltage and current, V_meas and I_meas
initialize the slope adjustment multiplier; compute SOH_est from R_meas
for each future load segment
    m ← f1(load, SOH_est), scaled by the slope adjustment multiplier
    extrapolate the voltage with slope m to the end of the segment
    V_th ← f2(load, SOH_est)
    if the extrapolated voltage falls below V_th
        t_knee ← time at which the voltage reaches V_th
        t_add ← f3(load, SOH_est); t_EoD ← t_knee + t_add
        break
    else
        compute the real slope m_knee and adapt the multiplier
    end
end
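The loop above can also be rendered as a minimal executable sketch. This is our own Python rendering under simplifying assumptions: the mappings are passed in as plain callables, and the multiplier-adaptation step is left as a comment.

```python
def predict_eod(t_pred, segments, f1, f2, f3, v_start, soh_est):
    """Sketch of Algorithm 1: walk the future load segments,
    extrapolate the voltage with the mapped slope, and stop at the
    knee point. `segments` is a list of (duration_s, load_A);
    f1/f2/f3 take (load, soh) and return m, V_th, and t_add."""
    mult = 1.0                       # slope adjustment multiplier
    t, v = t_pred, v_start
    for duration, load in segments:
        m = mult * f1(load, soh_est)
        v_th = f2(load, soh_est)
        v_end = v + m * duration     # extrapolate to end of this segment
        if v_end <= v_th:
            t_knee = t + (v_th - v) / m   # time at which v reaches v_th
            return t_knee + f3(load, soh_est)
        # A full implementation would re-estimate the observed slope
        # here and adapt `mult` for the next segment.
        t, v = t + duration, v_end
    return None  # no knee reached within the given load profile
```

With toy mappings (constant threshold, slope proportional to load) the walk stops in the segment where the extrapolated voltage first crosses the threshold and returns t_knee plus the mapped t_add.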
4.4. Learning Algorithms
To assess the contribution of the data-driven learning
step in our prognostic framework, we selected two
regression algorithms, continuing from previous
benchmarking efforts (Goebel, Saha, & Saxena, 2008; Saha,
Goebel, & Christophersen, 2008). One is of very
low complexity, based on linear polynomial regression, and
the other represents a more sophisticated approach, an
artificial neural network (ANN). Finally, to compare
performance, a particle filter based algorithm is used, which
uses empirical models and measurement data to predict
battery EoD. These algorithms are briefly described next.
4.4.1. Polynomial Regression
For the purpose of generating the three mappings, a simple
linear polynomial mapping based on least-squared
regression was employed for comparison with other regression
approaches such as ANNs. As can be seen from the three
learned mappings in Figure 6, there is significant noise in the
data, which makes it difficult to learn clear relationships,
especially in cases where a one-to-many relationship exists
between the input combinations and the output.
Figure 6. The three mappings based on polynomial regression.
Gray cross markers show quality of fit (computed data) using test
(measured) data.
Since no obvious reason was available for such behavior,
first-order polynomials (linear models) were fit to the data based
on empirical observations. The quality of fit, also shown in
Figure 6, supports this choice. Once these mappings were
built, they were used to compute features for the input
combinations present in the test data. It must be noted that in the
learning phase the input space has only three discrete values
available for the load. Since the load in the test scenario is a
[Figure 6 panels: V_threshold, t_add, and the slope of the linear region (m), each plotted against load current and against SOH_est (approximated by R_meas), with measured and computed values overlaid; the fitted relationships are m = f1(Operational Load, Battery SOH_est), V_th = f2(Operational Load, Battery SOH_est), and t_add = f3(Operational Load, Battery SOH_est).]
continuous variable, linear interpolation was used to arrive
at feature values for test loads lying between the training
loads of 1 A, 2 A, and 4 A.
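The interpolation step can be sketched with numpy's one-dimensional linear interpolation; the per-load feature values below are illustrative numbers, not measured data.

```python
import numpy as np

# Feature values learned at the three training loads (illustrative).
train_loads = np.array([1.0, 2.0, 4.0])
train_v_th = np.array([3.3, 3.1, 2.9])   # e.g. V_threshold per load

def feature_at_load(load):
    """Linearly interpolate a feature to a test load between 1 A and 4 A."""
    return np.interp(load, train_loads, train_v_th)
```

Note that `np.interp` clamps to the end values outside the 1 A to 4 A range, so extrapolation beyond the training loads is not attempted.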
4.4.2. Artificial Neural Network Based Regression
An alternative approach to constructing the three mappings
was implemented based on artificial neural networks.
The fitting of these functions presents several complications
for the training of the neural network due to the nature of
the training data. The neural network structure was therefore
selected to obtain the simplest mapping, as close as possible
to a plane. This is done to avoid overfitting, which is a
challenge imposed by the data. The data were normalized for
the training of all the mappings in order to improve the
performance of the neural network training. This
normalization consisted of subtracting the sample mean and
dividing the data by the sample standard deviation. The
standard Levenberg-Marquardt algorithm was used for the
optimization during the training process. Simple network
structures were used: a single hidden layer with two neurons for
one of the mappings, and a single hidden layer with one neuron
for the other two.
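The normalization described above is a standard z-score transform; a minimal sketch follows, reusing the training statistics at test time as is customary (function name is ours).

```python
import numpy as np

def zscore(x, mean=None, std=None):
    """Subtract the sample mean and divide by the sample standard
    deviation, per column; pass stored statistics at test time."""
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0, ddof=1) if std is None else std
    return (x - mean) / std, mean, std
```

Training data normalized this way has zero mean and unit sample standard deviation in every column, and test points are mapped into the same coordinates.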
4.4.3. Benchmark Algorithm – Particle Filters
As part of our previous work, we have developed particle
filter-based prognostic approaches for battery health
management (Orchard, Tang, Saha, Goebel, &
Vachtsevanos, 2010; Saha & Goebel, 2009; Bhaskar Saha &
Kai Goebel, 2011) on the same data sets as used in this
work. We use the results obtained from this approach as our
comparison standard, with the hope that our data-driven
methods can perform as well as a Particle Filter-based
approach. A particle filter (PF) (Arulampalam, Maskell,
Gordon, & Clapp, 2002; Gordon, Salmond, & Smith, 1993)
is a sequential Monte Carlo method that approximates the
state probability density function (PDF) using a weighted
set of samples, called particles. The value of each particle
describes a possible system state, and its weight denotes the
likelihood of the observed measurements given this
particle’s value. As more observations are obtained, the
value of each particle in the next time step is predicted by
stochastically moving each particle to a new state using a
non-linear process model describing the evolution in time of
the system under analysis, a measurement model, a set of
available measurements, and an a priori estimate of the state
PDF. Then, the weight of each particle is updated to reflect
the likelihood of that observation given the particle’s new
state. For prognostics, the PF is used only to predict the
future values of the particles based on the future operating
loading profiles, without updating them, since future
measurements are not available. In this
work, a detailed discharge model of the cells, as described
in (Bhaskar Saha & Kai Goebel, 2011), is used as the
process model for the PF. The model parameters include
double layer capacitance, the charge transfer resistance, the
Warburg impedance, and the electrolyte resistance. The
model was developed by analyzing the way the impedance
parameters change with charge depletion during the
discharge cycle.
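A minimal prediction-only sketch of this idea follows, with a toy linear-discharge process model standing in for the paper's detailed electrochemical cell model; all constants and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_predict_eod(particles, weights, load, v_th, dt=1.0, noise=1e-3):
    """Propagate voltage particles forward through the process model
    (no measurement updates, since future observations are
    unavailable) until each crosses v_th; return the weighted mean
    end-of-discharge time. Toy model: dv = -k*load*dt + noise."""
    k = 1e-3  # toy discharge-rate constant
    eod = np.zeros(len(particles))
    for j, v in enumerate(particles):
        t = 0.0
        while v > v_th:
            v += -k * load * dt + rng.normal(0.0, noise)
            t += dt
        eod[j] = t
    return float(np.sum(weights * eod))
```

With equally weighted particles started at the same voltage and negligible process noise, the weighted EoD collapses to the deterministic crossing time of the drift term, which is a useful sanity check on the propagation step.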
5. RESULTS AND DISCUSSIONS
The algorithms described above were tested on data collected
from two batteries discharged under randomized
sequences of loads between the 1 A and 4 A levels. For this paper
we present results from two discharge cycles from each of
the batteries, chosen from an early stage of life (the second and
fourth discharge cycles). The results obtained from all four
cases were similar in character, and only one set is
presented below for conciseness. The remaining three sets
are included in the appendix for reference. Results are
evaluated based on the alpha-lambda prognostic metric, as
described in (Saxena et al., 2010).
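The alpha-lambda metric judges whether a RUL prediction at each prediction time falls within a ±α band around the true RUL; a minimal pass/fail check is sketched below (the α = 0.2 band is an illustrative choice, not taken from the paper's evaluation settings).

```python
def alpha_lambda_pass(rul_true, rul_pred, alpha=0.2):
    """True if the prediction lies in [(1-alpha)*RUL*, (1+alpha)*RUL*]."""
    return (1 - alpha) * rul_true <= rul_pred <= (1 + alpha) * rul_true
```

Applied to the first row of Table 1 (RUL* = 2673), the particle filter's 2750 falls inside the 20% band while the ANN's 2126 falls just outside it.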
Figure 7. Alpha-Lambda metric plot for comparing algorithmic
performance.
Table 1. Prediction results comparing data-driven prediction
approach based on two different learning algorithms and an
empirical model based prediction.
tP      RUL*    Particle Filter     ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2673    2750     77         2126     -547       2606     -67
247     2446    2511     65         1899     -547       2330     -116
475     2218    2287     69         2344     126        2253     35
703     1990    2067     77         2116     126        2972     982
930     1763    1853     90         2568     805        3026     1263
1157    1536    1589     53         1063     -473       897      -639
1385    1308    1365     57         1147     -161       1204     -104
1612    1081    1151     70         1207     126        1116     35
1840    853     993      80         979      126        888      35
2068    625     735      110        464      -161       369      -256
2296    397     505      108        523      126        432      35
It can be observed from Figure 7 (and the numerical data
provided in Table 1) that the data-driven methods based on
the two different mappings perform in a similar fashion, but their
performance is not as good as that of the model-based approach.
Furthermore, given the nature of the data (see Figure 6), the
method based on the ANN mapping performs more poorly. This can be
explained by the fact that the ANN has a harder time learning a
simple relationship compared to polynomial regression. On
further analysis, several potential issues were identified that
may have contributed to the poor performance of this data-
driven approach:
- As evident from Figure 6, the feature data derived from the
measurements are noisy, and in the absence of a
suitable denoising scheme, learning meaningful
relationships may be difficult. Especially in the
case of randomized loading profiles, the effect of
noise may be non-linear and may not be captured
by interpolating observations from three constant
loading scenarios.
- Constant loading scenarios lack information
about the effects of transients, which are bound to be
present in the variable loading case when the
load is switched from one level to another.
Such information is crucial for accurate predictions.
- Feature extraction involves linearization of several
non-linear regions, and hence the performance is
sensitive to choices made, such as the definition of the
knee point, the definition of the slope m, etc. These
choices are purely observation based and require a
more thorough sensitivity analysis, which demands
considerable effort as part of a data-driven solution.
- The quality of the mapping learned from data lies at the
heart of a data-driven prediction approach; however,
there is no direct provision for updating the
mapping as new data come in. This becomes
a problem especially when the training
data are significantly different from the test data and
are missing some important knowledge.
6. CONCLUSIONS
This paper presented the results and lessons learned from
implementing a data-driven prediction approach for variable
loading scenarios, based on data acquired in a controlled lab
environment for constant loading scenarios. It was observed
that such methods may not always lead to good performance
when applied to realistic datasets. While the performance
obtained in this effort by no means generalizes to all data-driven
methods, the lessons learned are presented so that
the research community can avoid potential pitfalls. This
effort also establishes a preliminary
baseline for performance on the battery aging datasets
available from NASA's prognostics data repository, which
will help other approaches in comparative evaluation and
successive improvement of performance.
ACKNOWLEDGEMENT
The authors would like to acknowledge the support from
System wide Safety Assurance Technologies (SSAT)
project under NASA’s Aviation Safety program,
Aeronautics Research Mission Directorate (ARMD).
NOMENCLATURE
m          slope of second linear segment of discharge profile
SOH_est    estimated state of health
I_meas     measured battery load current
R_meas     measured internal resistance of battery
V_th       threshold voltage at which end of life is reached
t_EoD      time till end of discharge
t_add      time till end of discharge from knee point
m_knee     real slope of second linear segment of discharge
           profile
           slope adjustment multiplier
           vector of prediction time-points
           vector of time-intervals
           vector of future current loading values
REFERENCES
Arulampalam, S., Maskell, S., Gordon, N. J., & Clapp, T.
(2002). A Tutorial on Particle Filters for On-line
Non-Linear/Non-Gaussian Bayesian Tracking.
IEEE Transactions on Signal Processing, 50(2), 174-188.
Goebel, K., Saha, B., & Saxena, A. (2008). A Comparison
of Three Data-driven Techniques for Prognostics.
Paper presented at the MFPT 2008.
Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993).
Novel Approach to Nonlinear Non-Gaussian
Bayesian State Estimation. Paper presented at the
IEE Radar and Signal Processing.
Hu, C., Youn, B. D., & Chung, J. (2012). A Multiscale
Framework with Extended Kalman Filter for
Lithium-Ion Battery SOC and Capacity Estimation.
Applied Energy, 92, 694-704.
Jaworski, R. K. (1999). Statistical Parameters Model for
Predicting Time to Failure of Telecommunications
Batteries. Paper presented at the 21st International
Telecommunications Energy (INTELEC'99).
Lee, S., Kim, J., & Lee, J. (2007). The state and parameter
estimation of an Li-ion battery using a new OCV-
SOC concept. Paper presented at the IEEE Power
Electronics Specialists Conference (PESC'07).
Lee, S., Kim, J., Lee, J., & Cho, B. H. (2011).
Discrimination of Li-ion batteries based on
Hamming network using discharging-charging
voltage pattern recognition for improved state-of-
charge estimation. Journal of Power Sources,
196(4), 2227-2240.
Luo, W., Lv, C., Wang, L., & Liu, C. (2011). Study on
Impedance Model of Li-ion Battery. Paper
First European Conference of the Prognostics and Health Management Society, 2012
76
European Conference of Prognostics and Health Management Society 2012
9
presented at the 6th IEEE Conference on Industrial
Electronics and Applications, Beijing.
Meissner, E., & Richter, G. ( 2003). Battery Monitoring and
Electrical Energy Management Precondition for
Future Vehicle Electric Power Systems. Journal of
Power Sources, 116(1-2), 19.
NASA. (2007). Prognostics Data Repository. from NASA
Ames Research Center
http://ti.arc.nasa.gov/project/prognostic-data-
repository
Orchard, M., Silva, J., & Tang, L. (2011, September 25th-
29th). A Probabilistic Approach for Online Model-
based Estimation of SOH/SOC and use profile
characterization for Li-Ion Batteries. Paper
presented at the Battery Management Workshop,
Annual Conference of the Prognostics and Health
Management Society 2011, Montreal, QB, Canada.
Orchard, M., Tang, L., Saha, B., Goebel, K., &
Vachtsevanos, G. (2010). Risk-Sensitive Particle-
Filtering-based Prognosis Framework for
Estimation of Remaining Useful Life in Energy
Storage Devices. Studies in Informatics and
Control, 19(3), 209-218.
Orchard, M., Tang, L., & Vachtsevanos, G. (2011,
September 25th-29th ). A Combined Anomaly
Detection and Failure Prognosis Approach for
Estimation of Remaining Useful Life in Energy
Storage Devices. Paper presented at the Annual
Conference of the Prognostics and Health
Management Society 2011, Montreal, QB, Canada.
Rufus, F., Lee, S., & Thakker, A. (2008). Health Monitoring
Algorithms for Space Application Batteries. Paper
presented at the International Conference on
Prognostics and Health Management, Denver, CO.
Saha, B., & Goebel, K. (2009). Modeling Li-ion Battery
Capacity Depletion in a Particle Filtering
Framework. Paper presented at the Annual
Conference of the PHM Society, San Diego, CA.
Saha, B., & Goebel, K. (2011). Battery Data Set. from
NASA Ames, Moffett Field, CA
http://ti.arc.nasa.gov/project/prognostic-data-
repository
Saha, B., & Goebel, K. (2011). Model Adaptation for
Prognostics in a Particle Filtering Framework.
International Journal of Prognostics and Health
Management, 2(1), 10.
Saha, B., Goebel, K., & Christophersen, J. (2008).
Comparison of Prognostic Algorithms for
Estimating Remaining Useful Life of Batteries.
Transactions of the Institute of Measurement and
Control (special issue on Intelligent Fault
Diagnosis & Prognosis for Engineering Systems),
293-308.
Saha, B., Goebel, K., Poll, S., & Christophersen, J. (2009).
Prognostics Methods for Battery Health
Monitoring Using a Bayesian Framework. IEEE
Transactions on Instrumentation and
Measurement, 58(2), 291-296.
Santhanagopalan, S., & White, R. E. (2010). State of charge
estimation using an unscented filter for high power
lithium ion cells. International Journal of Energy
Research, 34(2), 152-163.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,
Saha, S., & Schwabacher, M. (2008). Metrics for
Evaluating Performance of Prognostics
Techniques. Paper presented at the 1st International
Conference on Prognostics and Health
Management (PHM08), Denver CO.
Saxena, A., Celaya, J., Saha, B., Saha, S., & Goebel, K.
(2010). Metrics for Offline Evaluation of
Prognostic Performance. International Journal of
Prognostics and Health Management, 1(1), 20.
Schwabacher, M. (2005). A Survey of Data Driven
Prognostics. Paper presented at the AIAA
Infotech@Aerospace Conference, Arlington, VA.
BIOGRAPHIES
Abhinav Saxena is a Research Scientist with SGT Inc. at
the Prognostics Center of Excellence NASA Ames Research
Center, Moffett Field CA. His research focus lies in
developing and evaluating prognostic algorithms for
engineering systems using soft computing techniques. He is
a PhD in Electrical and Computer Engineering from
Georgia Institute of Technology, Atlanta. He earned his
B.Tech in 2001 from Indian Institute of Technology (IIT)
Delhi, and Masters Degree in 2003 from Georgia Tech.
Abhinav has been a GM manufacturing scholar and is also a
member of IEEE, AAAI and ASME.
José R. Celaya is a research scientist with SGT Inc. at the
Prognostics Center of Excellence, NASA Ames Research
Center. He received a Ph.D. degree in Decision Sciences
and Engineering Systems in 2008, a M. E. degree in
Operations Research and Statistics in 2008, a M. S. degree
in Electrical Engineering in 2003, all from Rensselaer
Polytechnic Institute, Troy New York; and a B. S. in
Cybernetics Engineering in 2001 from CETYS University,
México.
Indranil Roychoudhury received the B.E. (Hons.) degree
in Electrical and Electronics Engineering from Birla
Institute of Technology and Science, Pilani, Rajasthan,
India in 2004, and the M.S. and Ph.D. degrees in
Computer Science from Vanderbilt University, Nashville,
Tennessee, USA, in 2006 and 2009, respectively. Since
August 2009, he has been with SGT, Inc., at NASA
Ames Research Center as a Computer Scientist. His
research interests include hybrid systems modeling, model-
based diagnostics and prognostics, distributed diagnostics
and prognostics, and Bayesian diagnostics of complex
physical systems. He is a member of IEEE.
Sankalita Saha was a research scientist with Mission
Critical Technologies at the Prognostics Center of
Excellence, NASA Ames Research Center during this effort.
She received the M.S. and PhD. degrees in Electrical
Engineering from University of Maryland, College Park in
2007. Prior to that she obtained her B.Tech (Bachelor of
Technology) degree in Electronics and Electrical
Communications Engineering from the Indian Institute of
Technology, Kharagpur in 2002.
Bhaskar Saha received his Ph.D. from the School of
Electrical and Computer Engineering at Georgia Institute of
Technology, Atlanta, GA, USA in 2008. He received his
M.S. also from the same school and his B. Tech. (Bachelor
of Technology) degree from the Department of Electrical
Engineering, Indian Institute of Technology, Kharagpur,
India. Before joining PARC in 2011 he was a Research
Scientist with Mission Critical Technologies at the
Prognostics Center of Excellence, NASA Ames Research
Center, where his research focused on applying various
classification, regression and state estimation techniques for
predicting remaining useful life of systems and their
components, as well as developing hardware-in-the-loop
testbeds and prognostic metrics to evaluate their
performance. He has been an IEEE member since 2008 and
has published several papers on these topics.
Kai Goebel received the degree of Diplom-Ingenieur from
the Technische Universität München, Germany in 1990. He
received the M.S. and Ph.D. from the University of
California at Berkeley in 1993 and 1996, respectively. Dr.
Goebel is a senior scientist at NASA Ames Research Center
where he leads the Diagnostics and Prognostics groups in
the Intelligent Systems division. In addition, he directs the
Prognostics Center of Excellence and he is the technical
lead for Prognostics and Decision Making of NASA's
System-wide Safety and Assurance Technologies Program.
He worked at General Electric's Corporate Research Center
in Niskayuna, NY from 1997 to 2006 as a senior research
scientist. He has carried out applied research in the areas of
artificial intelligence, soft computing, and information
fusion. His research interest lies in advancing these
techniques for real time monitoring, diagnostics, and
prognostics. He holds 15 patents and has published more
than 200 papers in the area of systems health management.
APPENDIX
Table 2. Results for validation battery 61, cycle 4.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2793    2763     -30        2297     -496       2615     -178
247     2566    2510     -56        2698     132        2714     148
475     2338    2286     -52        1946     -392       2736     398
703     2110    2008     -102       990      -1120      861      -1249
930     1883    1796     -87        944      -939       372      -1511
1157    1656    1553     -103       481      -1175      572      -1084
1385    1428    1325     -103       855      -573       871      -557
1612    1201    1114     -87        857      -344       813      -388
1840    973     887      -86        641      -332       531      -442
2068    745     660      -85        877      132        893      148
2296    517     432      -85        649      132        665      148
Table 3. Results for validation battery 62, cycle 2.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2597    1897     -700       2876     279        2870     273
247     2370    2313     -57        1463     -907       1663     -707
475     2142    2188     46         938      -1204      1465     -677
703     1914    1981     67         777      -1137      1058     -856
930     1687    1708     21         812      -875       845      -842
1157    1460    1468     8          1066     -394       1060     -400
1385    1232    1321     89         838      -394       1505     273
1612    1005    1094     89         1003     -2         1278     273
1840    777     909      132        1046     269        1050     273
2068    549     732      183        686      137        817      268
2296    321     494      173        600      279        594      273
Table 4. Results for validation battery 62, cycle 4.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2519    2386     -133       1897     -622       2546     27
247     2292    2358     66         1405     -887       1740     -552
475     2064    2168     104        1560     -504       1697     -367
703     1836    1582     -254       1843     7          1863     27
930     1609    1580     -29        1616     7          1636     27
1157    1382    1473     91         1115     -267       1165     -217
1385    1154    1269     115        881      -273       676      -478
1612    927     1030     103        934      7          954      27
1840    699     616      -83        706      7          726      27
2068    471     598      127        478      7          498      27
2296    243     357      114        0        -243       0        -243
Diagnostics Driven PHM
The Balanced Solution
Jim Lauffer
DSI International, Inc.
Orange, California 92867, USA
ABSTRACT
Much effort has been made to develop technologies and
define metrics for Prognostics Health Management
(PHM). The problem is that most of this effort has focused
on theoretical and high risk concepts of Prognostics
performance while ignoring the real needs in “System
Health Management”. In the wake of this technological
attention, the importance of true Integrated Systems Health
Management (ISHM) has been masked by the focus on
single failure mode physics of failure solutions. The critical
PHM metrics, derived from Integrated Systems Diagnostics
Design (ISDD) have mostly been ignored. These critical
metrics include Reliability, Safety, Testability, and System
Maintainability & Sustainment, as well as the impact of
prognostics performance on Systems Diagnostics. A key
point to be made is that the ISDD process is much larger
than just developing metrics. ISDD results in a well-
designed system that meets true health management needs,
as well as significantly lowering development costs, and the
cost of ownership. Another point that needs to be made is
that the core of ISDD is a proven and highly effective
analysis solution in Model Based Diagnostics. This paper
discusses the approach of using Model Based Diagnostics in
the ISDD process to determine the best balance of the
Health Management design. It will be shown how the
impact and effectiveness of prognostics as integrated with
the ISDD process provides true value to performance and
cost avoidance.
1. THE SKEWED PHM TECHNOLOGIES
New York University mathematical physicist Alan Sokal
submitted an article on current physics and mathematics
built around quantum mechanics and chaos theory (Sokal,
1994 & 1995). Sokal's article was published in 1996 and
cited as a credit to scientific research. Soon after, Sokal
explained in a new article that his publication had been
salted with nonsense and, in his opinion, was accepted
because: (a) it sounded good and (b) it flattered the editors'
ideological preconceptions.
It turns out that Sokal’s Hoax served a public purpose to
attract attention to what Sokal saw as a decline of standards
of rigor in the academic community. Today, this
philosophy of Sokal’s Hoax can easily be applied to
Government, Industry and Academia on the subject of
Prognostics Health Management. Far too many
technologists and business managers fall for the “hoax”
that systems can be prognosed to predict, within a known
Remaining Useful Life (RUL) parametric, any and all
failures, and then go on to promote this RUL prediction as
precluding all failures, preventing system operational
failures, and enhancing sustainment.
About thirty years ago this author had his first
experience with prognostics based on signature analysis. In
that operation, the U.S. Navy looked into using
signature analyses of the ship’s noise propagation to
predict a signature shift that could possibly be leading to a
failure. After much trial and error with this concept, and after
unnecessary consumption of spare parts and maintenance
labor hours, it was decided that this form of “prognostics”
was not working.
A major U.S. project was based on PHM being the core to
the prevention of catastrophic failures and system aborts
through prognostics. This PHM system would also provide
the operational data needed to drive an Advanced Logistics
Information System (Gill, 2003). After investing untold
millions of U.S. dollars, it is now recognized that the planned
PHM path must be modified. The realization that you
cannot prognose an entire system is finally coming into
focus. The idea of using the proven technology of Model
Based Diagnostics was discarded early in the program due
to the same philosophy Sokal exposed.
_____________________
Lauffer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
By the way, the original definition of PHM was Diagnostics and Prognostics
Health Management. It did not take long for diagnostics to
be displaced by the more “exciting” prognostics exclusivity.
If you have read this far, you are probably thinking that this
author is anti-technology development and is only trying to
promote an old method for determining the Health
Management of a system. This is far from the intent of this
paper! This author has been a proponent of advanced
technology development from the 50s during the early
transition from vacuum tubes (valves as some say) to solid
state technology, and on to today’s prognostic technologies.
He has attended courses at Georgia Tech and has worked
with prognostics professionals. From all of this, it has to be
said that prognostics plays a very powerful role in PHM and
is the way of the future. With that said, it is also apparent
that prognostics is not a “Systems” health management
technology. It is limited to selected failure modes that must
not be allowed to fail due to system criticality. In-depth
physics-of-failure analysis, proper sensors, and precise
processing need to be in place to determine RUL when a
single failure mode approaches a critical state. Keep in mind
that the focus is on a single failure mode, and even into the
molecular structure of this single mode. Even then, this
processed single failure mode effect must be observable at
the system level to be considered a functional component of
PHM.
This is in contrast to a typical operational platform with tens
of thousands, or even hundreds of thousands of failure
modes. It is obvious that the prognostics technologies are far
from capable of performing system level PHM. There have
been attempts in AI, Bayesian Networks, Boolean Logic,
and others to perform this PHM System analysis. But these
have been shown to be ineffective, and at very high cost and
risk.
As stated, there is a need for integrated prognostics that can
map a prognosed event to overall System PHM. Investment
into prognostics must be accountable, not simply bought in
to satisfy study funding. Thousands of pages have been
written on prognostics, but those studies have had a
difficult, if not impossible, time performing in a fielded
system, let alone contributing any value to design influence.
Over the last decade or so, the demand for increased
prognostics within complex, critical systems has resulted
not only in changes to how these systems are developed, but
also to the way in which designs are analyzed as they are
developed. In particular, system analysis practices have
been moving away from true System Health Management
values, such as reliability, testability, maintainability,
sustainment, and the critical parameter today: cost. Some
critical systems have focused on prognostics details while,
to the most extent, ignored the ISDD process. System
designs now either pursue high cost and risk custom
solutions to focus on prognostics, incorporate prognostic
details into other calculations, or ignore prognostics
altogether. This issue is amplified by the fact that much of
the value in reliability and testability analysis can best be
realized when design feedback is available relatively early
in the development cycle. On the other hand, prognostic
development and the evaluation of prognostic performance
take years of operational time to obtain any metrics of value.
It is unlikely that information derived from formal
prognostic performance metrics (Saxena et al., 2010) can be
incorporated into systems engineering analyses to profitably
impact system development and decision-making. At the
same time, un-validated prognostics can lead to low
Availability and high sustainment cost due to false
removals. This makes prognostic performance validation
notoriously time-consuming and costly.
As an alternative, some projects have implemented custom
solutions, modifying design-time engineering analyses to
account for the expected impact of prognostics concurrently
under development. There is, however, no standardized or
officially sanctioned approach to accounting for prognostics
performance. For each project, systems analysts must ask a
series of questions; for example, diagnostic analysts must
decide whether fault detection & isolation metrics should
take full or partial credit for prognosed failures, or whether
testability analysis can be constrained to cover only the non-
prognosed portion of the design. In either case, should
prognostic horizon and/or accuracy be taken into
consideration?
If so, then how is the end user or maintainer expected to
respond to prognostic notifications without questioning
them? Will there be cases in which some sort of
confirmation will be required before a maintenance action is
performed? Then the key question is, should diagnostic
analysis be consulted when determining the optimal areas in
which to develop prognostic measurements or will only
criticality considerations be involved in the selection of
prognostic candidates?
The root of these, and other related questions is the lack of
realistic and cost effective requirements, and the lack of
systems diagnostics understanding. So, what is the solution
to effective and affordable PHM? The answer is obvious:
Model Based Diagnostics, a proven technology that has
been in use for decades. In the past twenty-plus years it has
come to be recognized as the systems engineering tool of
choice throughout industry.
Without going off track on the balancing of prognostics and
diagnostics in PHM development, it needs to be mentioned
that there is once again a push for something new in the
field of diagnostics analysis. There has been talk of Model
Based Diagnostics falling out of “fashion” within the same
community that has proliferated prognostics. Also, Model
Based Diagnostics has received some bad press from entry
level tools whose use has been attempted on projects where
the tools failed to perform. Unfortunately, these unproven
tools resulted in high costs with no acceptable results.
These failures to perform led the technology community to
downplay the use of Model Based Diagnostics, leaving it
vulnerable to high-cost, high-risk alternatives. Projects are
told that Model Based Diagnostics is obsolete, superseded
by supposedly higher-order mathematical solutions. The
issue is that these "non-model-based" solutions have
significant problems: demanding development skills, high
cost, lack of system integration, and limitation to small-scale
analyses. Just as prognostics entered as the
“new and improved” health management solution, other
analytical solutions are continuing to be pushed into the new
wave of thinking without the understanding of a systems
engineering approach.
One such solution attempt has been tried over the years in
several research communities and this is based on Bayesian
Networks. As with prognostics, a Bayesian Network
requires extensive development and cannot begin until the
design is well defined. Then, if there is a design change, the
analysis needs to start all over again. Even if a network can
be completed, it is limited to smaller systems, cannot
provide knowledge to the Logistics sustainment solution,
and still requires years of learning to “fine tune” the results.
2. DIAGNOSTICS DRIVEN PHM
Now that this author has ripped “stand alone” attempts at
prognostic solutions, the following discussion focuses on
effective diagnostics driven PHM based on Model Based
Diagnostics. This ISDD process is centered on a proven tool
suite and process that brings the system design into an
optimized PHM solution. This solution provides the
confidence needed for fault detection and isolation at the
system level that includes the impact of prognostics on
diagnostics. This ISDD process identifies the candidates
needed for an effective prognostics analysis. It also provides
the parametrics used for an Operations and Support
simulation. This simulation capability is shown in section
5.
For the system design to be optimized for effective health
management and sustainment, the diagnostics process needs
to begin early in the design phase. This is
something prognostics cannot do. The diagnostics analysis
results in a selection of candidates for prognostics analysis.
See Figure 1 for this diagnostics informed prognostics
analysis process.
As emphasized, for optimum results in design influence, the
ISDD process needs to begin at the start of the project’s
design phase. This is where PHM and sustainment must be
considered to be effective and affordable. Along with
testability requirements (the probability of fault detection,
isolation to a defined ambiguity set, and false alarm
constraints), PHM and logistics requirements must be
understood.
Figure 1. Diagnostics selection of prognostics candidates
With this in mind, and to keep this paper on track, the
following discussion focuses on system prognostics
requirements as driven by the ISDD process.
Figure 1 shows these prognostic requirements being defined
at the beginning of the diagnostics engineering
development. This is a critical point in PHM design and is
where the customer typically falls short in requirements
definitions. Very few customer project managers
understand prognostics well enough to flow down cohesive
prognostic requirements. To be effective, these initial
prognostics requirements now need to be included in the
diagnostics test definitions in the form of prognostic tests.
These prognostic parametrics are defined in section 3.1.
As the diagnostics analysis is developed, prognostic
candidates are developed as part of the optimized
diagnostics results. The prognostics candidates are
prioritized based on the failure mode severity and the failure
rate. The primary candidates are those failure modes that
cannot be allowed to progress to failure, and failure
mitigation through functional redundancy is not practical or
possible. An example of a prognostics candidate list derived
from the diagnostics analysis is shown in Figure 2. This
example is not intended to be an eye test but is used to show
the format of a typical candidate list. Note that in the
example, two Loss of Life severities are listed below the
Loss of Equipment candidates. This is not to suggest Loss
of Life is less important, it is just listed based on the lower
failure rates. In an actual prognostics assessment, these two
candidates would certainly be considered important. But, at
the same time, if their Loss of Life failure probability is
very low, the prognostics for this failure mode may not be
cost effective.
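The prioritization just described can be sketched in a few lines. This is a hypothetical illustration only: the failure modes, severity codes and failure rates below are invented and are not the Figure 2 data.

```python
# Hypothetical sketch of ranking prognostic candidates from a diagnostics
# analysis. Failure modes, severity codes and rates are invented for
# illustration; they are not taken from Figure 2.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int        # 1 = loss of life, 2 = loss of equipment, 3 = loss of operation
    failure_rate: float  # failures per million operating hours (assumed units)

candidates = [
    FailureMode("ABS pump seizure", 1, 2.0),
    FailureMode("Brake pad wear-out", 2, 120.0),
    FailureMode("Air in hydraulic lines", 3, 45.0),
    FailureMode("Brake light burn-out", 2, 300.0),
]

# As in the Figure 2 discussion, the list is ordered by failure rate, while
# severity is retained so low-rate, high-severity modes can still be reviewed.
ranked = sorted(candidates, key=lambda fm: fm.failure_rate, reverse=True)
for fm in ranked:
    print(f"severity {fm.severity}  rate {fm.failure_rate:7.1f}  {fm.name}")
```

Note how, as in the paper's example, the loss-of-life mode sorts to the bottom purely because of its low failure rate; a separate criticality review would still flag it.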
Figure 2. Prognostic Candidate List from Diagnostics
Continuing in the diagnostics engineering process, the
diagnostics analysis results, along with selected prognostics
tests, are fed into the product design to support the PHM
design solution. This provides the all-important design trade
study process that builds a well-balanced, diagnostics
driven, PHM solution. Later in the process it is shown
where prognostics parameters may be available to further
optimize the diagnostics analysis.
3. SYSTEM PROGNOSTIC REQUIREMENTS
The following discussion focuses on the approach to
incorporating prognostic considerations into areas such as
reliability, testability, maintainability and sustainment
analyses. This is accomplished by representing expected
prognostic behavior in terms derived from system
prognostic requirements. This will show how these
parameters can be used to define prognostic behavior within
a diagnostic engineering process. Finally, this will show
how these prognostic definitions can be used to modify the
results of standard measures of diagnostic effectiveness
using fault detection and isolation metrics defined within
IEEE Standard 1522-2004. This also looks into informed
simulation-based approaches for assessing the impact of
different prognostic, diagnostic and maintenance strategies.
The following definition of requirements, parameters and
example are based on a paper by Eric Gould who has
developed advanced prognostic influence capabilities in the
DSI eXpress Diagnostics Engineering tool (Gould, 2011).
That paper is paraphrased in some sections to provide the
specific information needed to understand how prognostics
is used in the ISDD process.
Even though the academic technology of system prognostics
has been around with study support since the 1990s, the
understanding of prognostics requirements is relatively
new to design development projects. This is in contrast to
system diagnostic and testability requirements which have
been around since the 1980s. It is therefore not surprising
that there has been a fair amount of variance in the
definitions of desired prognostic capabilities from one
project to another.
For effective prognostics requirements to be defined, a
process for the derivation of these requirements must be
understood and followed. Aspects covered by these
qualitative descriptions include 1) whether the prognostics
shall be embedded in the system, 2) whether prognostics
shall be automated or initiated, 3) whether prognostics shall
be developed solely for the determination of mission-
readiness or also for the optimization of Logistics, 4)
whether prognostics results shall be reported to the crew,
maintenance technicians, and/or mission planners, and 5)
whether prognostics shall consist solely of condition-based
observations of failure precursors or whether it can also
contain predictions based on the failure rates and stress
histories of individual components. Although information of
this type is essential for describing the prognostic capability
required for each project, it is not relevant to the following
discussion. In the example shown in section 3.2, the
requirements have been trimmed down to include only the
information needed for a quantitative evaluation of a
system’s prognostics capability, and the impact of
prognostics on systems diagnostics in PHM.
3.1. Prognostic Parameters
With the quantitative aspects of the requirement broken
down into individual parameters, it was determined that five
basic parameters were sufficient for describing any of the
sample requirement statements:
Scope – the set of possible failures to which a given
requirement applies. Common scopes include mission
critical failures, essential function failures, or failures that
necessitate a system abort.
Category – the set of prognoses to which a given
requirement applies, such as embedded or sensor-based
prognoses.
Horizon – the time before failure that prognosis must occur.
This can either be a fixed value (e.g., 72 hours prior to
failure) or a calculated value, based on both the desired
mission length and the corrective action time associated
with each failure.
Coverage – the percentage of failures in the specified scope
that must be prognosed. This parameter can either be failure
probability-weighted (so that there is greater credit for
failures that occur more frequently) or non-weighted (so that
all failures in the specified scope are counted equally).
Accuracy – the desired confidence/correctness of the overall
prognostic capability (typically defined as a percentage of
accuracy). In some requirement statements, Accuracy is
bundled with Coverage as a single percentage of failures
prognosed.
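As a sketch, the five parameters above could be captured in a simple data structure. The field names and defaults below are assumptions for illustration, not a published schema.

```python
# A minimal, hypothetical container for the five prognostic requirement
# parameters described above. Field names and defaults are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrognosticRequirement:
    scope: str                       # set of failures the requirement applies to
    horizon_hours: float             # time before failure that prognosis must occur
    coverage: float                  # fraction of in-scope failures to prognose
    accuracy: float                  # desired confidence/correctness
    category: Optional[str] = None   # e.g. "embedded" or "sensor-based" prognoses
    coverage_weighted: bool = True   # failure-probability-weighted coverage?

# Example: 80% of mission critical failures, 96 hours ahead, 90% accuracy.
req = PrognosticRequirement(
    scope="mission critical failures",
    horizon_hours=96.0,
    coverage=0.80,
    accuracy=0.90,
)
```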
3.2. An Example of Prognostic Requirements
The following example examines the individual prognostic
requirements, parsing each statement into the related
parameters and discussing any interpretive peculiarities. All
threshold/objective parameters have been simplified so that
they are expressed as a single goal.
1) Requirement Example
Prognostics shall predict at least 80% of the mission critical
failures 96 hours in advance of occurrence with 90%
probability.
Scope: Mission Critical Failures
Horizon: 96 hours
Coverage: 80%
Accuracy: 90%
This prognostic requirements statement has four parameters
that collectively specify the expected behavior of the
prognostics. Because it reads like a performance
requirement (one that specifies the expected performance
of a fielded system), greater credit should be given to
prognosed failures that occur more frequently than to those
that occur relatively infrequently. So, when calculated as an
engineering metric, the prognostic coverage should be
weighted by the failure probability of each individual
failure. The overall coverage can thus be calculated by
summing the failure rates of the failures in the scope that
can be prognosed, divided by the sum of the failure rates for
all failures in the scope.
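The weighted coverage calculation just described can be sketched in a few lines; the failure modes and rates below are invented for illustration.

```python
# Sketch of failure-rate-weighted prognostic coverage as described above:
# sum of failure rates of prognosable in-scope failures, divided by the sum
# of failure rates of all in-scope failures. All rates are illustrative.
in_scope = {                       # failure mode -> failure rate (per 1e6 hours)
    "Brake pad wear-out": 120.0,
    "ABS pump seizure": 2.0,
    "Air in hydraulic lines": 45.0,
    "Master cylinder leak": 8.0,
}
prognosed = {"Brake pad wear-out", "Air in hydraulic lines"}

coverage = sum(rate for fm, rate in in_scope.items() if fm in prognosed) \
           / sum(in_scope.values())
print(f"weighted coverage = {coverage:.1%}")  # 165/175, about 94.3%
```

Note that a non-weighted coverage would instead count failures equally (here 2 of 4, or 50%), which is why the weighting choice matters when comparing against the 80% requirement.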
4. PROGNOSTIC DEFINITIONS
Now let's take a look at how prognostic definitions can be
represented within a proven diagnostic engineering tool. There
are several reasons why support for prognostics should be
added to tools that are used primarily for the creation,
assessment and optimization of system diagnostics. First of
all, if the tool has been designed for system-level diagnostic
analysis, then it already has the infrastructure in place to
perform an analysis of system-level prognostic performance.
Data from individual prognostic definitions is compiled
across the entire system to produce overall measures of
prognostic effectiveness, measures that can be easily
compared to system prognostic requirements to determine
contract compliance.
A second (and perhaps more significant) advantage to
representing prognostic measurements within a Diagnostic
Engineering tool is that the Reliability, Testability, and
Maintainability evaluations performed within the tool will
be able to reflect the expected performance of systems for
which mission readiness is assured using prognostics.
Moreover, diagnostic procedures developed within the tool
can be optimized under the assumption that prognostics
will be employed where real needs exist.
For example, prior to developing prognostic sensor and
algorithm requirements, an analysis of the system can be
used to determine the set of failures for which prognosis is
most desirable. This takes into consideration not only the
criticality and frequency of failures, but also how
successfully the system can diagnose and remediate the
failures without prognostics. Later, if the bottom line
changes and you need to reconsider the value of developing
some of the more expensive prognostic sensing and
algorithms, you can easily reevaluate the PHM performance
that would be achieved if the system did not have this
capability.
A third advantage of adding prognostic definitions to a
diagnostic engineering tool’s model or database is that this
information can be easily exported for analysis within an
external tool. For example, simulation-based case studies
can be performed using different health management
approaches. This will allow PHM analysts to evaluate
different combinations of diagnostics, prognostics and
preventative maintenance to determine which combinations
are most effective, not only from the perspectives of
availability or mission readiness, but also sustainment and
cost effectiveness. Section 5 describes some of this
simulation capability.
4.1. Tests and Prognoses
In proven and accepted model-based diagnostic engineering
tools, test definitions are used to represent diagnostic
knowledge. To be effective, each individual test definition
must specify the coverage of a corresponding fielded
operational test or measurement. This coverage identifies
the specific functions or failure modes that should be
exonerated (removed from suspicion) or indicted (called
into suspicion) when that test passes or fails. Tests are
organized into different test sets so that they can be easily
selected as groups to support different diagnostic case
studies. Examples given relate to eXpress, DSI
International’s Diagnostics Modeling and Analysis tool, and
to DSI’s STAGE Operations and Support Simulation tool.
Prognostic measurements can be represented using a special
type of test definition. This is basically a test definition to
which prognostic parameters have been attached. The
coverage for each prognosis is represented the same way as
it would be for a diagnostic test; the only difference being
that the coverage now represents the specific functions or
failure modes for which failures can be predicted using
prognostics. As with diagnostic tests, prognostic
measurements can also be organized into test sets. When a
project has prognostic requirements that utilize the Category
parameter, the individual measurements should be grouped
into sets by category. The analysis can then be constrained
by simply selecting the sets that correspond to the desired
prognostic categories.
4.2. Prognostic Terms
For each prognostic definition, the analyst must specify one
or more Horizons, each accompanied by three variables—
Confidence, Correctness and Accuracy—that collectively
describe the expected behavior of the given prognostic
measurement at that Horizon. See Figure 3 for an example
of Prognostic Settings in eXpress with single horizon.
Figure 3. Prognostic Settings in eXpress with single horizon
The value of the specified Horizon is similar to the Horizon
parameter within a prognostic requirement; it represents the
time interval before failure within which the given prognosis
might occur. The Confidence represents the likelihood that the
given prognosis will predict the covered failure(s) at or
before the specified Horizon. It is expected that Confidence
increases as the Horizon decreases; in other words, that
predictions become more confident as a prediction
approaches the time of failure.
The Correctness variable is used to represent the expected
percentage of prognoses that are correct; that is, not too
early. By default, the Correctness setting affects neither the
prognostic nor diagnostic analysis performed using that
measurement. The Correctness value, however, can still be
used to categorize prognoses within a simulation-based
assessment of a proposed PHM approach. Note that
excessively early prognoses lead to false aborts and wasted
maintenance cost and time.
The calculated Accuracy value corresponds to the Accuracy
parameter within a prognostic requirement. Unlike the other
two values used to describe a given Horizon, Confidence
and Correctness, the Accuracy variable is not defined by the
analyst, but rather calculated automatically by the analysis.
A prognostic condition that must be addressed is the need
for corrective action to be performed only for prognoses
verified to be correct. This is the case when a given
prognosis is not only independently verifiable, but will be
verified before corrective action is performed. As an
example, think of the brake pads on an automobile. As the
pads wear past a given point, they begin to squeal when the
brakes are applied. This is an intentional design
characteristic that allows the owner of the car to identify
when the pads need to be replaced. Think of the brake pad's
squeal as a condition-based prognosis of a pending failure.
Now, imagine that, when your brakes start
to squeal you inspect the pads and see that there is plenty of
life left—the squeal came too early. Would you still replace
the pads?
Figure 4. Accuracy calculated using both Confidence and
Correctness
From a purely realistic standpoint, the Accuracy of your
prognosis would be equal to your Confidence that the
prognosis would occur prior to failure. If, however, you only
replace the pads when they have truly worn down (when the
prognosis was correct), then the accuracy of your prognosis
must be adjusted down to account for the possibility of
these false squeals.
So, when this prognostic condition is selected in the
analysis, the calculated Accuracy is equal to the product of
the Confidence and Correctness percentages. See Figure 4
for an example of Accuracy calculations. Accuracy then
represents the likelihood that the prognosis occurs early
enough (Confidence), but not too early (Correctness).
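Under that condition the calculation reduces to a single product; the percentages below are illustrative, not values taken from Figure 4.

```python
# Minimal illustration of the Accuracy calculation described above when
# prognoses are verified before corrective action: Accuracy is the product
# of Confidence and Correctness. The numbers are illustrative only.
confidence = 0.90    # likelihood the prognosis occurs at or before the Horizon
correctness = 0.85   # likelihood the prognosis is not too early (no false squeal)
accuracy = confidence * correctness
print(f"accuracy = {accuracy:.1%}")  # 76.5%
```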
Of course, the real value of incorporating prognostics into a
diagnostic engineering model is not so much to facilitate the
prognostic analysis itself as it is to develop, assess and
optimize the system's diagnostics capability. This is based on the
assumption that a given level of prognosis can and will be
achieved.
5. SIMULATION OF PROGNOSTIC IMPACT ON DIAGNOSTICS
Figure 5 shows a section of an automotive braking system
that has been modeled for diagnostics analysis. The pink
highlighted items (1, Brake Pads, 2, Tires) are identified for
prognostic testing. The diagnostic results from this analysis
were exported in an XML schema (DiagML) to be used for
PHM software development and for use in other tools. One
of these tools, DSI STAGE, takes the analysis results and
performs a Monte Carlo simulation using developed
calculations for specific simulation results. Some of these
results are presented below to show the diagnostics behavior
for systems that are to be supported using selected
prognostics derived from the analysis. Note that the
simulation graphs shown represent example analyses. Due
to the scaling of the graphs, the scale legends are not
legible and are for reference only. The typical simulation
time is 4000 hours of brake operational use.
Figure 5. Section of eXpress diagnostics model showing
targeted prognostic candidates
This capability of analyzing prognostic performance as part
of diagnostics in PHM provides PHM optimization based on
both requirements and constraints. Through simulation, the
overall PHM solution is evaluated based on how well PHM
meets system requirements and how well it can be
implemented within cost constraints. Some typical
simulation calculations include: Prognostic Effectiveness,
Fault Detection and Isolation, Diagnostic False Alarms,
Critical Failures, System Aborts, Mission Success, Mean
Time Between Failure, Mean Time to Repair, System
Availability, Development Costs, Sustainment Costs, and
Total Cost of Ownership, plus many more to meet analysis
needs.
The following charts are from simulation runs based on the
diagnostics results from the model shown in Figure 5. The
simulation was run with 500 iterations of an operational
time of 4000 hours and was randomly seeded.
The calculations used were: Likelihood of Critical Failures
Over Time (progressive), Critical Failures Prognosed Over
Time (number), System Aborts Over Time (number),
Critical Failure Prognosed per Failure Entity (number),
Mean Time Between Prognostics/Maintenance Actions
Over Time, and Faults (Despite Prognostics) Over Time.
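A toy version of such a run (emphatically not the STAGE tool's implementation) might draw failure times from an assumed exponential distribution and apply an assumed prognostic accuracy; the MTBF, horizon and accuracy values below are invented.

```python
# Toy Monte Carlo sketch of the kind of run described above: 500 randomly
# seeded iterations over a 4000-hour operating period. The exponential
# failure model, MTBF and prognostic accuracy are assumptions, not the
# STAGE tool's actual calculations.
import random

ITERATIONS, HOURS = 500, 4000.0
MTBF = 2500.0            # mean time between critical failures (assumed)
PROG_ACCURACY = 0.80     # chance a failure is prognosed in time (assumed)

rng = random.Random(42)  # seeded here for repeatability
aborts = prognosed = 0
for _ in range(ITERATIONS):
    t = rng.expovariate(1.0 / MTBF)  # time of first critical failure
    if t < HOURS:                    # a failure occurs within the period
        if rng.random() < PROG_ACCURACY:
            prognosed += 1           # caught and repaired before failure
        else:
            aborts += 1              # missed prognosis leads to a system abort
print(f"failures prognosed: {prognosed}, system aborts: {aborts}")
```

Even this crude sketch reproduces the qualitative behavior discussed below: most in-period failures are caught, but the 20% of missed prognoses still produce a steady count of aborts.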
The use of effective, prognostics-driven condition-based
maintenance can reduce the likelihood of critical failures.
As seen in Figure 6, the critical failure events in this
analysis, loss of operation (1, yellow) and loss of equipment
(2, orange), start at low probability early in the system's
operational life cycle. Then, as the system ages, the
probability of loss of operation increases rapidly, followed
by loss of equipment. Finally, the loss of life severity
(3, red) begins to increase further into the operational life
cycle. This is where the assessment of prognostics needs to
be performed for those failure modes contributing to these
critical failures.
Figure 6. Simulation results showing likelihood of Critical
failures over time
Where the simulation shown in Figure 6 shows total failures
progressively over time by severity, Figure 7 shows the
number of critical failures over time, and also shows those
failures detected by prognoses (4, magenta). These
prognosed failures are calculated as being repaired prior to
critical failure. Since it is well known that prognostics is not
100% correct, other critical failures did occur. These
failures are shown by number of failures at a specific point
in time. These failures are also identified by severity (1,
yellow, loss of operation; 2, orange, loss of equipment; 3,
red, loss of life). With the ability to observe types of failures
over time, it is now possible to re-analyze the diagnostics,
and possibly improve the prognostics effectiveness.
Figure 7. Number of critical failures prognosed over time
Figure 8. System aborts attributed to inadequate PHM
Figure 8 shows the simulation results for system aborts over
time. This calculation is based on the accuracy of
prognostic tests defined in the diagnostics analysis. The
“true” system aborts are projected over time as shown at the
bottom of the graph (2-orange).
The system aborts attributed to false prognostics are
shown at the top of the graph (1, red). This is a design
condition that can be corrected by improving the prognostic
tests and therefore the accuracy of these tests. Once the
prognostics have been assessed for improvement, the
diagnostics analysis would be adjusted based on new test
parameters. The simulation would then be re-run to validate
the results for improvement in system aborts. During this
diagnostics update, the “true” system aborts can be assessed.
Even though the number of true aborts is low, there may be
opportunities for improvement.
Figure 9 shows the number of critical failures over time by
failure entity (failed item) and by failure severity. Note that
the diagnostic analysis performed on the sample automotive
braking system is used for demonstration only and does not
necessarily represent actual operational parameters for this
system. This statement is made to keep people from arguing
about the actual diagnostic values rather than paying
attention to the message being presented!
The failures shown are the same as those contributing to the
simulation results in the other charts, except these are
identified by specific parts. Those item failures prognosed
are identified in magenta (2). The groups of four are the
brake pads (four right and four left, front and rear). The
prognostic test for these is quite basic: each pad contains a
metallic "scraper" or "squealer" that is exposed at low pad
thickness. When the brakes squeal, it is time for inspection.
The added failures shown for the brake pads (1, orange, loss
of equipment) are based on actual pad failure, to the point
where the pads scrape the disc rotor (a very expensive
repair). These "run to failure" events can be minimized
through better prognostics.
Figure 9. Number of critical failures by item and severity
More sophisticated brake pad wear detection does exist, in
the form of optical sensing.
There is one loss of life (3, red) failure, involving the
Antilock Brake System hydraulic pump: there is a possible
loss of braking control if this pump fails. It has a low
probability of failure, but it would be a candidate for
additional prognostics.
The larger group of loss of operation, or degraded operation
(4, yellow), failures noted is for air in the hydraulic lines.
There are two items shown with a higher failure rate for loss
of equipment (orange): the rear brake lights. The burnt-out
bulb failure mode would be difficult to prognose, but some
automobiles do have detection sensors that provide a
warning on the dashboard. The use of LEDs significantly
reduces the failure rate for these items. But, again, this
shows the value of running a simulation of diagnostics
results to provide a graphical representation of diagnostics,
prognostics and maintenance actions over a specified
operational time. The simulation results are not limited to
charts: each calculation result has a detailed report defining
events and values.
Figure 10 shows calculation results for frequent failures that
are prognosed but lack a maintenance plan. These
prognosed items are repaired without opportunistic
maintenance or an effective level-of-repair definition. Since
actual physics-of-failure prognostics typically looks at
molecular-level single failure modes, the analysis considers
only single failure modes with no repair concept. Reliable
items that were not repaired as a balanced maintenance
action will begin to fail as the operating system matures,
resulting in increased prognostic-related failures and a low
Mean Time Between Maintenance Action (2, green) and
Mean Time Between Prognostic events (1, magenta).
Figure 10. Mean time between a prognostics maintenance
action over time
If this were calculated for Mission Success and Availability,
it would show a direct correlation to reduced performance
from the lack of maintenance understanding in a prognostics
analysis. This is mitigated through the integration of
prognostics and diagnostics in an effective ISDD process.
Figure 11 shows the calculation results for failure modes
that are prognosed but whose failure was not detected prior
to occurrence. The loss of equipment failure severity
(2, orange) is shown for those failure modes that need to be
reassessed for possible prognostics improvement. The grey
areas indicate no failure effect (1).
Figure 11. Faults over time by severity despite prognostics
6.0. CONCLUSIONS
There are currently no real guidelines for the calculation of
diagnostic-related metrics for systems whose critical failures
are covered by prognostics. More important is the lack of
prognostics selection based on intelligent diagnostics
analysis. Not only have approaches not yet been
standardized, but many of the alternatives may not have
even been discussed in the public arena. Existing standards
describing diagnostic analysis, such as the IEEE Testability
standard (IEEE Std 1522-2004), do not yet account for
prognostics in any way. As a result, diagnostic engineering
analysis and simulation tools have been enhanced to address
this issue.
As more systems are planned for embedded prognostics,
questions about the relationship between prognostics and
diagnostics, and even beyond into sustainment, are likely to
become even more prominent. A common practice will
begin to emerge with subsequent efforts at standardization.
It is important that the relationship between prognostic and
diagnostic analysis be worked as an integrated solution.
Based on subjective, experience-driven research, previous
methods for assessing diagnostic-related prognostics
behavior remain in question, as do the suppliers, customers
and the companies that provide their tools.
The main point of all of this is to break out of the “Sokal
Hoax” syndrome and work the technologies with the goal of
a balanced Health Management and Sustainment solution.
The end result will be significantly lower development,
operation, and support costs, while experiencing higher
Mission Success and Operational Availability!
ACKNOWLEDGEMENT
Eric Gould, Senior Scientist at DSI International, deserves
recognition for his development of the integration of
Diagnostics and Prognostics in the Model-Based
Diagnostics process.
REFERENCES
IEEE Std 1522-2004. IEEE Trial-Use Standard for
Testability and Diagnosability Characteristics and Metrics.
Gill, Luke, 2003, F-35 Joint Strike Fighter Autonomic
Logistics Supply Chain.
Gould, E., 2011, Diagnostics "After" Prognostics.
Saxena, A., Celaya, J., Saha, S., and Goebel, K., 2010,
Metrics for Offline Evaluation of Prognostic Performance,
International Journal of Prognostics and Health
Management, ISSN 2153-2648.
Sokal, Alan D., 1994, 1995, Transgressing the Boundaries:
Towards a Transformative Hermeneutics of Quantum
Gravity.
BIOGRAPHY
James R. Lauffer (Jim) began the technology journey in
Ohio, USA, on New Year’s Day, 1941. Jim received his
Ham Radio license at the age of 12 and designed many
antennas and radio systems. He entered the Air Force at 17
and worked in the Strategic Air
Command as a maintainer of B47s and
anything else that landed on the base.
He entered industry in 1962 and spent
the next 40 years in Logistics,
Reliability, Maintainability and finally
Systems Engineering. This career
began with North American Aviation
in 1962, then Rockwell and finally Boeing. Much of the
engineering time was in combat system development on
international programs, field operations testing, and then in
management trying to get all of these technologies to work
together. Much of the field testing work involved passive
sonar sea trials in the Atlantic and the Mediterranean. He
also worked on aircraft avionics upgrades for the Royal
Australian Air Force. Jim retired from Boeing in 2001 and
agreed to help out a small engineering business called DSI.
This was 11 years ago and he still has not figured out how
to retire. But these past eleven years have resulted in a wealth
of knowledge related to the diagnostics technologies and the
resulting health management and sustainment systems.
Jim’s formal education is a bit thin compared to others in
this field. He started in the world of Applied Physics but
due to work and family, ended up with a business degree.
One might say he has several PhDs' worth of experience.
Jim is a past member of the Society of Logistics
Engineers, Society of Reliability Engineers, and the
Association of Old Crows (Electronic Warfare), IEEE, and
presently The American Institute of Aeronautics and
Astronautics (AIAA). To this day Jim continues to study
and attend courses in the sciences. He is still active in Ham
Radio with his Extra Class license and is always looking for
new and balanced PHM solutions.
Fatigue Crack Growth Prognostics by Particle Filtering and Ensemble Neural Networks
Piero Baraldi1, Michele Compare1, Sergio Sauco1 and Enrico Zio1, 2
1Politecnico di Milano, Milano, Italy
[email protected] [email protected]
2Chair on Systems Science and the Energetic Challenge, European Foundation for New Energy-Electricité de France, Ecole Centrale Paris and Supelec, France
[email protected] [email protected]
ABSTRACT
Particle Filtering (PF) is a model-based approach widely used in prognostics, which requires models of both the degradation process and the measurement acquisition system. In many practical cases, analytical models are not available, but a dataset containing a number of (component state, corresponding measurement) pairs may be available.
In this work, a data-driven approach based on a bagged ensemble of Artificial Neural Networks (ANNs) is adopted to build an empirical measurement model for a Particle Filter used to predict the Residual Useful Life (RUL) of a structure whose degradation process is described by a stochastic fatigue crack growth model taken from the literature. The work focuses on the investigation of the capability of the proposed approach to cope with the uncertainty affecting the RUL prediction.
1. INTRODUCTION
The prediction of the Remaining Useful Life (RUL) of degrading equipment is affected by several sources of uncertainty, such as the randomness in the future degradation of the equipment, the inaccuracy of the prognostic model used to perform the prediction, and the noise in the sensor data used by the prognostic model to obtain the RUL prediction. Thus, any RUL prediction provided by a prognostic model should be accompanied by an estimate of its uncertainty (Tang et al. 2009; Liu et al. 2011) in order to confidently plan maintenance actions, taking into account
the degree of mismatch between the RUL predicted by the prognostic model and the real RUL of the equipment (Coble 2010; Zio 2012). In this respect, a method able to estimate a probability density function of the RUL of degrading equipment is PF, a model-based approach successfully used in prognostics applications (e.g., Vachtsevanos et al. 2006, Orchard et al. 2005, Orchard & Vachtsevanos 2009, Cadini et al. 2009). PF is a Bayesian tool for non-linear state estimation, which requires (e.g., Gustafsson & Saha 2010, Doucet et al. 2001, Arulampalam et al. 2002):
1) The knowledge of the degradation model describing the stochastic evolution in time of the equipment degradation x (in general a multi-dimensional vector):

x(t+1) = g(x(t), ω(t))   (1)

where g is a possibly non-linear vector function and ω(t) is a possibly non-Gaussian noise.
2) A set of measures z(1), ..., z(t) of past and present values of some physical quantities z related to the equipment degradation x. Although z in general is a multi-dimensional vector, in this work it is considered as a mono-dimensional variable; the underline notation is therefore omitted.
3) A probabilistic measurement model which links the measure z with the equipment degradation x:

z(t) = h(x(t), ν(x(t)))   (2)

where h is a possibly non-linear vector function and ν(x) is the measurement noise vector.
_____________________ Baraldi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In practical cases, the measurement model h may not be available in analytical form, but a dataset T = {(x_n, z_n), n = 1, ..., N_training}, containing a number N_training of pairs of state x_n and corresponding measurement z_n, may be available. This is the case, for example, of the piping of deep water offshore well drilling plants, which degrades due to a process of scale deposition. This may cause a decrease, or even a plugging, of the cross sections of the tubulars. Given the inaccessibility of the piping, it is usually impossible to acquire a direct, on-line measure of the scale deposition thickness. On the other hand, research efforts are devoted to laboratory tests investigating the relationships between the scale deposition thickness and other parameters which can be more easily measured during plant operation, such as pressures, temperatures and brine concentrations. In this way, one can populate a dataset with the values of the measurable parameters for different scale deposition thicknesses, and use the data to build data-driven models for predicting the scale deposition thickness (Moura et al. 2011).
In this work we have developed an ensemble of ANNs (e.g., Baraldi et al. 2012) as a model of the measurement equation in a PF scheme. The proposed prognostic approach is applied to a literature case study (Orchard & Vachtsevanos 2009) concerning crack propagation. The obtained results are compared to those which would be obtained by directly using the measurement equation in the PF model, considering the accuracy of the RUL prediction and the capability of the method to provide an estimate of its uncertainty.
2. PARTICLE FILTERING
In PF, a set of N_s weighted particles, which evolve independently of each other according to the probabilistic degradation model of Eq. 1, is considered. The basic idea is that this set of weighted random samples constitutes a discrete approximation of the true probability density function (pdf) of the system state x at time t. When a new measurement is collected, it is used to adjust the predicted pdf through the modification of the weights of the particles, in a Bayesian perspective. This requires the knowledge of the probabilistic law which links the state of the component to the gathered measure (Eq. 2). From this model, the probability distribution P(z|x) of observing the sensor output z given the true degradation state x is derived (measurement distribution). This distribution is then used to update the weights of the particles upon a new measurement collection. Roughly speaking, the smaller the probability of encountering the acquired measurement value when the actual component state is that of the particle, the larger the reduction of the particle's weight. On the contrary, a good match between the acquired measure and the particle state results in an increase of the particle's importance (for further details, see Arulampalam et al. 2002 and Doucet et al. 2001).
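To make the prediction-correction cycle concrete, the following minimal sketch shows one step of a bootstrap particle filter. The linear-drift degradation model, the Gaussian measurement distribution, and all numerical values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(particles):
    """Prediction step: evolve each particle through a stand-in
    degradation model (placeholder for Eq. 1)."""
    return particles + 0.05 + rng.normal(0.0, 0.1, size=particles.shape)

def update_weights(particles, weights, z, meas_std=0.7):
    """Correction step: Bayesian weight update. Particles whose
    predicted measurement is close to the acquired measure z gain
    importance; a poor match shrinks the weight."""
    likelihood = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    w = weights * likelihood
    return w / w.sum()

# N_s particles approximating the pdf of the (scalar) state x
N_s = 1000
particles = rng.normal(4.0, 0.5, size=N_s)
weights = np.full(N_s, 1.0 / N_s)

particles = propagate(particles)                      # prediction
weights = update_weights(particles, weights, z=4.6)   # correction
state_estimate = float(np.sum(weights * particles))
```

In a full filter, a resampling step would follow when the effective sample size degenerates; it is omitted here for brevity.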
3. BAGGED ENSEMBLE OF ANNS FOR BUILDING THE MEASUREMENT MODEL
A method to estimate the pdf P(z|x) of the measurement z in correspondence of a given equipment degradation state x is proposed in this Section. It is derived from Carney et al. (1999) and Nix & Weigend (1994), and requires the availability of a dataset made of N_training couples (x_n, z_n).
The underlying hypothesis of this approach is that the measurement model, which is unknown, can be written in the form:

z(x) = f(x) + ν(x)   (3)

where f(x) is a biunivocal (invertible) mathematical function and the measurement noise ν(x) is a zero-mean Gaussian noise.
The method of Carney et al. (1999) is based on the use of a bagged ensemble of ANNs, which are employed to build an interpolator φ(x) of the available training patterns T = {(x_n, z_n), n = 1, ..., N_training}.
The key idea of bagging (Breiman 1999) is to treat the available dataset T as if it were the entire population, and then create alternative versions of the training set by randomly sampling from it with replacement. This provides more stable estimations. In detail, a number B of alternative versions {T_b*, b = 1, ..., B} of T are created by randomly sampling from T with replacement. Using these training sets, the networks φ_b(x; T_b*), b = 1, ..., B, are built, and the output φ_avg(x) of the bagged ensemble in correspondence of the generic test state x is obtained by averaging the single ANN outputs:

φ_avg(x) = (1/B) Σ_{b=1..B} φ_b(x; T_b*)   (4)
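The resampling-and-averaging scheme of Eq. 4 can be sketched as follows. For brevity, simple least-squares polynomial regressors stand in for the ANNs, and the synthetic training data are illustrative assumptions; the bootstrap structure is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative training set T = {(x_n, z_n)}: z = f(x) + noise
N_training = 200
x = rng.uniform(0.0, 10.0, N_training)
z = (x + 0.25) + rng.normal(0.0, 0.3, N_training)

B = 50  # number of bootstrap replicates T_b* of T
models = []
for b in range(B):
    idx = rng.integers(0, N_training, N_training)     # sample with replacement
    models.append(np.polyfit(x[idx], z[idx], deg=1))  # phi_b(.; T_b*)

def phi_avg(x_test):
    """Eq. 4: ensemble output as the average of the B single outputs."""
    preds = np.array([np.polyval(c, x_test) for c in models])
    return preds.mean(axis=0)
```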
On the other hand, since PF requires the knowledge of the pdf P(z|x), the estimate of f(x) does not suffice to apply PF. In this respect, the procedure proposed in Carney et al. (1999) allows estimating the pdf P(z|f(x)), from which the pdf P(z|x) can be obtained, the function f being invertible by hypothesis. The procedure is based on the subtraction of the random quantity φ_avg(x) from both sides of Eq. 3:

z(x) - φ_avg(x) = [f(x) - φ_avg(x)] + ν(x)   (5)
The left-hand side of Eq. 5 is a random variable which represents the error of the ensemble output φ_avg(x) with respect to the measurement z(x).
This random error is made up of two contributions (right-hand side of Eq. 5):
1. The random difference f(x) - φ_avg(x) between the unknown deterministic quantity f(x) and the ensemble output φ_avg(x). This quantity is a random variable distributed according to P(f(x) | φ_avg(x)), since φ_avg(x) depends on the random training sets T_b*, b = 1, ..., B; i.e., different training sets would lead to different ensemble models and thus to different outputs φ_avg(x). Since f(x) - φ_avg(x) can be seen as the error of the model φ_avg(x), its variance will be referred to as the model error variance and indicated by σ_m²(x).
2. The intrinsic noise ν(x) of the measurement process, whose variance is indicated by α²(x).
These two contributions are estimated by means of the procedures described in the two following Sections.
3.1. Distribution of the model error variance
The procedure used here to estimate the distribution P(φ_avg(x) | f(x)) of the ensemble output φ_avg(x) given the true value of f(x) (i.e., the 'inverse' of P(f(x) | φ_avg(x))) is based on the assumption that the random variable f(x) - φ_avg(x) is Gaussian with zero mean and standard deviation σ_m(x), which entails that P(φ_avg(x) | f(x)) is Gaussian with mean f(x), so that all we need to know is σ_m(x). Notice that residual errors in the output of an ANN are usually not caused by variance alone; rather, there may be biases in the output of the ANN, which invalidate the assumption that the mean of the distribution is zero. However, it is generally accepted that the contribution of the variance to the residual error of the ANN dominates that of the bias (see Stuart et al. 1992 for further details). Furthermore, the bias in the output of an ensemble of ANNs is expected to be smaller than that of a single ANN.
In order to estimate the model error variance σ_m²(x), the technique in Carney et al. (1999) requires dividing the B networks of the ensemble φ_avg(x) into M smaller sub-ensembles, each one containing K networks, and considering the output φ_com^m(x), m = 1, ..., M, of each sub-ensemble:

φ_com^m(x) = (1/K) Σ_{k=1..K} φ_k(x)   (6)
The set ζ = {φ_com^m(x), m = 1, ..., M} constitutes a sample of M values from the distribution P(φ_com(x) | φ_avg(x)), and its sample variance σ̂_m²(x) can be used to approximate the unknown variance σ_m²(x) of the ensemble output.
Notice that the idea behind this procedure is that, by estimating f(x) with φ_avg(x), one can approximate P(φ_avg(x) | f(x)) by P(φ_com(x) | φ_avg(x)). In order to improve the reliability and stability of σ̂_m²(x), bagging is also performed on the values of ζ. Thus, P bagging-resampled sets of ζ are gathered:

Γ = {ζ_p*, p = 1, ..., P}   (7)

where ζ_p* is the p-th subset containing M values of φ_com(x), sampled with replacement from ζ. For each subset ζ_p*, p = 1, ..., P, the corresponding variance σ_p*²(x) is computed; then, the estimate σ̂_m²(x) of the variance σ_m²(x) is calculated as their average value:

σ̂_m²(x) = (1/P) Σ_{p=1..P} σ_p*²(x)   (8)
Finally, the estimate of the regression distribution P(φ_avg(x) | f(x)) proposed by the method is:

P(φ_avg(x) | f(x)) ≈ N(φ_avg(x), σ̂_m²(x))   (9)
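The variance-estimation chain of Eqs. 6-8 can be sketched as follows. The B single-network outputs at a fixed test point are replaced here by illustrative random draws, since the trained networks themselves are not part of this excerpt:

```python
import numpy as np

rng = np.random.default_rng(2)

B, M, P = 200, 20, 1000
K = B // M  # networks per sub-ensemble

# Stand-in for the B single-network outputs phi_b(x) at one test state x
single_outputs = rng.normal(5.25, 0.15, size=B)

# Eq. 6: output of each of the M sub-ensembles of K networks
phi_com = single_outputs.reshape(M, K).mean(axis=1)

# Eqs. 7-8: P bootstrap resamples of the M sub-ensemble outputs;
# the estimate of sigma_m^2(x) is the average of their sample variances
variances = np.empty(P)
for p in range(P):
    resample = rng.choice(phi_com, size=M, replace=True)
    variances[p] = resample.var(ddof=1)
sigma_m2_hat = float(variances.mean())
```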
3.2. Distribution of the measurement noise
In this Section, the technique proposed in Nix & Weigend (1994) is applied to estimate the variance α²(x) of the Gaussian zero-mean noise ν(x) affecting the measurement equation (Eq. 3).
From Eq. 5, one can derive:

Var[z(x) - φ_avg(x)] = Var[f(x) - φ_avg(x)] + Var[ν(x)] + 2E[(f(x) - φ_avg(x)) ν(x)] = σ_m²(x) + α²(x)   (10)
The last equality is due to the independence of the error [f(x) - φ_avg(x)] from the measurement noise ν(x). To explain this, notice that [f(x) - φ_avg(x)] depends on the noise values ν_n affecting the measures z_n = f(x_n) + ν_n, n = 1, ..., N_training, in the training data T = {(x_n, z_n), n = 1, ..., N_training}, which are used to build the ensemble model φ_com^m(x), whereas ν(x) is the value of the noise affecting the measure of the test data x, not used for training the model. Thus, ν_n, n = 1, ..., N_training, and the values sampled from ν(x) in the test data are different, independent realizations of the same random variable.
Notice also that ν²(x) obeys a Chi-square χ² distribution with 1 degree of freedom.
The term σ_m²(x) can be estimated according to the procedure illustrated in Section 3.1, whereas, z(x) - φ_avg(x) being a zero-mean random variable, its variance is given by:

Var[z(x) - φ_avg(x)] = E[(z(x) - φ_avg(x))²]   (11)
Thus, in correspondence of the training couples (x_n, z_n), n = 1, ..., N_training, one can approximate E[(z(x) - φ_avg(x))²] by (z(x_n) - φ_avg(x_n))² and obtain, according to Eq. 10, a dataset formed by the pairs (x_n, α̂_n²), n = 1, ..., N_training, where:

α̂_n² = max{(z_n - φ_avg(x_n))² - σ̂_m²(x_n), 0}   (12)

Finally, in order to estimate α²(x) for a generic x, a single ANN is trained using the dataset (x_n, α̂_n²), n = 1, ..., N_training.
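The construction of the auxiliary dataset (x_n, α̂_n²) of Eq. 12 can be sketched as follows. The ensemble estimate and the model error variance are replaced by illustrative placeholders, and the synthetic training couples are an assumption for demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in training couples (x_n, z_n) with measurement noise std ~ 0.7
N_training = 1000
x = rng.uniform(0.0, 10.0, N_training)
z = (x + 0.25) + rng.normal(0.0, 0.7, N_training)

phi_avg = x + 0.25   # assumed ensemble estimate of f(x) at each x_n
sigma_m2_hat = 0.02  # assumed model error variance estimate

# Eq. 12: squared residual minus model error variance, floored at zero
alpha2_hat = np.maximum((z - phi_avg) ** 2 - sigma_m2_hat, 0.0)
# A single ANN would then be trained on the pairs (x_n, alpha2_hat_n)
# to generalize alpha^2(x) to unseen states.
```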
3.3. Estimate of the measurement distribution P(z|x)
φ_avg(x) being an estimate of f(x), the measurement distribution P(z|f(x)) can be approximated by the distribution P(z|φ_avg(x)), which can be derived from the distribution P(φ_avg(x)|f(x)) and the distribution of the measurement noise ν(x), according to Eq. 5. Since these two distributions are both Gaussian, with means and variances estimated as shown in Sections 3.1 and 3.2, P(z|f(x)) is approximated by a Gaussian distribution with mean φ_avg(x) and variance σ̂_m²(x) + α̂²(x). Finally, f(x) being invertible, the distribution of the measurement z in correspondence of a given state x, P(z|x), is given by:

P(z|x) ≈ P(z|f(x)) ≈ N(φ_avg(x), σ̂_m²(x) + α̂²(x))   (13)
4. CASE STUDY
In this Section, the technique previously described for estimating the measurement distribution P(z|x) is applied to a case study derived from Orchard & Vachtsevanos (2009), which deals with the crack propagation phenomenon in a component subject to fatigue load. The system state is described by the vector x(t) = (x1(t), x2(t)), whose first element, x1(t), indicates the crack depth, whereas the second element, x2(t), represents a time-varying model parameter that directly affects the crack growth rate. The evolution of this degradation process is described by the following two equations, which form a Markovian system of order one:

x1(t+1) = x1(t) + 3·10⁻⁴ · (0.05 + 0.1·x2(t)) + ω1(t)   (14)

x2(t+1) = x2(t) + ω2(t)   (15)

where ω1(t) is a Gaussian noise with mean 0.045 and standard deviation 0.116, and ω2(t) is a zero-mean Gaussian noise with standard deviation 0.010.
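A direct simulation of the Markovian system of Eqs. 14-15 can be sketched as follows. The initial conditions are illustrative assumptions, since this excerpt does not state them:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_crack(x1_0=1.0, x2_0=1.0, t_max=120):
    """Simulate one trajectory of the state (x1, x2) via Eqs. 14-15."""
    x1, x2 = x1_0, x2_0
    traj = [(x1, x2)]
    for _ in range(t_max):
        # Eq. 14: crack depth update, omega_1 ~ N(0.045, 0.116)
        x1 = x1 + 3e-4 * (0.05 + 0.1 * x2) + rng.normal(0.045, 0.116)
        # Eq. 15: random walk of the model parameter, omega_2 ~ N(0, 0.010)
        x2 = x2 + rng.normal(0.0, 0.010)
        traj.append((x1, x2))
    return np.array(traj)

traj = simulate_crack()
```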
In the present case study, the measurement equation is assumed to be unknown, whereas a dataset formed by the N_training pairs (x_{1,n}, z_n), n = 1, ..., N_training, is available, where the subscript 1 refers to the first component of the vector x(t).
In practice, given the purpose of the present work of showing the feasibility of the proposed approach, the dataset T = {(x_{1,n}, z_n), n = 1, ..., N_training} has actually been artificially obtained by simulating the behavior of the degradation process x(t) and sampling from the probabilistic measurement model (Orchard & Vachtsevanos 2009):

z(t) = f(x1) + ν(x1) = x1(t) + 0.25 + ν(x1)   (16)

where ν(x1) is a zero-mean Gaussian noise, whose standard deviation depends on x1:

Std[ν(x1)] = -x1²/120 + x1/10 + 1/2   (17)

According to Eq. 16, the function f(x) = f(x1) is given by x1 + 0.25, which is, as required by the method, an invertible function.
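Under the same assumptions, the artificial generation of the training set by sampling Eqs. 16-17 can be sketched as follows. The noise standard deviation follows Eq. 17 as read in this excerpt, and the range of simulated crack depths is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)

def meas_std(x1):
    """Std of the measurement noise as a function of crack depth (Eq. 17,
    as read in this excerpt)."""
    return -x1 ** 2 / 120.0 + x1 / 10.0 + 0.5

def measure(x1):
    """Measurement model of Eq. 16: z = x1 + 0.25 + nu(x1)."""
    return x1 + 0.25 + rng.normal(0.0, meas_std(x1))

# Populate the training set from (assumed) simulated crack depths
x1_train = rng.uniform(1.0, 9.0, 1000)
z_train = np.array([measure(v) for v in x1_train])
```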
To conclude this Section, notice that the probabilistic measurement model in Eq. 16 has been intentionally kept simple, since the main interest of this work is the quantification of the uncertainty in the RUL prediction and not the ensemble's ability to reproduce the measurement equation. In this respect, the knowledge of the variance of the measurement noise is fundamental, as it determines the amplitude of the prediction intervals of the RUL estimates. Thus, the capability of correctly reconstructing the variance behavior plays a key role in the assessment of the potential of the proposed technique.
4.1. Estimate of the measurement distribution
According to the technique illustrated in Section 3, an ensemble of B = 200 ANNs has been built using the available dataset T = {(x_{1,n}, z_n), n = 1, ..., N_training}, where N_training = 1000. Every ANN has 5 tan-sigmoidal hidden neurons and one linear output neuron. To estimate σ_m²(x) = σ_m²(x1), the ensemble has been divided into M = 20 sub-ensembles, and P = 1000 bagging resamples of the sub-ensemble outputs φ_com^m(x) = φ_com^m(x1), m = 1, ..., M, have been considered.
The results are evaluated in terms of the following performance indicators, computed by considering a set of N_test = 1000 pairs (x_{1,i}, z_i), i = 1, ..., N_test, obtained from Eqs. 16 and 17:
1. The square bias b²; i.e., the average quadratic difference between the true value of f(x1) and the ensemble estimate φ_avg(x1) of this quantity:

b² = (1/N_test) Σ_{i=1..N_test} (f(x_{1,i}) - φ_avg(x_{1,i}))²   (18)

This value gives information on the accuracy of the estimate of f(x) = f(x1) provided by the ensemble. Notice that the computation of this indicator requires the knowledge of the function f(x1), which is not available if the measurement equation (Eq. 16) is not known. Thus, in general one can only compute:

MSE = (1/N_test) Σ_{i=1..N_test} (φ(x_{1,i}) - z_i)²   (19)

Small values of MSE indicate satisfactory performance of the ensemble.
2. The coverage of the Prediction Interval (PI) with confidence 0.68. This indicator is used to verify the accuracy of the estimate of the distribution P(z|x) = P(z|x1). A PI with confidence level γ_p is defined as a random interval in which the observation z(x) = z(x1) will fall with probability γ_p (Carney et al. 1999, Heskes 1997):

P(z(x1) ∈ PI_{γp}(x1)) = γ_p   (20)

The estimate of P(z|x1) being a Gaussian distribution with mean φ_avg(x1) and variance σ̂_m²(x1) + α̂²(x1), the PI with γ_p = 0.68 is given by:

φ_avg(x1) - √(σ̂_m²(x1) + α̂²(x1)) ≤ z(x1) ≤ φ_avg(x1) + √(σ̂_m²(x1) + α̂²(x1))   (21)

In order to verify whether the estimate of P(z|x1) provides a satisfactory approximation of the true pdf, we consider how many times the measurement z_i falls within PI_{γp=0.68}(x_{1,i}). The closer the fraction of points hitting the γ_p-confidence interval is to γ_p, the more accurate the estimation of the parameters of the Gaussian pdf.
In practice, for every x_{1,i}, i = 1, ..., N_test, a counter C_i is set to 1 or 0 depending on whether z_i belongs to the estimated PI_{γp=0.68}(x_{1,i}) or not. The closer the average of C_i, i = 1, ..., N_test, to 0.68, the better the approximation.
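The counter-based coverage check just described amounts to a few lines. In this sketch the ensemble mean and the total standard deviation are illustrative placeholders rather than outputs of a trained ensemble:

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in test set with assumed distribution parameters
N_test = 1000
x1 = rng.uniform(1.0, 9.0, N_test)
phi_avg = x1 + 0.25                # assumed ensemble mean
total_std = np.full(N_test, 0.7)   # assumed sqrt(sigma_m^2 + alpha^2)
z = phi_avg + rng.normal(0.0, 0.7, N_test)

# Eq. 21: one-sigma PI, nominal coverage gamma_p = 0.68
lower = phi_avg - total_std
upper = phi_avg + total_std
C = (z >= lower) & (z <= upper)    # counters C_i
coverage = float(C.mean())         # should be close to 0.68
```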
Cross-validation of the results has been done by repeating the computations with N_set = 25 different, randomly generated training and test sets. This avoids over- or under-estimation of the performance indicators b² and coverage.
Table 1 reports the means and standard deviations (std) of the performance indicators over the 25 cross-validations.

Model         Ensemble          Single ANN
b²            0.0040 ± 0.0015   0.0097 ± 0.0060
PI coverage   0.6758 ± 0.0366   -

Table 1: Performance indicators over 25 cross-validations; the mean ± std is reported
Notice that the ensemble output φ_avg(x1) is very accurate in the prediction of the function f(x1), the bias being very small. Furthermore, notice that the ensemble outperforms a single ANN trained with all the 1000 training patterns. With respect to the estimate of the distribution P(z|x1), the proposed method provides a satisfactory approximation, the coverage being very close to 0.68.
Table 2 reports the estimates of the two contributions σ_m² and α² to the variance of the estimated measurement distribution P̂(z|x1). Notice that in this case study σ_m² is negligible with respect to the variance α² of the measurement noise; this entails that the accuracy of the estimate of the PI is more sensitive to the estimate of α².
In this respect, Figure 1 shows the estimate of α²(x1) and compares it to the true α² value provided by Eq. 17. Notice that this comparison, which is done in this work to assess the performance of the methodology, is not possible in real industrial applications if the measurement model (Eqs. 16 and 17) is not available.
            Estimation        Real value
σ_m² + α²   0.4489 ± 0.1359   -
σ_m²        0.0243 ± 0.0317   -
α²          0.4886 ± 0.0276   0.4900

Table 2: Contributions to the P(z|x1) variance

Figure 1: True and approximated measurement noise variance α²(x1)
4.2. Crack depth prediction
The objective of this Section is to evaluate the performance of the overall scheme in the prediction of the crack depth evolution when the ensemble of ANNs is used to estimate the measurement distribution P(z|x). To this purpose, the problem tackled consists in predicting, at t = 80 (in arbitrary units), the future crack propagation on the basis of eight measurements of the crack depth taken at times t_m = m·10, m = 1, ..., 8. This prediction phase is performed by considering the evolution of the particles according to the model in Eqs. 14 and 15 (e.g., see Orchard & Vachtsevanos 2009). In particular, we focus on the time instant t = 80, when the PF updates the particles' weights via P(z|x) after the last measurement (z = 4.6087, in arbitrary units) has been acquired.
Figure 2 shows the prediction of the crack depth evolution performed at t = 80, after the acquisition of the last measurement, using the ensemble model to estimate P(z|x). This prediction has been compared to that which would be obtained by directly using the measurement equation in the PF.
Notice that the linearity of the prediction of the expected value of x1 can be explained by taking expectations of Eqs. 14 and 15:

E[x2(t+1)] = E[x2(t)] + E[ω2(t)] = E[x2(t)] = constant

E[x1(t+1)] - E[x1(t)] = 3·10⁻⁴ · (0.05 + 0.1·E[x2(t)]) + E[ω1(t)] = constant
Figure 2: Comparison of the predictions with the true state evolution
To evaluate the impact of replacing the measurement equation with the ensemble of ANNs, 100runN = different
degradation trajectories have been simulated and the predictions of the crack depth have been performed.
Also in this case, the prediction provided by the ensemble of ANNs trained with 1000trainingN = patterns has been
compared to that based on the analytical measurement equation ( | )P z x . Each run is characterized by the same
true trajectory, the same acquired measures and the same state noise vector. The following performance indicators have been computed:
1. The coverage of the PI, with confidence 0.68. In particular, the prediction of the crack depth at t = 120 has been considered. At each run, the boundaries of the PI are computed by considering the 16th and 84th percentiles of the estimate of the pdf of the crack depth. A counter is set to 1 if the true trajectory falls within the corresponding interval and to 0 otherwise, in analogy with the coverage verification explained in Section 4.2.
2. The average width of the PI at t = 120 over the N_run = 100 runs.
3. The Mean Square Error (MSE) over the N_run = 100 runs between the prediction of the crack depth provided by the PF and its true value at t = 120. That is:
MSE = (1/N_run) · Σ_{n_run=1}^{N_run} (X_{n_run} − o_{n_run})²    (22)
(Figure 2 plot: crack depth [inch] vs. time, t = 80 to 120; curves: 1000-training ensemble, traditional, true.)
where X_{n_run} is the true crack depth of the test trajectory at t = 120 and o_{n_run} is the expected value of the crack depth pdf estimated by the PF.
The obtained values are reported in Table 4. It can be noticed that the coverage of the ensemble is very close to 0.68; furthermore, the other performance indicators are also very close to those which would be obtained by considering the analytical measurement equation. This result confirms that the approximation of the distribution P(z|x) is accurate and therefore does not remarkably alter the outcome of the PF.
           Traditional   Data-driven
coverage   0.6500        0.7000
PI width   1.3058        1.3226
MSE        0.3421        0.3464

Table 4: Performance indicators at t = 120
Finally, the performance evaluator s proposed by Saxena et al. (2008) has been computed to evaluate the prediction performance:

s = Σ_{i=1}^{n} (e^{−d_i/a1} − 1)   if d_i < 0
s = Σ_{i=1}^{n} (e^{d_i/a2} − 1)    otherwise

where a1 = 10, a2 = 13, n = 100 is the number of simulated histories and d_i is the difference between the estimated RUL and its true value. To compute the value of this performance metric, the following procedure has been adopted:
1. Set the failure threshold to ST = 7.
2. Simulate the evolution of the degradation process; this allows calculating the true RUL, t_RUL, at t = 80 as the difference between the time instant at which the component reaches ST and 80. Moreover, the set of measures sampled according to the measurement model is collected.
3. Use the PF to estimate the component degradation state at t = 80 and predict the RUL estimate t̂_RUL.
4. Calculate the difference d = t̂_RUL − t_RUL.
5. Repeat steps 2-4 n−1 times and compute s.
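A minimal sketch of the metric s as written above (illustrative Python; d is the estimated minus the true RUL, with a1 applied to early predictions and a2 to late ones, as in the formula):

```python
import math

def saxena_score(d_list, a1=10.0, a2=13.0):
    """Asymmetric exponential score of Saxena et al. (2008): early (d < 0)
    and late (d >= 0) RUL errors are penalized with different constants."""
    s = 0.0
    for d in d_list:
        if d < 0:
            s += math.exp(-d / a1) - 1.0   # early-prediction penalty
        else:
            s += math.exp(d / a2) - 1.0    # late-prediction penalty
    return s

perfect = saxena_score([0.0])   # a perfect predictor scores 0
early = saxena_score([-5.0])    # 5 cycles early
late = saxena_score([5.0])      # 5 cycles late
```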
The values of the metric s obtained when the RUL is predicted using the ‘traditional’ PF approach (s = 10.30) and the ‘data-driven’ approach (s = 10.65) are very close to each other.
5. CONCLUSIONS
PF is often proposed as a prognostic technique for estimating the evolution of the degradation state x of a system;
generally, it resorts to analytical models of both the degradation state evolution and the measurement. In practice, the measurement model may not be available in analytical form; rather, a set of data may be available from which the measurement model can be built through data-mining techniques. In this work, a technique based on an ensemble of ANNs has been investigated to this aim and applied to a case study derived from the literature. The verification conducted on the results shows that a good approximation of the model can be obtained and that its substitution into the PF does not significantly affect the PF's performance. Furthermore, the proposed method has been shown capable of estimating the uncertainty of the RUL prediction.
Additional effort will be dedicated in future works to improving the accuracy of the estimate when only a small training set is available and to extending the applicability of the technique to cases in which the measurement equation f(x) is not one-to-one or has a more complex form. Furthermore, another future objective is to also substitute the model of the evolution of the system state with a data-driven model, e.g., an ensemble of trained ANNs, in order to allow the use of PF in cases where an analytical model of the system evolution is also unavailable.
REFERENCES
Arulampalam, M.S., Maskell, S., Gordon, N. and Clapp, T. (2002). A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, IEEE Transactions on Signal Processing, 50 (2), 174-188.
Baraldi, P., Di Maio, F., Zio, E., Sauco, S., Droguett, E., Magno, C. (2012) Ensemble of Neural Networks for Predicting Scale Deposition in Oil Well Plants Equipments, Proceedings of PSAM 11 & ESREL 2012.
Breiman, L. (1999) Combining predictors, in Sharkey AJC (Ed.) Combining artificial neural nets: ensemble and modular multinet systems. Springer, Berlin Heidelberg New York, pp 31-50.
Cadini, F., Zio, E., Avram, D. (2009) Model-based Monte Carlo state estimation for condition-based component replacement, Reliability Engineering & System Safety, Vol. 94 (3), pp. 752-758.
Carney, J., Cunningham, P., & Bhagwan, U. (1999). Confidence and prediction intervals for neural network ensembles. International Joint Conference on Neural Networks IJCNN, July 10-16, Washington D.C.
Coble, J.B. (2010) Merging Data Sources to Predict Remaining Useful Life – An Automated Method to Identify Prognostic Parameters. PhD diss., University of Tennessee.
Doucet, A., de Freitas, J.F.G. and Gordon, N.J. (2001) Sequential Monte Carlo methods in practice. Springer-Verlag, New York.
Gustafsson, F., & Saha, S. (2010). Particle filtering with dependent noise. In Proceedings of the 13th Conference on Information Fusion (FUSION). Edinburgh.
Heskes, T. (1997) Practical Confidence and Prediction Intervals, in M. Mozer, M. Jordan and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pages 466-472, Cambridge, 1997, MIT Press.
Hsu, C.W. , Chang, C.C., Lin, C.J. (2003) A Practical Guide to Support Vector Classification. Technical Report, 2003.
Liu, R., Ma, L., Kang, R. and Wang, N. (2011) The Modeling Method on Failure Prognostics Uncertainties in Maintenance Policy Decision Process, Proc. 9th Int. Conf. on Reliability, Maintainability and Safety (ICRMS 2011), pp. 815-820.
Moura, M.C., Lins, I.D., Ferreira, R.J., Droguett, E.L., Jacinto, C.M.C. (2011) Predictive maintenance policy for oil well equipment in case of scaling through support vector machines, in Proceedings of the European Safety and Reliability Conference - ESREL 2011, pp. 503-507.
Nix, D. and Weigend, A. (1994). Estimating the mean and the variance of the target probability distribution, in IEEE World Congress on Computational Intelligence, International Joint Conference on Neural Networks, June 27-July 2, Orlando, Florida, Vol. 1, pp. 55-60.
Orchard, M., Wu, B., Vachtsevanos, G. (2005) A Particle Filter Framework for Failure Prognosis, Proceedings of WTC2005 World Tribology Congress III. Washington D.C., USA, Sept. 12-16, 2005
Orchard, M. and Vachtsevanos, G. (2009) A Particle Filtering Approach for On-Line Fault Diagnosis and Failure Prognosis, Transactions of the Institute of Measurement and Control, Vol. 31 (3-4), pp. 221-246.
Papoulis, A. and Pillai, S.U. (2002) Probability, Random Variables and Stochastic Processes. McGraw-Hill Higher Education, 4th edition.
Saxena, A., Goebel, K., Simon, D., Eklund, N. (2008) Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation, in International Conference on Prognostics and Health Management, PHM2008.
Geman, S., Bienenstock, E. and Doursat, R. (1992) Neural Networks and the bias/variance dilemma. Neural Computation, Vol. 4(1), pp. 1-58.
Tang, L., Kacprzynski, G.J., Goebel, K. and Vachtsevanos, G. (2009) Methodologies for Uncertainty Management in Prognostics. Proc. IEEE Aerospace conference, 2009, pp. 1-12.
Vachtsevanos, G., Lewis, F.L., Roemer, M., Hess, A. and Wu, B. (2006) Intelligent Fault Diagnosis and Prognosis for Engineering Systems. John Wiley & Sons.
Zio, E. (2012) Prognostics and health management of industrial equipment, in Diagnostics and Prognostics of Engineering Systems: Methods and Techniques, S. Kadry, Ed. IGI-Global, 2012.
BIOGRAPHIES
Piero Baraldi (BS in nuclear engng., Politecnico di Milano, 2002; PhD in nuclear engng., Politecnico di Milano, 2006) is assistant professor of Nuclear Engineering at the Department of Energy of the Politecnico di Milano. He is the current chairman of the European Safety and Reliability Association (ESRA) Technical Committee on Fault Diagnosis. His main research efforts are currently devoted to the development of methods and techniques for system health monitoring, fault diagnosis, prognosis and maintenance optimization. He is co-author of 42 papers in international journals and 38 in proceedings of international conferences, and serves as a referee for 5 international journals.
Michele Compare (BS in mechanical engng., University of Naples Federico II, 2003, PhD in nuclear engng., Politecnico di Milano, 2011) is currently a post-doc at the Politecnico di Milano. He worked as RAMS engineer and risk manager. His main research efforts are devoted to the development of methods and techniques in support of maintenance of complex systems.
Sergio Sauco (BS in energy engng., Politecnico di Milano, 2009; MS in nuclear engng., Politecnico di Milano, 2011).
Enrico Zio (BS in nuclear engng., Politecnico di Milano, 1991; MSc in mechanical engng., UCLA, 1995; PhD in nuclear engng., Politecnico di Milano, 1995; PhD in nuclear engng., MIT, 1998) is Director of the Chair in Complex Systems and the Energetic Challenge of Ecole Centrale Paris and Supelec, full professor, Rector's delegate for the Alumni Association and past-Director of the Graduate School at Politecnico di Milano, and adjunct professor at the University of Stavanger. He is the Chairman of the European Safety and Reliability Association (ESRA), member of the Korean Nuclear Society and the China Prognostics and Health Management Society, and past-Chairman of the Italian Chapter of the IEEE Reliability Society. He serves as Associate Editor of IEEE Transactions on Reliability and as editorial board member of various international scientific journals. He has functioned as Scientific Chairman of three international conferences and as Associate General Chairman of two others. His research topics are: analysis of the reliability, safety and security of complex systems under stationary and dynamic conditions, particularly by Monte Carlo simulation methods; and development of soft computing techniques for safety, reliability and maintenance applications, system monitoring, fault diagnosis and prognosis. He is author or co-author of five international books and more than 170 papers in international journals.
Feature Extraction and Evaluation for Health Assessment and Failure Prognostics
K. Medjaher1, F. Camci2, and N. Zerhouni1
1 FEMTO-ST Institute, AS2M Department, UMR CNRS 6174-UFC/ENSMM/UTBM, 25000 Besancon, [email protected]
2 IVHM Centre, School of Applied Sciences, Cranfield University, [email protected]
ABSTRACT
The estimation of the Remaining Useful Life (RUL) of industrial equipment can be based on its most critical components. Under this assumption, the identified critical component must be monitored to track its health state during operation. The acquired data are then processed to extract relevant features, which are used for RUL estimation. This paper presents a method for evaluating the goodness of features, extracted from raw monitoring signals, for health assessment and prognostics of critical industrial components. The evaluation method is applied to several simulated datasets as well as to features obtained from a particular application on bearings.
1. INTRODUCTION
The availability, reliability and security of industrial equipment can be ensured by monitoring its most critical components to continuously assess their health condition and predict its future evolution, leading to maintenance, life cycle and cost optimization. Examples of critical physical components are bearings, gears, batteries, belts, etc. Bearing failure is considered one of the foremost causes of breakdown in rotating machinery (Li et al., 1999). Bearing faults account for 40% of motor faults, according to research conducted by the Electric Power Research Institute (EPRI) (Enzo & Ngan, 2010). Turbine engine bearing failures are the leading cause of class-A mechanical failures (loss of aircraft) (Richard, 2005). Even one aircraft saved thanks to prognostics would pay for its development cost (Marble & Morton, 2006). The identification of the most convenient time of maintenance after failure detection, without reducing the safety requirements, is crucial, and is made possible by a prognostics capability. Thus, bearing prognostics is very critical for effective
K. Medjaher et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
operation and management.

Failure detection forces machinery to shut down, which causes tremendous time, productivity and capital losses. In addition, it is not uncommon to replace a defective/used bearing with a new one that has a shorter remaining useful life than the defective one. Each failure type (outer race, inner race, ball and cage defects) causes a distinct signature in the vibration frequency spectrum (Enzo & Ngan, 2010), and vibration analysis is considered the most reliable method for bearing failure detection (Zhang, Sconyers, Patrick, & Vachtsevanos, 2010; Davaney & Eren, 2004; McFadden & Smith, 1984; Tandon & Choudhury, 1999). However, it is often difficult to extract the failure signature due to the noise in the data, especially in the early stages of the failure (Su, Wang, Zhu, Zhang, & Guo, 2010; Bozchalooi & Liang, 2008; He, Jiang, & Feng, 2009). The extracted features are then used for failure detection, diagnostics and prognostics.

Feature extraction is a step common to all types of prognostic approaches and one of the most critical steps in diagnostics and prognostics. The extracted features are first evaluated and then used by appropriate methods and algorithms to detect the faults and to predict the equipment's remaining useful life. In this framework, the goodness of the features affects the complexity of the diagnostic and prognostic methods. Features that perfectly represent healthy and close-to-failure machinery, and the progression between them, may lead to very simple diagnostic and prognostic methods. On the other hand, very complex diagnostic and prognostic methods using features that are ineffective in representing the failure and its progression may lead to poor results. Thus, the extraction of relevant features is a prerequisite for effective diagnostics and prognostics.

This paper presents a method for evaluating the goodness of the features for prognostics.
An effective feature evaluation method enables the selection of the best features, which is critical for obtaining better prognostics results. The feature evaluation method is applied to bearings that were run until failure in a lab environment. The paper is organized
as follows: Section 2 presents a brief introduction to failure prognostics, Section 3 deals with the quantification metric for the quality evaluation of features for prognostics, Section 4 presents the experiments and results, and Section 5 concludes the paper.
2. FAILURE PROGNOSTIC PARADIGM
According to the International Standard Organization (ISO), failure prognostics corresponds to the estimation of the operating time before failure and the risk of future existence or appearance of one or several failure modes (AFNOR, 2005). In the scientific literature, the operating time before failure is called the remaining useful life (RUL), with which a confidence value is often associated. Several methods and tools for performing failure prognostics have been proposed in the literature. They can be grouped into three main approaches (Tobon-Mejia, Medjaher, & Zerhouni, 2012; Heng, Zhang, Tan, & Mathew, 2009; Jardine, Lin, & Banjevic, 2006; Vachtsevanos, Lewis, Roemer, Hess, & Wu, 2006), namely: the model-based approach, the data-driven approach and the hybrid approach.
Figure 1. Main prognostic approaches: model-based (physics of failure), data-driven and hybrid.

Model-based (also called physics-of-failure) methods deal with the exploitation of a mathematical model representing the behavior of the physical component, including its degradation. The derived model is then used to predict the future evolution of the degradation. In this case, the prognostic consists in evolving the degradation model up to a determined future instant, starting from the actual deterioration state and considering the future use conditions of the corresponding component. The main advantage of this approach is its precision, since the predictions are achieved based on a mathematical model of the degradation. However, the derived degradation model is specific to a particular kind of component or material, and thus cannot be generalized to all the system components. In addition, obtaining a mathematical model of degradation is not an easy task and requires well-instrumented test benches, which can be expensive.

Data-driven methods are concerned with the transformation of the monitoring and/or exploitation data into relevant models, which can be used to assess the health state of the industrial system and to predict its future state, leading to the estimation of its RUL. Generally, the raw data are first processed to extract features, which are then used to build the diagnostic and prognostic models. The features can be temporal, frequency-based or both. In some applications, individual features are not sufficient and one needs to combine them in order to build what can be called health indicators. Note that data-driven prognostics methods can use data provided by sensors or obtained through experience feedback (operation, maintenance, number of breakdowns, etc.). The advantage of the data-driven approach lies in its applicability, cost and ease of implementation. Indeed, with these methods it is possible to predict the future evolution of the degradation without any need for a prior mathematical model of it. However, the results obtained with this approach are less precise than those obtained with model-based methods.

Hybrid methods use both data-driven and model-based (or physics-of-failure) approaches. The use of each approach depends on the application and on the type of knowledge and data available.
3. FEATURE EXTRACTION AND EVALUATION
Fault detection, diagnostics and prognostics all use the notion of features, which are extracted from the raw monitoring signals provided by the sensors (temperature, vibration, force, etc.) installed on the system. Feature extraction is essential in the process of health monitoring, health assessment and failure prognostics. Indeed, the relevant information related to the behavior of the component during its degradation is often hidden in the raw signals and needs to be extracted by means of appropriate methods. Figure 2 shows the steps involved in the failure prognostic process, including feature extraction.
Figure 2. Steps for RUL estimation: critical component to monitor, parameters to measure, adequate sensors, data processing and feature extraction, feature evaluation, health assessment and prognostics, RUL.

Diagnostics is a classification problem, whereas prognostics is the process of forecasting the future health states. The goodness of the features for diagnostics is basically a measure of separability between data from healthy and faulty equipment. Good separability indicates that samples from different classes (i.e., healthy and faulty) are far apart from each other and samples from the same class are close to each other. The key point in prognostics is the continuity of the separation between time segments, whereas diagnostics focuses on one separability measure between two static classes (i.e., failed and healthy). Prognostics searches for separation between the time segments spanning the whole degradation of the component. Within-class separability (parameters a and b in Fig. 3 (Camci, Medjaher, Zerhouni, & Nectoux, 2012)) and between-class separability (parameter c in Fig. 3 (Camci et al., 2012)) are used to quantify the separability. Many class separation metrics have been reported in the literature (Calinski & Harabasz, 1974; Eker et al., 2011). These metrics focus on static classes and do not consider the progression from one class to another.

Figure 3. Feature quality for diagnostics and prognostics.

One feature may be good at separating the classes, but not at representing the progression from one class to another. For example, the separability measure (S2) of feature 2 (F2) is higher than the separability measure (S1) of feature 1 (F1) in Fig. 3 (Camci et al., 2012). However, this does not mean that F2 is better at representing the failure progression: as seen from the figure, the failure progression in F2 involves higher variation. Thus, a new quality measure should be employed for prognostics, which is a relatively new problem.

Monotonically non-increasing or non-decreasing: mathematically, a function f is called monotonically non-decreasing if for all x and y such that x ≤ y one has f(x) ≤ f(y), and monotonically non-increasing if one has f(y) ≤ f(x). It may be trivial to check the monotonicity of a single failure progression sample by analyzing the differences between consecutive points: when all the differences are greater (less) than or equal to 0, the function is non-decreasing (non-increasing). However, monotonicity should be considered over all the samples representing the failure progression, rather than by analyzing the samples individually. An example of several samples representing failure progression is displayed in Fig. 4 (Camci et al., 2012). As seen from the figure, the time is segmented for effective analysis of the failure progression. The effectiveness of a feature in representing the failure progression is calculated as the average separability of the segments, as expressed in (1). The higher the total separability value S, the better the representation of the failure progression; thus, the goal is to find the feature with the highest S value. S is basically the average separation between time segments: a high S value indicates that the differences between time segments are large. The value s_t is the separability measure for consecutive time segments.
S = (1/T) · Σ_{t=1}^{T} s_t    (1)

where S is the average separability value, s_t is the separability at time t and T is the total number of time segments.

Figure 4. Failure progression for multiple samples (time segments t1-t10).

The distribution of the data points from the different samples in each time segment should be used to measure the separability at a given time segment. The separability calculation is formulated in (2).
s_t = a/L − χ/N_t    (2)

with

χ = 0 if a/L ≠ 1,  χ = α if a/L = 1    (3)

where α is the number of samples overlapping with the distribution in the consecutive time frame, N_t is the number of samples in time segment t and L represents the distance between the 25th and 75th percentiles. The 25th and 75th percentiles were selected as a common-sense range that captures 50% of the data; the selection of the range may depend on the signal-to-noise ratio and on possible bias in the dataset. The ratio of the length of the non-overlapped portion (called a) to L is a measure of the separability (a/L). The L and a parameters represent the distance between points at the given percentiles; for example, if the overlap occurs between the 30th and 50th percentiles, parameter a is the distance between the samples at the 30th and 50th percentiles. When the separation is low, the a/L ratio is close to 0; when the separation is high, a/L becomes closer to 1. When there is no overlap between the 25th-75th percentiles of the distributions (a/L = 1), two different possibilities exist: in the first, there is some overlap within the data greater than the 75th percentile or less than the 25th percentile; the second represents complete separation. When a/L becomes 1, the ratio of the number of data points causing overlap to the total number of data points in the distribution is subtracted in the separability calculation.
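One possible implementation of Eqs. (1)-(3) is sketched below (illustrative Python, not the authors' code; the computation of the non-overlapped portion a from the two interquartile ranges, and of the penalty count α, is an interpretation of the text):

```python
import numpy as np

def segment_separability(seg_a, seg_b):
    """Separability s_t between two consecutive time segments (Eqs. 2-3),
    based on the 25th-75th percentile (interquartile) ranges."""
    lo_a, hi_a = np.percentile(seg_a, [25, 75])
    lo_b, hi_b = np.percentile(seg_b, [25, 75])
    L = hi_a - lo_a                              # IQR length of the first segment
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)  # overlap of the two IQRs
    if overlap > 0:                              # IQRs overlap: a/L < 1, chi = 0
        return (L - overlap) / L
    # no IQR overlap (a/L = 1): subtract the ratio of overlapping points
    alpha = int(np.sum((seg_a >= lo_b) & (seg_a <= hi_b)))
    return 1.0 - alpha / len(seg_a)

def total_separability(segments):
    """Average separability S over consecutive time segments (Eq. 1)."""
    s = [segment_separability(segments[t], segments[t + 1])
         for t in range(len(segments) - 1)]
    return float(np.mean(s))

# a trending feature should score higher than a flat, noisy one
rng = np.random.default_rng(2)
trending = [rng.normal(m, 0.1, 50) for m in np.linspace(0.0, 5.0, 10)]
flat = [rng.normal(0.0, 0.1, 50) for _ in range(10)]
```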
4. EXPERIMENTS AND RESULTS
4.1. Simulated Dataset
The presented evaluation method is applied to eight simulated datasets. These datasets have been developed to simulate various levels of goodness for prognostics. Features with a clear trend are considered to be good features, whereas bad features do not include a trend with time. The datasets, numbered from one to eight, include an increasing trend as shown in Figure 5.
Figure 5. Simulated features 1-8 (each plotted over 1000 time points; values range roughly from 2 to 4).
The trend in these datasets is formulated as a logarithmically increasing mean with constant noise, as shown in the formulation below. In these equations, μ_{i,t} is the mean of feature i at time t and T is the final time point.
x(t) = μ_{i,t} + σ    (4)

μ_{i,1} = log(10)    (5)

μ_{i,T} = log(i × 10)    (6)
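Eqs. (4)-(6) fix only the endpoint means; one way to generate such features is sketched below (illustrative Python: the logarithmic interpolation of the mean between μ_{i,1} and μ_{i,T}, and the noise level sigma, are assumptions not stated in the text):

```python
import numpy as np

def simulate_feature(i, T=1000, sigma=0.05, seed=0):
    """Simulated feature i: mean rising logarithmically from log(10) at t = 1
    to log(10*i) at t = T, plus constant-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    mu_1, mu_T = np.log(10.0), np.log(10.0 * i)
    mu = mu_1 + (mu_T - mu_1) * np.log(t) / np.log(T)   # log-shaped trend
    return mu + rng.normal(0.0, sigma, size=T)

x1 = simulate_feature(1)   # no trend: the mean stays at log(10)
x8 = simulate_feature(8)   # strongest trend, ending near log(80)
```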
As seen from Figure 5, the goodness of the features increases from feature 1 to feature 8, and the trend is more visible in the later datasets. Figure 6 displays the goodness of the features obtained with the presented evaluation metric. As seen from the figure, the goodness increases for the later features, which is consistent with the increasing trend in Figure 5.
Figure 6. Goodness of features.
4.2. Bearing Example
The accelerated bearing life test bed, called PRONOSTIA, is an experimentation platform dedicated to testing and validating bearing health assessment, diagnostic and prognostic methods. In the present experimental setup, a natural degradation process of the bearings is performed: during the experiments, any failure type (inner race, outer race, ball or cage) or a combination of them can occur. This is allowed in order to better represent a real industrial situation.

The experimental platform PRONOSTIA is composed of two main parts: a first part related to the speed variation and a second part dedicated to the generation of load profiles. The speed variation part is composed of a synchronous motor, a shaft, a set of bearings and a speed controller. The synchronous motor develops a power of 1.2 kW and its operational speed varies between 0 and 6000 rpm. The second part is composed of a hydraulic jack connected to a lever arm, allowing different loads to be applied to the bearing mounted on the platform for degradation.

A pair of ball bearings is mounted on one end of the shaft to serve as the guide bearings and an NSK6307DU roller ball
bearing is mounted on the other end to serve as the test bearing. The transmission of the movement between the motor and the shaft is ensured by a rub belt.

Two high-frequency accelerometers (DYTRAN 3035B) are mounted horizontally and vertically on the housing of the test roller bearing to pick up the horizontal and vertical accelerations. In addition, the monitoring system includes one temperature probe (of type PT100) to record the temperature of the tested bearing. A speed sensor and a torque sensor are also available on the PRONOSTIA platform. The sampling frequency of the NI DAQCard-9174 data acquisition card is set to 25600 Hz and the vibration data provided by the two accelerometers are collected every 1 second. The bearing operating conditions are determined by instantaneous measures of the radial force applied on the bearing, the rotation speed of the shaft handling the bearing and the torque inflicted on the bearing.

Several features are extracted to be used for failure progression analysis, such as the maximum, mean, standard deviation, skewness, kurtosis, root mean square (RMS), crest factor and highest frequency.

Fig. 7 displays two good features (RMS and standard deviation) and two bad features (skewness and crest factor) for prognostics (in these plots, the x axis stands for time). As can be seen from the figures, the failure progression is visible in the features with a high separability measure. Fig. 8 displays the separability values of several features. In
Figure 7. Examples of good/bad features for prognostics (panels: RMS, standard deviation, skewness, crest factor; x axis: time).
this figure, three sensory signals were used, each represented by a line in the graph. The fluctuations show that the goodness may vary based on the sensory signal used. As seen from this figure, the goodness of the skewness and crest factor (CF) is low, whereas the goodness of the standard deviation and RMS is high. Thus, the evaluation method is able to differentiate the goodness of the features.
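The statistical features listed in this section can be computed from a raw vibration window as in the following sketch (illustrative Python; the window is synthetic, with a length matching the 25600 Hz, 1 s acquisitions):

```python
import numpy as np

def extract_features(window):
    """Common per-window statistical features: max, mean, standard deviation,
    skewness, kurtosis, RMS and crest factor."""
    w = np.asarray(window, dtype=float)
    mean, std = w.mean(), w.std()
    rms = np.sqrt(np.mean(w ** 2))
    return {
        "max": float(w.max()),
        "mean": float(mean),
        "stdev": float(std),
        "skew": float(np.mean((w - mean) ** 3) / std ** 3),
        "kurtosis": float(np.mean((w - mean) ** 4) / std ** 4),
        "rms": float(rms),
        "crest_factor": float(np.max(np.abs(w)) / rms),
    }

# one synthetic 1-second window sampled at 25600 Hz
rng = np.random.default_rng(3)
feats = extract_features(rng.normal(0.0, 0.2, size=25600))
```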
Figure 8. Separability values for the second type of degradation (panels: vibration signals 1 and 2; features: max, mean, stdev, skew, kurtosis, rms, CF, rms1, rms2, CF1, CF2, freq).
5. CONCLUSION
The quality of the features is critical for health assessment, diagnostics and prognostics. Feature extraction, selection and evaluation of feature quality for diagnostics have been studied extensively; the nature of the prognostics problem, however, is different from that of diagnostics. This paper presents a quantification metric for evaluating the quality of features for prognostics, which is a relatively new problem compared to diagnostics. The presented metric is applied to features extracted from bearing vibration data collected in a lab environment. The features are plotted for visual evaluation to judge the quality of the evaluation metric. The results show that the metric is able to effectively quantify the quality of features for the purpose of prognostics.
REFERENCES
AFNOR. (2005). Condition monitoring and diagnostics ofmachines - Prognostics - Part 1: General guidelines.NF ISO 13381-1.
Bozchalooi, I., & Liang, M. (2008). A joint resonance fre-quency estimation and in-band noise reduction methodfor enhancing the detectability of bearing fault signals.Mechanical Systems and Signal Processing, 22, 915-933.
Calinski, R., & Harabasz, J. (1974). A Dendrite Method forCluster Analysis. Comm. in Statistics, 3, 1-27.
Camci, F., Medjaher, K., Zerhouni, N., & Nectoux, P. (2012).Feature Evaluation for Effective Bearing Prognostics.Quality and Reliability Engineering International. (Inpress)
5
First European Conference of the Prognostics and Health Management Society, 2012
Finite Element based Bayesian Particle Filtering for the estimation of crack damage evolution on metallic panels

Sbarufatti C., Corbetta M., Manes A., and Giglio M.

Politecnico di Milano, Mechanical Dept., Via La Masa 1, 20156, Milano, Italy
ABSTRACT
Many studies are nowadays devoted to structural health monitoring, especially in the aeronautical field. Focusing on metallic structures, fatigue cracks represent both a design and a maintenance issue. The availability of real-time diagnostic techniques for the assessment of structural health has also drawn attention toward the prognostic assessment of the residual useful life, with the aim of developing robust prognostic health management systems to assist operators in scheduling maintenance actions. This paper describes the development of a Bayesian particle filter used to refine the posterior probability density functions of both the damage condition and the residual useful life, given that prior knowledge on damage evolution is available from NASGRO material characterization. The prognostic algorithm has been applied to two cases. The first is an off-line application, receiving diagnostic inputs retrieved by manual structure scanning for fault identification. The second is used on-line to filter the input coming from a real-time automatic diagnostic system. FEM simulations are used extensively to enhance the algorithm's performance.
1. INTRODUCTION
Fatigue crack nucleation and propagation is a major issue for aeronautical structures, both from a design (Schmidt & Schmidt-Brandecker, 2009) and a maintenance point of view (Lazzeri & Mariani, 2009). On the one hand, a proper design is required to guarantee damage tolerance or safe life, depending on the criticality of the selected component. On the other hand, a strict inspection schedule has to be programmed to guarantee structural health, owing to the uncertainties in the design assumptions for damage nucleation and evolution (material non-uniformities, manufacturing tolerances, load spectra that are not easily predictable, uncertainty in the stress field at hot spots, etc.). Moreover, maintenance stops often require dismounting large portions of the structure, thus reducing the availability of the aircraft and raising operating costs.
Real-time Structural Health Monitoring (SHM), as part of a complete Prognostic Health Management (PHM) system, could potentially reduce aircraft operating costs while maintaining a high level of safety (Boller, 2001). Much research is thus directed to the development of systems for automatic fault detection, able to perform continuous on-board inference on structural health. The evolution of Diagnostic Monitoring Systems (DMS) has led to the recognition that predictive prognosis is both desirable and technically possible. Indeed, the large amount of data coming from a DMS, once statistically treated, allows a stochastic estimation of the structure's Residual Useful Life (RUL) as well as the estimation of the Probability Density Function (PDF) of the current damage state. This approach makes it possible to decide in real time whether a component must be replaced or repaired, according to predefined safety parameters.
Bayesian updating methodologies fit the PHM target well (Arulampalam, Maskell, Gordon & Clapp, 2002). Their approach consists in updating the a priori information on the RUL (based essentially on material characteristics) according to the actual observations (treated stochastically) taken in real time by the DMS, thus arriving at the estimation of the required posterior distributions, conditional on the measurements. Unfortunately, it is impossible to evaluate these posterior distributions analytically except when the degradation process is linear and the noise is Gaussian (as happens when using Kalman Filters). Focusing on fatigue damage, since crack evolution is not a linear process and the involved uncertainties (including the measurement error) are not Gaussian, a numerical approach is suggested. Monte Carlo Sampling (MCS) methods are a valid tool to approximate the required posterior distributions (Cadini, Zio & Avram, 2009).

_____________________
Sbarufatti C. et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Among them, Particle Filters, also known as Sequential Importance Sampling (SIS), are an MCS method that takes its name from the fact that the continuous distributions of interest are approximated by a discrete set of weighted particles, each one representing a Markov process trajectory of evolution in the state space, its weight being an index of the probability of the trajectory itself (Arulampalam et al., 2002). As the number of samples becomes very large, the MCS characterization of the PF approaches the optimal Bayesian estimate. In addition, the Sequential Importance Resampling (SIR) algorithm is a related technique which allows for particle resampling when the initially drawn samples are not able to describe the system dynamics with sufficient accuracy. In this case, new particles are usually sampled taking into account the information about the system gained up to the resampling instant.
Two main differences arise when comparing a real-time DMS based on a network of sensors installed over the structure with classical Non Destructive Testing (NDT) techniques used to manually scan the structure during maintenance stops (scheduled or unscheduled). The first concerns the target damage dimension that can be identified. NDT can detect cracks at a very early stage of propagation, often detecting anomalies on the order of 1 mm in length or less. On the other hand, the on-board DMS is expected to be designed for a longer target crack length (typically an order of magnitude greater, though strictly dependent on the allowed number and position of sensors as well as on the geometry of the structure to be monitored), as reported by Sbarufatti, Manes and Giglio (2011). This is nevertheless compliant with current specification requirements for damage tolerance (JSSG, 2006), at least for the aeronautical panel structure tested in this framework (Figure 1). The second concerns the uncertainty of the provided measurement. The damage inference that can be obtained with a manual scan over the entire structure is far more precise than the PDF of the damage state estimated with a smart sensor network, due to the complicated algorithms for data fusion and damage characteristic evaluation.
This paper reports the development and testing of a Particle Filtering algorithm for the prognosis of aeronautical stiffened skin panels. The aim of the work is to assess the advantages of applying PF to the estimation of the RUL, in comparison with a classical methodology for the estimation of fatigue crack evolution. Moreover, this work represents the final test of a complete PHM system that also comprises an automatic DMS for the real-time evaluation of damage. A real dynamic crack propagation test has been executed, with acquisition from a network of 20 FBG strain sensors (Figure 1) and simultaneous manual tracking of the crack length. A detailed and validated Finite Element model of the structure under monitoring has been developed and used extensively within both the DMS and the PF algorithm. PF has been applied separately to two cases. The first, the off-line PHM, consists in providing the PF with the manually recorded crack lengths as input (with a hypothesis on the associated distribution). In the second case, the on-line PHM, the output of the real-time DMS (processing the signals from the sensor network) is given as input to the PF algorithm. The two approaches have been compared, with comments on their relative performance. Note that the present article focuses on the prognostic part of the SHM; the interested reader may refer to the work of Sbarufatti, Manes and Giglio (2012) for a detailed description of the design and performance of the DMS (taken as input for the current paper).

A brief overview of PF theory is provided in Section 2, followed by descriptions of the stochastic crack propagation model and the measurement model in Sections 3 and 4 respectively. The PF has been tested for the off-line and on-line PHM, with results reported in Section 5. A concluding section is also provided.
2. OVERVIEW OF PARTICLE FILTER THEORY
When modeling the behavior of dynamic systems under degradation, at least two models are required (Cadini et al., 2009): first, a model describing the sequential evolution of the state (the system model) and, second, a model relating the noisy measurements to the state (the measurement model). The former consists of a hidden Markov process describing the health state $\{x_k;\ k \in \mathbb{N}\}$, i.e. the Transition Density Function (TDF) $f$ that relates the health state at time $k-1$ to the condition at instant $k$. It constitutes a Discrete time State Space (DSS) model. The latter is the equation describing the distribution of the observations $\{y_k;\ k \in \mathbb{N}\}$, i.e. the statistical function $h$ that relates the condition of the monitored component to its noisy measurement at time step $k$. In a Bayesian framework, all the relevant information about the state can thus be inferred from the posterior distribution of the state $x_k$, given the history of collected measurements $y_{1:k}$. This also holds for Particle Filters, except that the posterior distributions are estimated by means of MCS from $f$ and $h$. What follows are the basic steps of the mathematical formulation of PF theory; for a deeper description the interested reader may refer to a tutorial on particle filter theory (Arulampalam et al., 2002). The DSS and measurement models are thoroughly defined in the following sections.

Figure 1. (a) Test rig for the dynamic crack propagation test, starting from a notch artificially initiated on the aluminum panel structure. (b) Typical aeronautical stiffened skin panel structure with the sensor network for diagnosis installed (20 FBG strain sensors).
Given that the stochastic damage evolution can be described through the TDF, the aim of the PF is the selection of the most probable damage state $x_k$ at the current time $k$ (or alternatively the entire damage state history up to $k$), according to the noisy measurements collected up to the current discrete time $k$. This means estimating the posterior PDF of the health state at $k$, as reported in Eq. (1), which is valid for the entire state sequence up to $k$:

$$p(x_{0:k} \mid y_{1:k}) = \int p(x'_{0:k} \mid y_{1:k})\, \delta(x_{0:k} - x'_{0:k})\, dx'_{0:k} \tag{1}$$

Equation (1) indicates that the posterior PDF of the health state can be expressed as an integral over the space of all possible damage evolutions $x'_{0:k}$, where only those propagations similar to the target evolution $x_{0:k}$ contribute. According to MCS theory, the integral could be solved by sampling $x'_{0:k}$ from the true posterior PDF $p(x'_{0:k} \mid y_{1:k})$. Unfortunately, this is not possible, that distribution being the objective of the inference. The SIS-SIR technique is a well-established method to overcome this problem. The method allows generating samples from an arbitrarily chosen distribution called the Importance Density Function (IDF) $q(x_{0:k} \mid y_{1:k})$, allowing Eq. (1) to be rewritten in the form of Eq. (2) without introducing any bias in the required $p(x_{0:k} \mid y_{1:k})$:

$$p(x_{0:k} \mid y_{1:k}) = \int q(x'_{0:k} \mid y_{1:k})\, \frac{p(x'_{0:k} \mid y_{1:k})}{q(x'_{0:k} \mid y_{1:k})}\, \delta(x_{0:k} - x'_{0:k})\, dx'_{0:k} \tag{2}$$
An estimate of Eq. (2) can be derived through MCS (based on the $q$ distribution), leading to Eq. (3), where $\{x^i_{0:k},\ i = 1, 2, \ldots, N_s\}$ is a set of $N_s$ independent random samples (particles) drawn from $q(x_{0:k} \mid y_{1:k})$ and $\delta$ is the Dirac delta function. The $w^{*i}_k$ are the normalized importance weights, calculated from the ratio between the $p$ and $q$ distributions, each one relative to the $i$th particle (possible propagation history) and valid for the $k$th discrete instant:

$$p(x_{0:k} \mid y_{1:k}) \approx \sum_{i=1}^{N_s} w^{*i}_k\, \delta(x_{0:k} - x^i_{0:k}) \tag{3}$$
Equation (3) expresses the required posterior PDF as a combination of the weights associated with each particle (i.e. with each damage propagation sample). After some mathematical transformations available in the literature (Arulampalam et al., 2002), one can express $w^{*i}_k$ as a recursive formula dependent on the weights calculated at the previous discrete time $k-1$, as reported in Eq. (4); the $w^i_k$ are called Bayesian Importance Weights and are normalized as in Eq. (5):

$$w^i_k = w^i_{k-1}\, \frac{p(y_k \mid x^i_k)\, p(x^i_k \mid x^i_{k-1})}{q(x^i_k \mid x^i_{0:k-1}, y_{1:k})} \tag{4}$$

$$w^{*i}_k = \frac{w^i_k}{\sum_{j=1}^{N_s} w^j_k} \tag{5}$$

In Eq. (4), $p(x^i_k \mid x^i_{k-1})$ is the TDF ($f$), indicating the statistical correlation between two consecutive steps of damage evolution. Moreover, $p(y_k \mid x^i_k)$ is the probability of obtaining a certain measurement at $k$, given a state sample among the particles propagated up to $k$. This is available once the measurement model ($h$) is statistically described, as discussed in Section 4. Finally, $q(x^i_k \mid x^i_{0:k-1}, y_{1:k})$ is the IDF from which one samples in order to generate particles, i.e. the random Markov process describing the damage evolution, which can be arbitrarily selected.
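The recursion in Eqs. (4)-(5) is straightforward to implement. Below is a minimal Python sketch (not the authors' code) of one SIS update step under the Bootstrap choice $q = f$, where the transition and likelihood functions are supplied by the caller; all names are illustrative.

```python
import numpy as np

def sis_update(particles, weights, y_k, transition, likelihood, rng):
    """One SIS step (Eqs. 4-5) with the Bootstrap choice q = TDF,
    so the weight update reduces to multiplying by the likelihood."""
    # Draw x_k^i from the transition density p(x_k | x_{k-1}^i).
    particles = np.array([transition(x, rng) for x in particles])
    # Eq. (4) with q = p(x_k | x_{k-1}): w_k^i = w_{k-1}^i * p(y_k | x_k^i).
    weights = weights * np.array([likelihood(y_k, x) for x in particles])
    # Eq. (5): normalize to obtain the w*_k used in Eqs. (3) and (6).
    return particles, weights / weights.sum()
```

A particle whose propagated state lies close to the measurement receives a larger normalized weight, which is exactly the filtering effect exploited in Section 5.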
The choice of the IDF is a crucial step in the design of a PF algorithm. The convergence of the algorithm is mathematically demonstrated to be independent of the choice of IDF, provided a sufficient number of samples is generated. If the allowed number of samples is limited by computational requirements, the performance of the algorithm depends on the choice of the importance density function. However, as a first approximation, it is often worth selecting the IDF equal to the TDF (the Bootstrap approximation (Haug, 2005)). This allows a strong simplification of Eq. (4), as the IDF and TDF cancel. It means generating particles according to the prior knowledge of the material properties (statistically defined), then updating the weights to identify the most suitable samples according to the measurement distribution and history. Nevertheless, the real propagation being measured may behave like an outlier with respect to the stochastic damage propagation, thus forcing almost all the particle weights to zero. When this happens, resampling of the particles is required, from a different IDF, somehow taking into account the history of measurements collected up to the resampling instant.
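When the weights degenerate, a resampling step replaces the particle set with an equally weighted one. The paper resamples from a modified importance density; purely as a generic illustration, the sketch below uses the common systematic-resampling scheme, triggered when the effective sample size $1/\sum_i (w^{*i})^2$ drops.

```python
import numpy as np

def effective_sample_size(weights):
    # N_eff = 1 / sum(w*^2): close to N_s for uniform weights, 1 under full degeneracy.
    return 1.0 / np.sum(np.asarray(weights) ** 2)

def systematic_resample(particles, weights, rng):
    """Draw N_s equally weighted particles, duplicating high-weight ones."""
    n = len(weights)
    # One uniform offset shared by n evenly spaced positions on the weight CDF.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.asarray(particles)[idx], np.full(n, 1.0 / n)
```

A typical rule resamples whenever the effective sample size falls below some fraction (e.g. half) of $N_s$.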
Finally, once the health state PDF is approximated by assigning an importance weight to each particle, the distribution of the Failure Cycle ($N_f$) can also be updated and refined, conditioned on the health state, as expressed in Eq. (6), thus allowing the estimation of the updated RUL distribution:

$$p(N_f \mid y_{1:k}) \approx \sum_{i=1}^{N_s} w^{*i}_k\, \delta(N_f - N^i_{f,k}) \tag{6}$$
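In practice, Eq. (6) is evaluated by propagating each particle forward until its failure cycle $N_f^i$ and treating the weighted set $\{N_f^i, w^{*i}\}$ as an empirical distribution. A hypothetical sketch (function and variable names are illustrative, not the paper's implementation):

```python
import numpy as np

def rul_quantile(failure_cycles, weights, current_cycle, q):
    """Weighted empirical quantile of the RUL implied by Eq. (6):
    particle i contributes RUL_i = N_f^i - k with probability mass w*_i."""
    rul = np.asarray(failure_cycles, dtype=float) - current_cycle
    order = np.argsort(rul)
    rul = rul[order]
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf = cdf / cdf[-1]
    # Smallest RUL whose cumulative weight reaches q (conservative for small q).
    return rul[np.searchsorted(cdf, q)]
```

A maintenance rule can then compare, for example, a low percentile of the RUL distribution against a predefined safety margin before deciding whether to keep the component in service.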
3. THE DISCRETE TIME STATE SPACE MODEL
The DSS is the model describing the a priori knowledge of the probabilistic damage evolutions (particles). In other words, it represents the possibilities for damage evolution (given the uncertainties in the material characterization as well as the noise inevitably present in the operating environment), from which the algorithm selects the samples that best fit the measurements. The model used in the current framework for damage propagation is based on the NASGRO equation, Eq. (7), although less complicated models such as the Forman law or the Paris equation (Budynas & Nisbett, 2006) have usually been adopted in the literature for crack propagation prognosis (Cadini et al., 2009). The NASGRO law describes not only stable crack propagation, but also damage initiation and unstable crack evolution. It also takes into account the load ratio ($R$) of the applied spectrum, defined as the ratio between the valley and peak values of the load cycle, as well as the crack closure effect induced by plasticity near the crack tips:

$$\frac{da}{dN} = C \left[ \left( \frac{1-f}{1-R} \right) \Delta K \right]^{m} \frac{\left( 1 - \dfrac{\Delta K_{th}}{\Delta K} \right)^{p}}{\left( 1 - \dfrac{K_{max}}{K_c} \right)^{q}} \tag{7}$$
In Eq. (7), $a$ is the crack dimension and $da/dN$ represents the crack growth rate per cycle ($N$). $\Delta K$ is the variation of the Stress Intensity Factor (SIF) within one load cycle, calculated as the difference between the SIFs evaluated at the maximum and minimum load. Moreover, $\Delta K_{th}$ is the threshold variation of the SIF (the crack should not propagate below $\Delta K_{th}$), $K_c$ is the critical value of the SIF (fracture toughness) and $f$ is the crack opening function. Finally, $C$, $m$, $p$ and $q$ are parameters defined for the material characterization. The interested reader may refer to the NASGRO reference manual (2005) for a deeper insight into the parameter definitions.
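For reference, Eq. (7) can be evaluated directly once the constants are known. The sketch below is a literal transcription of Eq. (7); the numerical values used to exercise it are arbitrary placeholders, not the paper's aluminum calibration.

```python
def nasgro_rate(dK, R, C, m, p, q, dK_th, K_c, f):
    """Crack growth rate da/dN from the NASGRO relation, Eq. (7).
    dK: SIF range over the cycle; f: crack opening function value;
    C, m, p, q: material parameters; dK_th, K_c: threshold and toughness."""
    K_max = dK / (1.0 - R)  # peak SIF for a constant-amplitude cycle with ratio R
    base = C * (((1.0 - f) / (1.0 - R)) * dK) ** m
    return base * (1.0 - dK_th / dK) ** p / (1.0 - K_max / K_c) ** q
```

Note that the rate diverges as $K_{max}$ approaches $K_c$ (unstable growth) and vanishes as $\Delta K$ approaches $\Delta K_{th}$, which is how the single equation covers initiation, stable growth and final failure.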
Equation (7) allows calculating the crack growth rate as a function of the applied load cycle, once the needed constants are defined. Some comments are in order concerning the work presented hereafter. First of all, to develop a methodology as general as possible, the SIFs have not been calculated with simple analytical formulas (usually valid for simple skins). A large database of FEM-simulated damages has been generated, collecting the SIF parameters for each case. An Artificial Neural Network has been trained to fit the function that relates the crack position and dimension to the SIF at the crack tips. The method allows evaluating crack propagation also for complex geometries, obviously provided a validated FEM is available (the subject of the current monitoring is an aluminum skin, stiffened by riveted stringers, with the crack propagating on the skin). Moreover, Eq. (7) has been described stochastically by means of some experimental data available in the literature (Giglio & Manes, 2008). In particular, the distributions of the $C$ and $m$ parameters have been derived from a crack propagation test campaign on aluminum structures. While simulating crack propagation with Eq. (7), $C$ and $m$ are randomly sampled at each step of the crack evolution, thus obtaining a model that relates the health state at discrete instant $k-1$ to the condition at $k$, i.e. the Transition Density Function shown in Eq. (8). A Gaussian noise has also been introduced, as described by Cadini et al. (2009):

$$p(x_k \mid x_{k-1}), \quad \forall k > 0 \tag{8}$$
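One draw from such a transition density can be sketched as follows. Everything numeric here is a placeholder: the paper uses an ANN fitted on FEM-computed SIFs and experimentally derived $C$, $m$ distributions with the full NASGRO kernel, whereas this illustration uses an invented SIF function, invented parameter distributions and a simplified Paris-type rate.

```python
import numpy as np

def tdf_step(a_k1, d_cycles, rng,
             C_mu=2e-11, C_sd=4e-12, m_mu=3.0, m_sd=0.1,
             sif=lambda a: 20.0 * np.sqrt(np.pi * a), noise_sd=1e-4):
    """One draw from the TDF p(x_k | x_{k-1}) of Eq. (8): C and m are
    resampled at every step and Gaussian process noise is added.
    All numeric values and the SIF function are illustrative placeholders."""
    C = rng.normal(C_mu, C_sd)
    m = rng.normal(m_mu, m_sd)
    da = C * sif(a_k1) ** m * d_cycles   # growth over d_cycles load cycles
    return a_k1 + max(da, 0.0) + rng.normal(0.0, noise_sd)
```

Repeated application of this step from the initial crack length yields one particle trajectory; the fan of many such trajectories is what Figures 2 and 3 display.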
Thus, the probabilistic a priori information on damage evolution is shown in Figure 2, where the real crack propagation (over the structure presented in Figure 1) is reported together with the random Markov process evolution of the simulated damage. In particular, the initial crack length has been set to 16 mm, corresponding to the artificial notch introduced to hasten crack nucleation and to control the crack position. As one can notice, the randomly simulated crack propagation covers a very wide range of possibilities, including the real case measured during the test. An efficient algorithm (based on probability theory) is thus needed to select the particles that best fit reality, given that some measurements (with noise and uncertainty) have been taken, thus reducing the uncertainty in the RUL estimation. The DSS model presented in Figure 2 is adopted when considering the application of the PF to the off-line PHM system (measurements are manually collected during maintenance stops). On the other hand, Figure 3 shows the stochastic simulation of crack propagation for the on-line case (measurements of crack length are estimated by a sensor network installed over the structure). The simulated crack propagation is initiated after anomaly detection is performed by the automatic diagnostic system (at about 60 mm for the sensor network and damage configuration shown in Figure 1). The first thing to be noticed is the reduced dispersion of the particles in Figure 3 with respect to Figure 2, the model being initiated at a longer crack length. Moreover, the random process of simulated crack propagation appears to be centered on the real damage evolution in Figure 3, where the randomness of the damage evolution from 16 mm to about 60 mm has not been considered.

Figure 2. NASGRO DSS model for the off-line PHM. Comparison of particles with the real crack propagation measured during the experiments. Particles have been generated starting from a 16 mm measurement, corresponding to the length of the artificially initiated crack.
4. THE MEASUREMENT SYSTEM
Two measurement systems have been adopted, in order to analyze the performance of the PF algorithm when off-line and on-line PHM are considered (Figure 4).

The off-line PHM simulates the case when the aircraft is stopped for maintenance and the structure is manually scanned by operators for crack identification. When a damage tolerant structure is considered, the aim is to identify whether it is possible to postpone dismounting and repair until the prognostic system declares a critical condition. In order to statistically characterize the off-line measurement, it has been decided, as a first approximation, to consider the measurement system PDF Gaussian, with mean value equal to the real crack length (measured with a caliper during the real test). A standard deviation ($\sigma_{off}$) has been selected so that the 95% confidence band lies within ±3% of the measurement.
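The stated requirement pins down $\sigma_{off}$: a Gaussian 95% band spans about $\pm 1.96\sigma$, so $\sigma_{off} \approx 0.03\,a / 1.96$ for a true length $a$. A small sketch (function names are illustrative):

```python
import numpy as np

def sigma_off(a_true, band=0.03, z95=1.96):
    """Sigma such that the +/-1.96*sigma (95%) band equals +/-band*a_true."""
    return band * a_true / z95

def offline_likelihood(y, a, band=0.03):
    """Gaussian measurement PDF p(y | a) for the simulated manual scan."""
    s = sigma_off(a, band)
    return np.exp(-0.5 * ((y - a) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
```

This likelihood is what enters the weight update of Eq. (4) when the off-line measurements drive the filter.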
On the other hand, the on-line PHM simulates the case when the structural health condition is automatically inferred by means of a diagnostic unit that processes data coming from a smart sensor network. The concept consists in keeping the aircraft operative until the PHM system declares further operation unsafe, given a predefined safety parameter. The diagnostic unit used in the current framework has been thoroughly described by Sbarufatti et al. (2012). It basically consists of two Artificial Neural Networks (ANN), trained with FEM simulations in order to learn the complex functions that relate the damage parameters (existence, position and length) to the strain field modifications due to damage. The first ANN (the anomaly detection algorithm) receives strain data as input and generates an alarm when the damage index (ranging from 0 to 1) rises above 0.5. The second algorithm (damage quantification), activated in series after anomaly detection, again receives strain data and gives the crack length distribution1 as output (a deeper explanation of the diagnostic unit output is provided by Sbarufatti et al. (2012)).
1 The quantification algorithm is composed of 50 ANNs, trained with randomly selected damage samples (with random position and length). Each one receives the strain pattern from the FBG acquisition system and returns an estimate of the crack length.
Figure 3. NASGRO DSS model for the on-line PHM. Comparison of particles with the real crack propagation measured during the experiments. Particles have been generated starting from a 60 mm measurement, corresponding to the crack length at the moment of anomaly detection by the automatic diagnostic unit.

Figure 4. Comparison between (a) the off-line PHM procedure and (b) the on-line PHM process. The on-line process is based upon the diagnosis performed through an on-board SHM system that detects and characterizes structural faults.
The PF algorithm is thus activated after the anomaly is detected and an estimate of the damage state distribution is provided by the diagnostic algorithm.
A comparison of the on-line vs. off-line measurement systems is provided in Figure 5. It can be noticed that the ±2σ band adopted to simulate the behavior of a generic system for manual surface scanning is far narrower than the uncertainty associated with the real-time automatic diagnostic system. For instance, considering a 70 mm target crack length, the ±2σ band ranges between 63 mm and 86 mm for the on-line diagnosis, while ranging between 67.5 mm and 72.5 mm for the off-line measurement. However, the average value of the quantification distribution correctly estimates the target crack length. The strong degeneracy of the σ band of the on-line measurement for longer cracks is due to the fact that the database of simulated experience used to train the ANN algorithms for diagnosis was limited to cracks up to 100 mm.
5. COMPARISON OF ON-LINE VERSUS OFF-LINE RESULTS
The performance of the PF algorithm when applied to the two maintenance approaches introduced above is now investigated in depth. The main output of the probabilistic PF calculation is the estimate of the health condition of the structure, as reported in Figure 6 for both the off-line and the on-line PHM. In short, the main advantage of the PF technique is that it allows updating the posterior PDF of the damage condition, taking into account the history of all the measurements taken up to the kth discrete time instant, as well as the analytical a priori knowledge given by the underlying model for damage evolution. This becomes particularly attractive when autonomous diagnostic systems are considered. Indeed, they can provide continuous information on damage existence and level; nevertheless, they are characterized by a robustness and precision inferior to classical NDT technologies (simulated here with the off-line measurements). In practice, the PF filters the most suitable states at the kth instant from the database of possible damage evolutions (particles) calculated a priori, independently of any measurement. The particles relative to the off-line and on-line PHM have been shown in Figure 2 and Figure 3 respectively. Once the actual state distribution is updated and refined, the distribution of the RUL can also be updated, becoming conditional on the whole history of the monitored component, and consistent with the analytical and empirical knowledge embedded in the TDF.
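The 95% σ bands plotted in Figure 6 can be extracted directly from the weighted particle set of Eq. (3) as weighted empirical quantiles; a minimal sketch (names illustrative):

```python
import numpy as np

def posterior_band(states, weights, lo=0.025, hi=0.975):
    """Central 95% band of the filtered health-state PDF of Eq. (3),
    computed as weighted empirical quantiles of the particle set."""
    order = np.argsort(states)
    s = np.asarray(states, dtype=float)[order]
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf = cdf / cdf[-1]
    return s[np.searchsorted(cdf, lo)], s[np.searchsorted(cdf, hi)]
```

As the weights concentrate on particles consistent with the measurements, the band returned here tightens around the most probable crack length, which is the narrowing visible in Figure 6 relative to Figure 5.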
The estimation of the state posterior PDF is shown in Figure 6 for the off-line (Figure 6(a)) and on-line (Figure 6(b)) cases. The PF has been applied to a real crack propagation test, with simultaneous manual acquisition of the crack length measurements (processed in Figure 6(a)) and automatic estimation of the crack measurement by means of an on-board smart sensor network based on the strain field (processed in Figure 6(b)). It is immediately clear that, while the manual structure scan allows detecting and measuring shorter cracks (the lower limit is imposed here by the length of the artificial damage for crack initialization, set to 16 mm), the anomaly detection threshold for the sensor network and damage configuration reported in Figure 1 is around 60 mm. On the other hand, the off-line measurements are available at predefined scheduled intervals, while the on-line health assessment is retrieved continuously, every 1000 load cycles, through the diagnostic unit developed by Sbarufatti et al. (2012). However, the on-line measurements are affected by a large uncertainty compared to the off-line case, as described in Figure 5.

Figure 5. Measurement system uncertainties. Comparison of the on-line diagnostic system performance with respect to the off-line manual structural scan methodology. The on-line diagnostic system has been trained with FEM damage simulations, with crack lengths up to 100 mm.

Figure 6. Filtering of the health state distribution. (a) Posterior PDF of the health state for the off-line measurement and (b) posterior PDF of the health state for the on-line structural diagnosis. The real crack propagation is shown, as well as the collected measurements. The posterior 95% σ band is also plotted, to be compared with the a priori σ band reported in Figure 5. The instants when the algorithm required particle resampling are also indicated.
Concerning the off-line PHM system, the health state estimate (Figure 6(a)) appears to characterize the damage evolution precisely, the 95% σ band being mostly centered on the real damage condition. However, it is clear from Figure 2 that the damage evolution that occurred during the test is not centered with respect to the stochastic model used to define the TDF. This made resampling necessary after a few updating iterations, as the available particles were no longer sufficient to describe the posterior PDF of the health state (only a few particles retained a weight significantly different from zero).
Regarding the on-line PHM system, it can be noticed that the posterior PDF of the health condition is far narrower than the output of the diagnostic algorithm shown in Figure 5. For instance, for a 70 mm crack, the 95% σ band of the quantification algorithm (Figure 5) ranges from 63 mm to 86 mm, while after the PF updating process it ranges from 68 mm to 72.5 mm (Figure 6(b)). However, the estimated σ band sometimes does not contain the real state evolution. This is mainly due to the fact that the measurements are affected by a higher error (with respect to the off-line system), which is in part confirmed by the evolution of some stochastic particles. This means that, if many measurements over- or underestimate the real damage condition and their indications are also confirmed by the DSS model, the PF precision will decrease. However, under the reasonable assumption (Figure 5) that the measurement PDF is centered on the target, the PF inference will converge toward the real damage evolution. In other words, the PF tends to interpolate the measurements, while taking into account the a priori knowledge embedded in the DSS model. Although the DSS model used for the a priori description of the damage evolution for the on-line PHM is centered on the real crack propagation (Figure 3), particle resampling was also required, because the updating process focused on a particular set of particles.
Some remarks are required concerning the adopted resampling technique. The DSS model used to initialize the algorithm was kept as general as possible (considering the distribution of material parameters inside the NASGRO law), in order to be representative of many experimental crack propagation tests on the same material (aluminium). The resulting DSS spread is high, which provokes premature particle degeneracy and the need for resampling. Nevertheless, once a sufficient number of iterations has been completed, it is possible to generate new particle samples from a different importance density q(x_k | x_{k-1}, z_{1:k}), now taking the history of measures into account but preventing the adoption of the Bootstrap approximation. In the work reported here, new particles are generated considering a TDF with deterministic material parameters (C and m are now obtained by fitting the measures taken on the specimen under monitoring) and random white noise.

Figure 7. Effect of NASGRO parameter dynamic fitting. A sudden (unpredicted) change in the slope of the crack propagation curve cannot be described before it has happened.

On the one hand, this allows the uncertainty related to prognosis to be reduced. On the other hand, as described in Figure 7 (where the noise has been removed for illustration purposes only), this method is less robust to unexpected changes in the system dynamics. It is clear from Figure 7 that, if C and m are considered deterministic, they cannot account for sudden changes in the curve slope (Figure 7(a)), unless a new resampling is executed, fitting the propagation curve with new measures (Figure 7(b)). The effect is visible in the RUL estimation for the off-line PHM case (Figure 9(a)): the error in the RUL estimated with the PF increases after resampling is executed at 250,000 load cycles, until a new resampling is executed at about 300,000 load cycles, taking into account the unexpected change in the crack evolution slope.
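The particle-regeneration step just described — refitting deterministic C and m to the measures collected up to the resampling instant, then propagating with additive white noise — can be sketched as follows. This is an illustrative sketch only: a simplified Paris-type growth law (da/dN = C·ΔK^m, with ΔK = Δσ·√(πa)) stands in for the full NASGRO equation, and the function names, stress range and noise level are our assumptions, not the authors' implementation.

```python
import numpy as np

def fit_paris_parameters(cycles, crack_lengths, stress_range=100.0):
    """Fit deterministic C and m of a simplified Paris-type law to the
    crack-length history available up to the resampling instant, via
    log-log least squares on finite-difference growth rates."""
    a = np.asarray(crack_lengths, dtype=float)
    N = np.asarray(cycles, dtype=float)
    dadN = np.diff(a) / np.diff(N)          # growth rate between measures
    a_mid = 0.5 * (a[1:] + a[:-1])          # midpoint crack length
    dK = stress_range * np.sqrt(np.pi * a_mid)  # stress intensity range
    # log(da/dN) = log(C) + m * log(dK): linear fit in log-log space
    m, logC = np.polyfit(np.log(dK), np.log(dadN), 1)
    return np.exp(logC), m

def regenerate_particles(a_now, n_particles, C, m, dN, noise_std,
                         stress_range=100.0):
    """Draw new particles from the refitted transition density:
    deterministic growth plus random white noise."""
    a = np.full(n_particles, float(a_now))
    dK = stress_range * np.sqrt(np.pi * a)
    return a + C * dK**m * dN + np.random.normal(0.0, noise_std, n_particles)
```

Once the fitted parameters become deterministic, the particle cloud spread is governed only by the white-noise term, which is why this scheme reduces prognosis uncertainty but, as noted above, cannot anticipate a sudden slope change.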
Once the PDF of the health state has been filtered by the PF algorithm, the RUL of the monitored component can also be updated according to Eq. (6). In order to appreciate the advantages and drawbacks of the PF algorithm, it has been compared with a second technique, which evaluates the RUL PDF by performing a stochastic crack propagation based on the NASGRO law. In short, given the PDFs of the material-related constants, 3000 crack propagations (particles) have been simulated, sampling the material constants from the available distributions at each step. Once the target crack length is reached (120 mm was selected as the limit crack length, due to the limits of the FEM database), the RUL can be stochastically defined with a PDF. The same procedure is repeated each time a new estimation of the crack length is provided, either by the on-line or by the off-line diagnostic system. Note that this method depends only on the last measure provided by the diagnostic system and does not take into account the trend of historical measures (which is, on the contrary, the advantage of the PF): each inference is thus completely uncorrelated with the previous ones. Moreover, it requires simulating many crack propagations every time a new RUL PDF is needed. The stochastic NASGRO (SN) and Particle Filter RUL evaluations are reported in Figure 8 and Figure 9, respectively, again for the on-line and off-line PHM.

Figure 8. 95% σ-band for RUL estimation with the stochastic NASGRO law. Comparison of off-line PHM (a) versus on-line PHM (b).

Figure 9. 95% σ-band for RUL estimation with the Particle Filtering algorithm. Comparison of off-line PHM (a) versus on-line PHM (b).

The estimated RUL (intended hereafter as the remaining number of cycles before reaching the 120 mm crack length) is reported over the component life, as a function of load cycles. The real RUL is shown together with its estimation calculated with the SN law (Figure 8) and the PF (Figure 9); in particular, the expected value of the RUL PDF is reported, as well as the 95% σ-band. The first thing to be pointed out is that SN depends only on the knowledge of the material properties (and applied load); for this reason, if a discrepancy between the DSS and reality is present at the beginning, there is no updating process based on the collected measures, and the same error is maintained throughout the life, as is clearly appreciable in Figure 8(b).
Moreover, the SN prognosis is very sensitive to the quality of the measure; this is an issue especially when the on-line PHM is considered, where the inevitable fluctuations in the inference on the structural condition (due to the high level of uncertainty) are reflected in an unstable prognosis (Figure 8(b)). On the other hand, the PF technique is able to filter these uncertainties (Figure 9(b)), thus estimating a RUL that depends on the entire trend of measures collected since the anomaly was identified. The variance of the RUL PDF evaluated with the two prognosis methods appears to be of the same order, unless resampling is performed in the PF algorithm. As explained above, the information retrieved from the collected measures allows the uncertainty in the prognosis to be decreased significantly (as at least the uncertainty related to the material properties can be greatly reduced). This is well reflected in Figure 9(b), where an important reduction in the variance of the PF estimation of the RUL is obtained. After 275,000 load cycles, only a few particles remained with a non-negligible weight, thus provoking degeneracy of the algorithm. New particles have therefore been generated, though without considering the material uncertainty inside the DSS (the C and m parameters inside the NASGRO equation are deterministic, obtained through a non-linear fitting of the historical data available up to the resampling instant). Nevertheless, the resampling technique has to be improved in order to avoid focusing on a too narrow region inside the DSS: this is in fact the reason for the deviation of the estimated RUL PDF from the real RUL in Figure 9(a), as described in Figure 7.
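The stochastic NASGRO comparison procedure described above — propagating a population of cracks from the last diagnosed length, re-sampling the material constants at each step, until the 120 mm limit length — can be sketched as below. This is a sketch under stated assumptions: a simplified Paris-type law replaces the full NASGRO equation, the distributions are taken as Gaussian, and all parameter values are illustrative, not those of the paper.

```python
import numpy as np

def stochastic_rul(a0, C_mean, C_std, m_mean, m_std, dN=1000.0,
                   a_limit=120.0, n_particles=3000, stress_range=100.0,
                   max_steps=100000, rng=None):
    """Monte Carlo RUL estimate: propagate n_particles cracks from the
    last diagnosed length a0, sampling the material constants at each
    step, until the limit crack length; the sample of cycles-to-limit
    approximates the RUL PDF."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.full(n_particles, float(a0))
    rul = np.full(n_particles, np.nan)        # NaN = not yet at limit
    for step in range(1, max_steps + 1):
        alive = np.isnan(rul)
        if not alive.any():
            break
        # re-sample material constants for the still-propagating cracks
        C = rng.normal(C_mean, C_std, alive.sum())
        m = rng.normal(m_mean, m_std, alive.sum())
        dK = stress_range * np.sqrt(np.pi * a[alive])
        a[alive] += np.clip(C, 0.0, None) * dK**m * dN
        crossed = alive.copy()
        crossed[alive] = a[alive] >= a_limit  # reached the limit length
        rul[crossed] = step * dN
    return rul  # mean and percentiles of this sample give the sigma-band
```

Because each call starts afresh from the latest crack-length estimate, successive inferences are uncorrelated, which mirrors the limitation of the SN method discussed in the text.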
Finally, two comments arise when comparing the off-line and on-line PHM. Firstly, the 95% σ-band of the RUL based on the off-line measure is narrower, owing to the more precise measuring system. Nevertheless, the availability of a real-time diagnostic tool increases the availability of data on the health state, thus reducing the time needed by the PF algorithm to converge to the correct estimation.
6. CONCLUSIONS
A Particle Filtering (PF) Bayesian updating technique has been used in this framework for the dynamic estimation of the component Residual Useful Life. Two applications have been compared. The first consists in applying particle filters to a Condition Based Maintenance scheme where structural health monitoring (SHM) is performed off-line by maintenance operators. The second consists in an automatic SHM performed on-board by a diagnostic unit, trained with Finite Element damage simulations to recognize crack existence and length based on strain field measures. The methodology has been tested in the laboratory on a specimen representative of a typical aeronautical structure, constituted by a skin stiffened with riveted stringers. Although the uncertainty related to the on-line structural diagnosis is far larger than that associated with the off-line measure, the PF algorithm proved to correctly describe the posterior RUL distribution (conditional on the measures) in both cases. The additional uncertainty in the on-line measures turned out to be compensated by the availability of a continuous measure, thus allowing the algorithm to reach convergence in a shorter time. The PF algorithm has also been compared with a simpler technique based on stochastic NASGRO (SN) law propagation. The advantage of the PF with respect to SN is that it takes into account the whole history of measures taken on the monitored component, as well as the prior knowledge coming from the propagation model. This results in a more robust and precise estimation of the health state as well as of the RUL PDF. Finally, the adoption of a robust filtering methodology that merges the information coming from a wide sensor network with the numerical or analytical knowledge about the monitored phenomenon appears to be a suitable technique for increasing the performance of automatic SHM systems, thus leading toward real on-board PHM.
NOMENCLATURE
ANN Artificial Neural Network
DMS Diagnostic Monitoring System
DSS Discrete State-Space
FBG Fiber Bragg Grating
FEM Finite Element Model
IDF Importance Density Function
MCS Monte-Carlo Sampling
NDT Non-Destructive Testing
PDF Probability Density Function
PF Particle Filter
PHM Prognostics and Health Management
RUL Residual Useful Life
SHM Structural Health Monitoring
SIF Stress Intensity Factor
SIR Sequential Importance Resampling
SIS Sequential Importance Sampling
SN Stochastic NASGRO
TDF Transition Density Function
REFERENCES
Arulampalam, S., Maskell, S., Gordon, N. & Clapp, T.
(2002), A tutorial on particle filters for online
nonlinear/non-Gaussian Bayesian tracking, IEEE
Transactions on Signal Processing, 50(2): 174-188.
Boller, C. (2001), Ways and options for aircraft structural
health management, Smart Materials & Structures, 10:
432-440.
Budynas, R. & Nisbett, J. (2006), Shigley's Mechanical
Engineering Design, 8th edition, McGraw-Hill.
Cadini, F., Zio, E. & Avram, D. (2009), Monte Carlo-based
filtering for fatigue crack growth estimation,
Probabilistic Engineering Mechanics, 24: 367-373.
Giglio, M. & Manes, A. (2008), Crack propagation on
helicopter panel: experimental test and analysis,
Engineering fracture mechanics, 75:866-879.
Haug A.J. (2005), A tutorial on Bayesian Estimation and
Tracking Techniques Applicable to Nonlinear and Non-
Gaussian Processes, MITRE technical report, Virginia.
JSSG-2006, Joint Service Specification Guide – Aircraft
Structures, U.S. Department of Defense.
Lazzeri, L. & Mariani, U. (2009), Application of Damage
Tolerance principles to the design of helicopters,
International Journal of Fatigue, 31(6): 1039-1045.
NASGRO reference manual (2005), Fracture Mechanics and
Fatigue Crack Growth Analysis Software, version 4.2.
Sbarufatti, C., Manes, A. and Giglio, M. (2010), Probability
of detection and false alarms for metallic aerospace
panel health monitoring, Proc. 7th Int. Conf. on CM &
MFPT, Stratford Upon Avon, England.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), HECTOR:
one more step toward the embedded Structural Health
Monitoring system, Proc. CEAS 2011, Venice, Italy.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), Advanced
Stochastic FEM-Based Artificial Neural Network for
Crack Damage Detection, Proc. Coupled 2011, Kos,
Greece.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), Sensor
network optimization for damage detection on
aluminum stiffened helicopter panels, Proc. Coupled
2011, Kos, Greece.
Sbarufatti, C., Manes, A. & Giglio, M. (2012), Diagnostic
System for Damage Monitoring of Helicopter Fuselage,
Proc. EWSHM 2012, Dresden, Germany.
Schmidt, H.J. & Schmidt-Brandecker, B. (2009), Design
Benefits in Aeronautics Resulting from SHM,
Encyclopedia of Structural Health Monitoring.
BIOGRAPHIES
Claudio Sbarufatti was born in Milan, Italy, on May 15, 1984. He received his Master of Science degree in Mechanical Engineering in 2009 at Politecnico di Milano, Italy, with a Master's thesis on rotor dynamics and vibration control developed at Rolls-Royce plc (Derby, UK). He currently works in the Department of Mechanical Engineering of Politecnico di Milano, where he is going to conclude his Ph.D. in 2012. The title of his Ph.D. thesis is "Fatigue crack propagation on helicopter fuselages and life evaluation through sensor network". His research fields are the development of structural health monitoring systems for diagnosis and prognosis, Finite Element modeling, design and analysis of helicopter components subject to fatigue damage propagation, artificial intelligence applied to structural diagnosis, Bayesian statistics, Monte Carlo methods, and sensor network design.
Matteo Corbetta was born in Cantù, Italy, on April 11, 1986. He received the Bachelor of Science degree in Mechanical Engineering from Politecnico di Milano in 2009 and is going to receive the Master of Science in Mechanical Engineering in 2012 at Politecnico di Milano. He currently works in the Department of Mechanical Engineering of Politecnico di Milano in the field of Structural Health Monitoring. His current research interests are fracture mechanics and probabilistic approaches for prognostic algorithms.
Andrea Manes was born in La Spezia, Italy, on August 11, 1976. He holds a Ph.D. and is an Assistant Professor of Mechanical Design and Strength of Materials in the Department of Mechanical Engineering at Politecnico di Milano, Italy. His research mainly focuses on the structural reliability of aerospace components, using a complete research strategy based on experimental tests, numerical models and material characterization. Within this framework several topics have been investigated: novel methods for SHM application, methods of fatigue strength assessment in mechanical components subject to multiaxial states of stress, design and analysis of helicopter components with defects, ballistic damage and evaluation of the residual strength, and assessment of sandwich structures subjected to low-velocity impacts. He is the author of over 70 scientific papers in international journals and conferences and is a member of several scientific associations (AIAS, the Italian Association for Stress Analysis; IGF, the Italian Group on Fracture; CSMT, the Italian safety commission for mountaineering).
Marco Giglio was born in Milan, Italy, on November 1, 1961. He is an Associate Professor of Mechanical Design and Strength of Materials in the Department of Mechanical Engineering at Politecnico di Milano, Italy. His research fields are novel methods for SHM application, methods of fatigue strength assessment in mechanical components subject to multiaxial states of stress, design and analysis of helicopter components with defects, and ballistic damage and evaluation of the residual strength. He is the author of over 100 scientific papers in international journals and conferences and is a member of several scientific associations (AIAS, the Italian Association for Stress Analysis; IGF, the Italian Group on Fracture).
Health Assessment and Prognostics of Automotive Clutches
Agusmian Partogi Ompusunggu1,2, Steve Vandenplas1, Paul Sas2, and Hendrik Van Brussel2
1 Flanders’ MECHATRONICS Technology Centre (FMTC), 3001 Heverlee, [email protected]
2 Katholieke Universiteit Leuven, Department of Mechanical Engineering, Division PMA, 3001 Heverlee, [email protected]
ABSTRACT
Despite being critical components, wet friction clutches have received very little attention in the monitoring and prognostics research field. This paper presents and discusses an overall methodology for assessing the health (performance) and predicting the remaining useful life (RUL) of wet friction clutches. Three principal features extracted from the relative velocity signal measured between the input and output shaft of the clutch, namely (i) the normalized engagement duration, (ii) the normalized Euclidean distance and (iii) the Spectral Angle Mapper (SAM) distance, are fused with a logistic regression technique into a single value called the health index. In logistic regression analysis, the output of the logistic model (i.e. the health index) is restricted between 0 and 1. Accordingly, the logistic model can guide users in assessing the state of a wet friction clutch as either healthy (health index value (close to) 1) or failed (health index value (close to) 0). In terms of prognostics, the logarithm of the odds-of-success g, defined as g = log[h/(1−h)], where h denotes the health index, is used as the predicted variable. Once the history data are sufficient for prediction, the weighted mean slope (WMS) method is implemented in this study to adaptively build a prognostics model and to predict the trajectory of g until it crosses a predetermined threshold. In this way, the remaining useful life (RUL) of a clutch can be determined. Furthermore, an experimental verification of the proposed methodology has been performed on two history datasets obtained by performing accelerated life tests (ALTs) on two clutch packs with different friction materials but the same lubricant. The experimental results confirm that the proposed methodology is promising and has the potential to be implemented in real-life applications. As expected, the estimated RUL converges to the actual RUL and the uncertainty
Agusmian Partogi Ompusunggu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
interval decreases over time, which may indicate that the prognostics model improves as more evidence becomes available.
1. INTRODUCTION
Wet friction clutches are mechanical components enabling power transmission from an input shaft (connected to the engine) to an output shaft (connected to the wheels), based on the friction occurring between lubricated contacting surfaces. The clutch is lubricated by an automatic transmission fluid (ATF), which acts as a cooling lubricant that cleans the contacting surfaces and gives smoother performance and longer life. However, the presence of the ATF in the clutch reduces the coefficient of friction (COF). In applications where high power is necessary, the clutch is therefore designed with multiple friction and separator discs. This configuration is known as a multi-disc wet friction clutch, as shown in Figure 1, in which the friction discs are typically mounted to the hub by splines, and the separator discs are mounted to the drum by lugs.
Figure 1. Exploded view of a wet friction clutch.
Today's vehicles have become widely equipped with automatic transmission (AT) systems, where wet friction clutches
are among the critical components that play a major role in the transmission performance. At the beginning of its life, a clutch is designed to transmit a certain power under a smooth and fast engagement with minimal shudder. However, due to unavoidable degradation, the clutch frictional characteristics change, thus altering its initial performance and consequently affecting the performance of the vehicle. As the degradation proceeds, failure can occur unexpectedly, eventually leading to the total breakdown of the vehicle. An unexpected breakdown can put human safety at risk, possibly cause long vehicle down times, and result in high maintenance costs. Hence, integrating a maintenance strategy into AT systems can significantly increase safety and availability/reliability, as well as reduce the maintenance cost of the vehicles.
The maintenance strategy should be performed in an optimal way, in the sense that degrading clutches need to be replaced with new ones at the right time. Here, the right time can be taken as the "optimal" end of life of the clutch, at which the clutch no longer functions as it should. Notice that the end of the clutch lifetime does not necessarily mean the condition at which catastrophic failure occurs. For an optimal maintenance strategy, information concerning the end of the clutch lifetime (or the remaining useful clutch life) therefore becomes an important aspect in minimizing vehicle downtime. Condition Based Maintenance (CBM), also known as Predictive Maintenance (PdM), is a right-on-time maintenance strategy driven by the actual condition of the critical components of systems. This concept requires technologies and experts, in which all relevant information, such as performance data, maintenance histories, operator logs and design data, is combined to make optimal maintenance decisions (Mobley, 2002). In general, the key technologies for realizing the PdM strategy rely on three basic elements, namely (i) condition monitoring, (ii) diagnostics and (iii) prognostics. PdM has been in use since the 1980s and has been successfully implemented in various applications, such as oil platforms, manufacturing machines, wind turbines, automobiles and electronic systems (Basseville et al., 1993; Bansal, Evans, & Jones, 2004; Garcia, Sanz-Bobi, & Pico, 2006; Srinivas, Murthy, & Yang, 2007; Bey-Temsamani, Engels, Motten, Vandenplas, & Ompusunggu, 2009b, 2009a).
Despite their criticality, to the authors' knowledge, very little attention has been paid to wet friction clutches in the area of PdM research. Several methods have been proposed in the literature for assessing the condition of wet friction clutches based on the quality of the friction material, namely (i) Scanning Electron Microscope (SEM) micrographs, (ii) surface topography, (iii) Pressure Differential Scanning Calorimetry (PDSC) and (iv) Attenuated Total Reflectance Infrared spectroscopy (ATR-IR) (Jullien, Meurisse, & Berthier, 1996; Guan, Willermet, Carter, & Melotik, 1998; Li et al., 2003; Maeda & Murakami, 2003; Nyman, Maki, Olsson, & Ganemi, 2006). Generally, the implementation of these existing methods is very time-consuming and possibly not pragmatic for real-life applications, owing to the fact that the friction discs have to be taken out of the clutch pack and then prepared for assessing the degradation level. In other words, an online monitoring and prognostics system cannot be realized with these existing methods.
As the central role of wet friction clutches relies on friction, a natural way to monitor and assess the condition of these components is by monitoring and quantifying the frictional characteristics. The use of the mean (averaged) coefficient of friction (COF) over a given duty cycle as a principal feature for condition monitoring of wet friction clutches has been popular for many years (Matsuo & Saeki, 1997; Ost, Baets, & Degrieck, 2001; Maeda & Murakami, 2003; Li et al., 2003; Fei, Li, Qi, Fu, & Li, 2008). However, this is normally performed in laboratory tests, namely durability tests of clutch friction materials and ATF, where the test setup used (i.e. the SAE#2 test setup) is fully instrumented. For real-life applications, the use of the mean COF for clutch monitoring is possibly expensive and not easily implementable, due to the fact that at least two sensors are required to extract it, namely a torque and a force sensor, which are commonly difficult to install (typically not available) in today's transmissions.
Regarding clutch health assessment and prognostics, only a few publications were found in the literature. Yang et al. (Yang, Twaddell, Chen, & Lam, 1997; Yang & Lam, 1998) developed a physics-based prognostics model by considering that the degradation occurring in a clutch is only due to the thermal effect taking place in the friction materials. The model was developed based on the cellulose fiber concentration, where the change of the fiber concentration is assumed to be likened to a simple chemical reaction. They found that the ratio between the instantaneous concentration of cellulose fibers W and the initial concentration of cellulose fibers W0, i.e. the weight loss ratio W/W0, likely follows a zeroth-order reaction in isothermal conditions. The degradation rate constants are obtained by performing dedicated (separate) Thermal Gravimetric Analysis (TGA) experiments on friction material samples taken out of clutch packs at different interface temperatures. To predict the degradation level and RUL of the friction material under dynamic engagement, the temperature history of the friction material as a function of time and axial location is of importance in this approach. Since the interface temperature of the friction material during a clutch engagement is difficult to measure, they (Yang, Lam, Chen, & Yabe, 1995; Yang & Lam, 1998) developed a comprehensive and detailed mathematical model to predict the temperature at the interface, as well as the temperature distribution as a function of time and location. Hence, the accuracy of the existing clutch prognostics method is strongly determined by the accuracy of the temperature prediction. Since the degradation mechanism occurring in the clutch friction material is due not only to the thermal effect but also to another major mechanism, namely adhesive wear (Gao, Barber, & Chu, 2002; Gao & Barber, 2002), the assumption made in this prognostics method is oversimplified. When the complete design data of a wet friction clutch are not available, this prognostics method would possibly be difficult to implement.
Considering the above literature survey, one may conclude that the available clutch monitoring and prognostics methods are not pragmatic and flexible enough to implement in real-life applications. Nowadays, there is still a need for automotive industries (e.g. our industrial partner) to realize a clutch monitoring and prognostics system which is easy to implement and flexible to adapt. In addition, the development of such a system must be based on sensors typically available in AT systems, such as rotational velocity, pressure and temperature sensors. Hence, research in this direction is still of great interest.
Recently, some potential monitoring methods which can serve as bases for clutch prognostics have been investigated and reported in previous publications. Preliminary evaluations of a clutch monitoring method based on the post-lockup torsional vibration are discussed in (Ompusunggu, Papy, Vandenplas, Sas, & VanBrussel, 2009; Ompusunggu, Sas, VanBrussel, Al-Bender, Papy, & Vandenplas, 2010). Since it is reasonable to assume that the ATF has no significant effect on the clutch post-lockup torsional vibration, this method is suitable for monitoring only the clutch friction material degradation. A more complete description of the feasibility and practical implementation of this method will be discussed in another communication. Furthermore, a clutch monitoring method based on the pre-lockup torsional vibration is evaluated in (Ompusunggu, Sas, VanBrussel, Al-Bender, & Vandenplas, 2010), where a high-resolution rotational velocity sensor is required in order to capture the high-frequency torsional vibration. Another clutch monitoring method, based on tracking the change of the relative rotational velocity between the input and output shaft of a clutch, is proposed in (Ompusunggu, Papy, Vandenplas, Sas, & VanBrussel, 2012). Since the relative velocity can be seen as representative of the clutch dynamic behavior during the engagement phase, which is strongly influenced by the combination of friction material and ATF, the latter method can be used to monitor the global state of a clutch. Nevertheless, the prognostics aspect was not yet tackled in those publications. An attempt to develop a systematic methodology for health assessment and prognostics of wet friction clutches is the main objective of this paper. For this purpose, the condition monitoring method described in (Ompusunggu et al., 2012) is extended in this paper towards a prognostics methodology.
The remainder of this paper is organized as follows. After introducing the objective and motivation, the methodology proposed in this paper is presented and discussed in Section 2. To verify the proposed method, life data of some commercially available clutches, obtained from accelerated life tests (ALTs) carried out on a fully instrumented SAE#2 test setup, are employed. The details of the experimental aspects are described in Section 3. The results obtained after applying the proposed method to the clutches' life data are presented and discussed in Section 4. Finally, some conclusions drawn from the study, which can be a basis for future work, are presented in Section 5.
2. METHODOLOGY
The overall methodology proposed in this paper is described in the flowchart depicted in Figure 2. As can be seen in the figure, the methodology consists of four steps. In the first step, capturing the signal of interest from the raw pressure and relative velocity signals is discussed. In the second step, three primary features are computed once the signal of interest has been captured; the verification of the three features has been addressed in another publication (Ompusunggu et al., 2012). In the third step, the features are fused into a single value, namely the health index, using a logistic regression technique. The output of the logistic model is restricted between 0 and 1, such that the health or performance of a wet friction clutch can be easily assessed. Finally, the algorithm to predict the remaining useful life (RUL), using the fused features as the predicted variable, is presented and discussed. Since knowledge of the evolution of the proposed features during the clutches' lifetime is still limited, a data-driven prognostics approach is investigated in this paper.
Preprocessing Input Signals: capturing the relative velocity signal of interest.
Feature Extraction: computing the principal features, namely the normalized engagement duration τe, the normalized Euclidean distance DE and the SAM distance DSAM.
Performance Assessment: fusing the features (τe, DE and DSAM) into a Health Index (HI) constrained between 0 and 1.
Prognostics: predicting the remaining useful life (RUL).
Figure 2. Flow chart of the proposed methodology.
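The performance-assessment and prognostics steps of the flowchart can be illustrated with a minimal sketch of the logistic fusion and of the predicted variable g = log[h/(1−h)]. The weights and bias below are illustrative placeholders, not the coefficients fitted in the paper; in practice they would come from a logistic-regression fit on run-to-failure data.

```python
import numpy as np

def health_index(features, weights, bias):
    """Fuse the three features (tau_e, D_E, D_SAM) into a health index
    restricted to (0, 1) with a logistic model."""
    z = bias + np.dot(weights, features)
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(h):
    """Predicted variable for prognostics: g = log[h / (1 - h)]."""
    return np.log(h / (1.0 - h))
```

Because g is the inverse of the logistic transform, it is unbounded and its trajectory can be extrapolated until it crosses a predetermined threshold, which is what makes it a convenient predicted variable.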
2.1. Relevant Signals Measurement
Prior to computing the principal features, as discussed in the next subsection, the raw signals obtained from measurements first need to be preprocessed. Figure 3 graphically illustrates the signal preprocessing step, namely the procedure to capture the relative velocity signal of interest based on two raw measurement signals: (i) the relative rotational velocity and (ii) the applied pressure. In the following paragraphs, the procedure is briefly discussed.
Let the signal of interest be captured at a given (arbitrary) duty cycle with a predetermined time record length $\tau$, and suppose that the time record length is kept the same for all duty cycles. For the sake of consistency, the signal must be captured at the same reference time instant. It is reasonable to consider the time instant $t_f$ at which the ATF pressure $p(t)$ applied to the clutch pack starts to increase from zero as the reference time, which can be mathematically formulated as:
$t_f = \min\{t \in \mathbb{R} : p(t) > 0\}$. (1)
While the applied pressure is increasing, contact is gradually established between the separator and friction discs. As a result, the transmitted torque increases, which consequently reduces the relative velocity $n_{rel}(t)$. Eventually, the clutch is fully engaged when the relative velocity reaches zero for the first time at the lockup time instant $t_l$, which can be formulated in a similar way to Equation (1) as:
$t_l = \min\{t \in \mathbb{R} : n_{rel}(t) = 0\}$. (2)
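The two trigger instants above reduce to simple first-crossing searches on the sampled signals. The sketch below (function and variable names are ours, not from the paper) captures the signal of interest from uniformly sampled pressure and relative velocity arrays; in practice the zero comparisons may need small tolerances to cope with sensor noise:

```python
import numpy as np

def capture_signal_of_interest(t, p, n_rel, record_length):
    """Capture the relative-velocity signal of interest, Eqs. (1)-(2).

    t             : uniformly sampled time vector [s]
    p             : applied ATF pressure signal
    n_rel         : relative rotational velocity signal
    record_length : fixed time record length tau [s]
    """
    dt = t[1] - t[0]
    # Eq. (1): reference time t_f, first instant at which p(t) exceeds zero
    i_f = int(np.argmax(p > 0))
    t_f = t[i_f]
    # Eq. (2): lockup time t_l, first instant after t_f at which n_rel reaches zero
    i_l = i_f + int(np.argmax(n_rel[i_f:] <= 0))
    t_l = t[i_l]
    # fixed-length window of record_length seconds starting at the reference time
    n_samples = int(round(record_length / dt))
    return t_f, t_l, n_rel[i_f:i_f + n_samples]
```

With this window anchored at $t_f$, every duty cycle yields a vector of equal length, which is what the dissimilarity measures of the next subsection require.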
Figure 3. A graphical illustration of how to capture the relative velocity signal of interest. The upper and lower plots respectively show the typical applied pressure $p$ and the raw relative velocity signal $n_{rel}$, with the time instants $t_f$ and $t_l$ and the durations $\tau$ and $\tau_e$ marked. Note that a.u. is the abbreviation of arbitrary unit.
2.2. Feature Extraction
Formal definitions of the developed features (engagement duration, Euclidean distance and Spectral Angle Mapper distance) and the mathematical expressions to compute them are discussed in this subsection. The first two features are dimensional quantities, while the third one is dimensionless. The first two features are normalized such that they become dimensionless quantities of the same order of magnitude as the third feature.
2.2.1. Engagement Duration
The engagement duration $\tau_e$ is defined as the time interval between the lockup time instant $t_l$ and the reference time instant $t_f$, as graphically illustrated in Figure 3. Once both time instants $t_f$ and $t_l$ have been determined, the engagement duration $\tau_e$ can be simply computed as follows:
$\tau_e = t_l - t_f$. (3)
Without loss of generality, $\tau_e$ can be normalized with respect to the engagement duration measured at the initial condition (healthy state) $\tau_e^r$, according to the following equation:

$\bar{\tau}_e = (\tau_e - \tau_e^r)/\tau_e^r$, (4)

where $\bar{\tau}_e$ denotes the dimensionless engagement duration.
2.2.2. Dissimilarity Measures
A dissimilarity measure is a metric that quantifies the dissimilarity between objects. For the sake of condition monitoring, the dissimilarity measure between an object representing an arbitrary condition and the reference object representing a healthy condition can be treated as a feature. Thus, the dissimilarity measure between two identical objects is (close to) zero; the dissimilarity measure between two non-identical objects, on the other hand, is not zero. Here, the object is the relative velocity signal $n_{rel}$. Two dissimilarity measures, namely the Euclidean distance and the Spectral Angle Mapper (SAM) distance, are considered in this paper because of their computational simplicity (Kruse et al., 1993; Paclik & Duin, 2003).
The main motivation behind the dissimilarity approach is that the measured signals of interest are treated as vectors. Let $X$ be a $K$-dimensional vector, $x_i$, $i = 1, 2, \ldots, K$, denoting the discrete signal of the relative velocity measured in a normal (healthy) condition, and $Y$ be a $K$-dimensional vector, $y_i$, $i = 1, 2, \ldots, K$, denoting the discrete signal of the relative velocity measured in an arbitrary condition. The vector $X$ representing a healthy condition is referred to as the "baseline".
The dimensional Euclidean distance $D_E$ between the vectors $X$ and $Y$ is defined as:

$D_E(X, Y) = \sqrt{\sum_{i=1}^{K} (x_i - y_i)^2}$. (5)
For convenience, $D_E$ can also be normalized in accordance with the following equation:

$\bar{D}_E(X, Y) = \dfrac{D_E(X, Y)}{x_1 \sqrt{K}}$, (6)

where $\bar{D}_E$ denotes the dimensionless Euclidean distance and $x_1 > 0$ denotes the initial value of the baseline.
By definition, the SAM distance is a measure of the angle between two vectors and is therefore dimensionless. The SAM distance $D_{SAM}$ between the vectors $X$ and $Y$ is mathematically expressed as:

$D_{SAM}(X, Y) = \cos^{-1}\left(\dfrac{\sum_{i=1}^{K} x_i y_i}{\sqrt{\sum_{i=1}^{K} x_i^2}\,\sqrt{\sum_{i=1}^{K} y_i^2}}\right)$. (7)
Recall that the distance from an object to itself is zero and that a distance is always non-negative.
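The three features follow directly from these definitions. The sketch below (a hypothetical helper of our own naming, assuming NumPy arrays of equal length with a nonzero baseline start value) implements Equations (4), (6) and (7):

```python
import numpy as np

def clutch_features(tau_e, tau_e_ref, x, y):
    """Compute the three dimensionless features of Eqs. (4), (6) and (7).

    tau_e     : engagement duration of the current duty cycle
    tau_e_ref : engagement duration at the healthy (reference) state
    x         : baseline relative-velocity vector (healthy condition)
    y         : relative-velocity vector of the current condition
    """
    K = len(x)
    tau_bar = (tau_e - tau_e_ref) / tau_e_ref            # Eq. (4)
    d_e = np.linalg.norm(x - y) / (x[0] * np.sqrt(K))    # Eqs. (5)-(6)
    # Eq. (7): angle between the two vectors; clip guards against rounding
    cos_angle = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    d_sam = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return tau_bar, d_e, d_sam
```

Note a design consequence of Equation (7): the SAM distance is insensitive to a uniform scaling of the signal, whereas the normalized Euclidean distance is not, so the two measures capture complementary aspects of the signal deviation.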
2.3. Health Assessment
Health assessment constitutes a dichotomous problem, namely determining whether a unit (system) of interest (UOI) is in a healthy or a failure state. Intuitively, health can be represented by a binary value, e.g. 0 or 1, where this categorical value may be seen as a health index. For health assessment purposes, it is natural to assume that a health index of (close to) 1 represents a healthy state, while a health index of (close to) 0 represents a failure state. This formulation implies that the degradation occurring in a UOI is indicated by the progressive change of the health index from 1 to 0. It should be noted that the health index is sometimes called a "confidence value" in the literature.
In practice, feature values are not necessarily restricted between 0 and 1, which does not allow a direct judgment of the health of a UOI. Despite reflecting the actual condition of a UOI, the principal features extracted from measurement data cannot be directly used to assess the health of the UOI unless the relative distances to the corresponding values representing the end of life of the UOI (i.e. thresholds) are known. To this end, the feature values evolving from the healthy to the failure state need to be transformed into health indices.
In this study, health assessment based on a logistic regression technique is investigated. As will be shown later, logistic regression can be seen as a process with a two-fold objective: (i) fusing multiple features (independent variables) into a single value (i.e. the health index) and (ii) restricting the health index between 0 and 1. As discussed in (Lemeshow & Hosmer, 2000), logistic regression is an appropriate technique for dichotomous problems, where the predicted variable (i.e. the health index) must be greater than or equal to zero and less than or equal to one. Unlike linear regression, which is inappropriate for dichotomous problems (Lemeshow & Hosmer, 2000), logistic regression requires only data representing the healthy and failure states to estimate the regression coefficients. Thus, the logistic regression technique is suitable for problems with a limited amount of history data. Moreover, it has been reported in the literature that the logistic regression technique is a powerful tool for health assessment modeling of some systems based on extracted high dimensional features (Yan, Koc, & Lee, 2004; Yan & Lee, 2005).
Let us consider a simple logistic function $P(F)$ defined as:

$P(F) = h = \dfrac{1}{1 + e^{-g(F)}} = \dfrac{e^{g(F)}}{1 + e^{g(F)}}$, (8)
where $F = \{F_1, F_2, \ldots, F_L\}$ denotes a set of $L$ extracted features, $h$ denotes the health index of an event (i.e. healthy or failure) given a set of features $F$, and $g(F)$ is the logit function, which is mathematically expressed as:

$g(F) = g = \log\left(\dfrac{P(F)}{1 - P(F)}\right) = \sum_{i=0}^{L} \beta_i F_i$, (9)
where $F_0 = 1$, $\beta_i$ denotes the logistic model parameters to be identified and $g$ denotes the logarithm of the "odds-of-success". In a more compact way, Equation (9) can be rewritten as:

$g = \beta^T F$, (10)

with $\beta = [\beta_0\ \beta_1\ \beta_2\ \ldots\ \beta_L]^T$ and $F = [1\ F_1\ F_2\ \ldots\ F_L]^T$, where the superscript $T$ denotes the transpose operation.
Note that the logistic function expressed in Equation (8) can be seen as a kind of probability function (cumulative distribution function) because it ranges between 1 (healthy) and 0 (failure). In addition, the logit function expressed in Equation (10) constitutes a linear combination of the features extracted from measurement data, $F_1, F_2, \ldots, F_L$. This implies that the logarithm of the odds-of-success $g$ preserves the nature of the features extracted from the measurement signals.
Here, the main objective of the logistic regression is to identify the $(L + 1)$ parameters $\beta$ in Equation (10) such that the logistic model is readily implementable for the health assessment of a UOI. In this context, the parameter identification is normally performed using the maximum-likelihood estimator, which entails finding the set of parameters for which the probability of the observed data is maximal (Czepiel, n.d.). This is done off-line, where two sets of features, $F_{health}$ and $F_{failure}$, respectively representing the healthy and failure states, are used as training data.
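As a concrete illustration of this off-line identification step, the sketch below maximizes the likelihood with plain gradient ascent on the cross-entropy objective; the paper does not prescribe a particular optimizer, so this choice, and all function and variable names, are our assumptions:

```python
import numpy as np

def fit_logistic(F, h, iters=5000, lr=0.5):
    """Estimate the logistic-model parameters beta by maximum likelihood.

    F : (N, L) matrix of training feature sets (healthy and failure rows)
    h : (N,) target health indices (e.g. 0.95 for healthy, 0.05 for failure)
    """
    X = np.hstack([np.ones((F.shape[0], 1)), F])   # prepend F0 = 1, Eq. (9)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # Eq. (8)
        beta += lr * X.T @ (h - p)                 # cross-entropy gradient ascent
    return beta

def health_index(beta, f):
    """Health index h for one feature set f = (tau_e, D_E, D_SAM), Eq. (8)."""
    g = beta[0] + beta[1:] @ np.asarray(f)         # Eqs. (9)-(10)
    return 1.0 / (1.0 + np.exp(-g))
```

Because the targets are set to 0.95 and 0.05 rather than exactly 1 and 0, the likelihood has a finite maximizer even when the two classes are perfectly separable, which keeps the iteration well behaved.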
2.4. Prognostics Algorithm - Data Driven Approach
The prognostics algorithm proposed in this paper is based on a data-driven approach. The variable to be predicted is the logarithm of the odds-of-success $g$. The main reason for this choice is that the predicted variable $g$ preserves the nature of the features extracted from the measurements (it is a linear combination of the features). Basically, the algorithm consists of four main steps, namely (i) determining the first time instant to start prediction $t_{1p}$ such that the history data available at $t_{1p}$ are sufficient, (ii) building a prognostics model from the available data, (iii) predicting the future trajectory of the predicted variable $g$ based on the built prognostics model and (iv) estimating the remaining useful life (RUL). When new data become available, steps (ii)-(iv) are performed again; this procedure is repeated periodically for as long as prediction remains useful. Thus, the prognostics model is updated whenever new data are provided, and the model is expected to converge as more evidence accumulates over time. These steps are discussed in more detail in the subsequent paragraphs.
In this paper, $t_{1p}$ is proposed as the time instant when the health index $h$ is equal to 0.75. At this value ($h = 0.75$), it is theoretically reasonable to assume that a UOI has passed about 25% of its total lifetime, so that sufficient history data to build a prognostics model are practically available. In the domain of the predicted variable $g$, the aforementioned health index $h = 0.75$ corresponds to $g = 1.098$.
The weighted mean slope (WMS) method proposed in (Bey-Temsamani et al., 2009b) is used in this paper to adaptively build a prognostics model from the given history data. This method is easy to implement and is based on a data-driven approach where the model is updated periodically as new data come in. In this method, all the local slopes of a time series are first computed. Afterwards, the slope at the end of the time series (i.e. at the arbitrary time instant $t_p$ at which prediction is done) is computed by summing up all the local slopes weighted by a certain function, where the weighting factor of the most recent data is the greatest. Let $g = \{g_1, g_2, \ldots, g_N\}$ and $t = \{t_1, t_2, \ldots, t_N\}$ respectively be the history of the logarithm of the odds-of-success and the corresponding time sequence at $t_p$. The WMS $b_w$ at this particular time instant $t_p$ is calculated according to the following equation:
$b_w = \sum_{n=2}^{N} \omega_n b_n$, (11)

with

$\omega_n = \dfrac{n}{\sum_{n=2}^{N} n}$, (12)

and

$b_n = \dfrac{g_n - g_{n-1}}{t_n - t_{n-1}}$, $n = 2, 3, \ldots, N$, (13)
where $b_n$ and $\omega_n$ respectively denote the local slope and the corresponding weighting factor. The standard deviation $\sigma_b$ of the WMS at time instant $t_p$ is calculated according to the following equation:

$\sigma_b = \sqrt{\sum_{n=2}^{N} \omega_n (b_n - b_w)^2}$. (14)
For a 95% confidence interval, the lower bound $b_w^{lower}$ and upper bound $b_w^{upper}$ of the WMS can be calculated as follows (Meeker & Escobar, 1998):

$b_w^{lower} = b_w - 1.96\,\dfrac{\sigma_b}{\sqrt{N-1}}$, (15)

$b_w^{upper} = b_w + 1.96\,\dfrac{\sigma_b}{\sqrt{N-1}}$. (16)
As will be shown later in Section 4, the three features ($\bar{D}_E$, $D_{SAM}$, $\bar{\tau}_e$) evolve linearly during the lifetime of the tested clutches. It is therefore reasonable to assume that the trend of the predicted variable $g$ is also linear, since the nature of the features is preserved. Hence, the value of the predicted variable at time instant $t_p + t_h$, namely $g_{t_p+t_h}$, is given by:

$g_{t_p+t_h} = g_{t_p} + b_w t_h$, (17)

where $g_{t_p} = g_N$.
Suppose that the failure threshold (RUL threshold) $g_{limit}$ is known in advance. The expected RUL $r$ at an arbitrary time instant $t_p$ can be computed as:

$r = \dfrac{g_{limit} - g_{t_p}}{b_w}$. (18)
Based on the lower and upper bounds of the WMS expressed in Equations (15) and (16), the uncertainty of the prognostics (the lower bound $\Delta r^{lower}$ and the upper bound $\Delta r^{upper}$ of the RUL) can be estimated according to the following equations:

$\Delta r^{lower} = r - r^{lower}$, (19)

$\Delta r^{upper} = r^{upper} - r$, (20)

with

$r^{lower} = \dfrac{g_{limit} - g_{t_p}}{b_w^{upper}}$, (21)

$r^{upper} = \dfrac{g_{limit} - g_{t_p}}{b_w^{lower}}$. (22)
3. EXPERIMENT
Service life data of wet friction clutches are required for the evaluation of the developed health assessment and prognostics method. In order to obtain the clutch service life data in a reasonable period of time, the concept of an accelerated life test (ALT) is applied in this study. For this purpose, a fully instrumented SAE#2 test setup designed and built by our industrial partner, Dana Spicer Off Highway Belgium, was made available. In this respect, an ALT can be realized by means of applying a higher mechanical energy to a tested clutch compared to the amount of energy transmitted by a clutch in normal operation. The energy level is normally adjusted by changing the initial relative velocity and/or the inertia of the input and output flywheels. In this study, the ALTs were conducted on different commercial wet friction clutches using the fully instrumented SAE#2 test setup. During the tests, all the clutches were lubricated with the same Automatic Transmission Fluid (ATF). The test setup and the ALT procedure are discussed in the following subsections.
3.1. SAE#2 test setup
The SAE#2 test setup used in the experiments, as depicted in Figure 4, basically consists of three main systems, namely the driveline, the control system and the measurement system. The driveline comprises several components: an AC motor for driving the input shaft (1), an input velocity sensor (2), an input flywheel (3), a clutch pack (4), a torque sensor (5), an output flywheel (6), an output velocity sensor (7), an AC motor for driving the output shaft (8), a hydraulic system (11-20) and a heat exchanger (21) for cooling the outlet ATF. An integrated control and measurement system (22) is used for controlling the ATF pressure (both for lubrication and actuation) applied to the clutch and the initial velocity of both the input and output flywheels, as well as for measuring all relevant dynamic signals.
Figure 4. The SAE#2 test setup used in the study: (a) photograph and (b) scheme, courtesy of Dana Spicer Off Highway Belgium.
3.2. Test specification
The general specification of the test scenario is given in Table 1. Two clutch packs with different lining materials of the friction discs were tested. It should be noted that all the friction discs, separator discs and ATF used are commercially available. In all the tests, the inlet temperature and flow of the ATF were kept constant, see Table 1. Additionally, one can see in the table that the inertia of the input flywheel (drum-side) is lower than that of the output flywheel (hub-side).
Table 1. General test specification.

Number of clutch packs to be tested               2
Number of friction discs in the clutch assembly   8
Inner diameter of friction disc (di) [mm]         115
Outer diameter of friction disc (do) [mm]         160
ATF                                               John Deere J20C
Lubrication flow [liter/minute]                   18
Inlet temperature of ATF [°C]                     85
Output flywheel inertia [kg m²]                   3.99
Input flywheel inertia [kg m²]                    3.38
Sampling frequency [kHz]                          1
3.3. Test procedure
Before an ALT is carried out on a wet friction clutch, a run-in test (at a lower energy level) is first conducted for 100 duty cycles in order to stabilize the contact surface. The run-in test procedure is in principle the same as the ALT procedure, but the initial relative rotational velocity of the run-in tests is lower than that of the ALTs. Figure 5 illustrates a duty cycle of the ALT, which is carried out as follows. Initially, while both the input flywheel (drum-side) and the output flywheel (hub-side) are rotating at predefined respective speeds in opposite directions, the two motors are powered off and the pressurized ATF is simultaneously applied to the clutch pack at time instant $t_f$. The oil thus actuates the clutch piston, pushing the friction and separator discs towards each other. This occurs during the filling phase, between the time instants $t_f$ and $t_a$. While the applied pressure is increasing, contact is gradually established between the separator and friction discs, which results in an increase of the transmitted torque on the one hand and a decrease of the relative velocity on the other hand. Finally, the clutch is completely engaged when the relative velocity reaches zero at the lockup time instant $t_l$. As the inertia and the respective initial speed of the output flywheel (hub-side) are higher than those of the input flywheel, after $t_l$ both flywheels rotate together in the same direction as the output flywheel, see Figure 5. In order to prepare for the forthcoming duty cycle, both driving motors are braked at the time instant $t_b$, such that the driveline stands still for a while.
Figure 5. A representative duty cycle of wet friction clutches, showing the ATF temperature, applied pressure, drum velocity, hub velocity and transmitted torque (scaled units) around the time instants $t_f$, $t_a$, $t_l$ and $t_b$. Note that the transmitted torque drops to zero after the lockup time instant $t_l$ because there is no external load applied during the test.

The ALT procedure discussed above is continuously repeated until a given total number of duty cycles is attained. For the sake of time efficiency in measurement, all the ALTs are performed for 10000 duty cycles. The pressure applied to the clutches is kept constant during the tests and the ATF is continuously filtered, such that it is reasonable to assume that the used ATF has not degraded during the tests.
4. RESULTS AND DISCUSSION
Figure 6 shows the optical images and the surface profiles of the friction material before and after the ALT, taken from the first clutch pack. The images are captured using a Zeiss microscope and the surface profiles are measured along the sliding direction using a Taylor Hobson Talysurf profilometer. It can be seen in the figure that the surface of the friction material has become smooth and glossy, and the clutch is therefore considered to have failed. The change of the color and the surface topography of the friction material is known as a result of the glazing phenomenon, which is believed to be caused by a combination of adhesive wear and thermal degradation (Gao et al., 2002; Gao & Barber, 2002).
4.1. Capturing the Signal of Interest
Figure 7 shows 3D plots of the relative velocity signals of interest obtained from the first ALT (clutch pack #1) and the second ALT (clutch pack #2). All the signals are captured at the same reference time instant $t_f$ with the same time record length $\tau$ of 2.5 s. As can be seen in the figure, the reference time instant $t_f$ is set to zero. Furthermore, it is evident from the figure that the profile of the relative velocity signal deviates from its initial profile as the clutch degradation progresses (pointed out by the arrow). This deviation is indicated by two major patterns, namely (i) the changing shape and (ii) the shifting of the lockup time instant $t_l$ to the right-hand side with respect to the reference time instant $t_f$. This observation confirms the experimental results reported in the literature (Fei et al., 2008).
Figure 6. Comparison of the friction material before and after the ALT of 10000 duty cycles: (a) optical image (left) and the corresponding surface profile (right) of the friction material before the test, (b) optical image (left) and the corresponding surface profile (right) of the friction material after the test. Note that $z$ denotes the displacement of the profilometer stylus in the Z-axis (perpendicular to the surface), $x$ denotes the displacement of the profilometer stylus in the X-axis (along the sliding direction) and $\phi(z)$ denotes the probability distribution function of the surface profile.
Figure 7. Evolution of the relative velocity signals of interest obtained from (a) the first ALT and (b) the second ALT.
4.2. Extracted Features
The features introduced in Section 2 are extracted from the relative velocity signals of interest shown in Figure 7, based on Equations (4), (6) and (7). Figure 8 shows the evolution of the features as a function of the clutches' service life. Remarkably, the trends of all the features are linearly increasing, with relatively small variations.
Figure 8. Evolution of the features ($\bar{\tau}_e$, $\bar{D}_E$ and $D_{SAM}$) obtained from the first and second ALT.
4.3. Logistic Model
In order to build a logistic model for clutch health assessment, a number of sets of the features, $F_{health} = \{\bar{\tau}_e^i, \bar{D}_E^i, D_{SAM}^i\}$ and $F_{failure} = \{\bar{\tau}_e^f, \bar{D}_E^f, D_{SAM}^f\}$, respectively representing the healthy and failure states, are required. Note that the superscripts i and f respectively denote the healthy and failure states. Table 2 lists the sets of features used for logistic regression, taken from different observations on the features extracted from the measurement data as shown in Figure 8. The health indices $h$ assigned to the healthy and failure states are 0.95 and 0.05, respectively. It should be mentioned here that these two values are heuristically chosen since not enough history data are available.
Using the training data listed in Table 2, the parameters obtained from the logistic regression can be written as:

$\beta = [3.09\ \ 2.07\ \ {-35.96}\ \ 3.57]^T$.

Based on the identified parameters, the logistic model, represented as the health index $h$ as a function of the clutch duty cycles $N_{cycle}$, can be expressed by the following equation:

$h(N_{cycle}) = \dfrac{e^{g(N_{cycle})}}{1 + e^{g(N_{cycle})}}$, (23)

with

$g(N_{cycle}) = 3.09 + 2.07\,\bar{\tau}_e - 35.96\,\bar{D}_E + 3.57\,D_{SAM}$. (24)
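For illustration, the identified model of Equations (23) and (24) can be evaluated for a given feature set as follows (a minimal sketch; the function name is ours, and the inputs are the normalized features of Section 2):

```python
import math

def clutch_health_index(tau_e_bar, d_e_bar, d_sam):
    """Health index of the identified logistic model, Eqs. (23)-(24)."""
    g = 3.09 + 2.07 * tau_e_bar - 35.96 * d_e_bar + 3.57 * d_sam  # Eq. (24)
    return 1.0 / (1.0 + math.exp(-g))                             # Eq. (23)
```

For a fresh clutch (all features zero) this gives h close to 0.96, while for feature values near the failure rows of Table 2 it drops below 0.1, consistent with the training targets of 0.95 and 0.05.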
Figure 9 shows the evolution of the health index $h$ of the two tested clutches. As expected, the health index decreases progressively during the service life of the clutches. Since the index is restricted between 0 and 1, the figure makes it easy for users to judge the health status of the clutches: when the index value is close to 1, one can directly conclude that the clutches are healthy, while when the index approaches 0, one can conclude that the clutches are about to fail.
Figure 9. The health index $h$ evolution of both tested clutches.
4.4. Prognostics Performance
In this subsection, the performance of the proposed prognostics algorithm is demonstrated. Figure 10 shows the evolution of the logarithm of the odds-of-success $g$, which has been specified as the predicted variable. When $g = 1.098$ (i.e. crossing the upper horizontal line) the algorithm is triggered for the first time to build a prognostics model (at the 3000th cycle), and the trajectory of $g$ indicated by the gray dashed line is consecutively predicted until it crosses the predefined threshold (the lower horizontal line). The RUL threshold $g_{limit}$ is set at the value of $-2.197$, which corresponds to the health index of 0.1. At this particular value ($g_{limit} = -2.197$), it is reasonable to assume that the tested clutches have passed about 90% of their expected total lifetime. For comparison, the prediction at the 7000th cycle is also shown in the figure. As can be seen, the predicted trajectory of $g$ at the latter cycle (shown by the solid black line) gets closer to the measurement data, indicating that the model has been updated.

Table 2. Sets of features used for logistic regression analysis.

State                 Observation   $\bar{\tau}_e$   $\bar{D}_E$   $D_{SAM}$
Healthy (h = 0.95)    1             0                0             0
                      2             0.0033           0.0034        0.0051
                      3             0.017            0.0122        0.017
                      4             0.0109           0.0069        0.0109
Failure (h = 0.05)    1             0.3              0.2           0.27
                      2             0.32             0.22          0.29
                      3             0.3205           0.2186        0.2995
                      4             0.3593           0.2216        0.3062
Figure 10. Representative evolution of the logarithm of the odds-of-success $g$, with the measurement data, the predictions made at the 3000th and 7000th cycles and the RUL threshold.
Figure 11. Comparison of the estimated and actual RULs of (a) the first clutch pack and (b) the second clutch pack.
The RUL estimations of both clutches are depicted in Figure 11. As can be seen in the figure, the error between the estimated and actual RULs, and the corresponding uncertainty interval, are quite large at the beginning of the prediction because a limited amount of data is available to build the prognostics model. As more evidence becomes available, the estimated RULs tend to converge to the actual RULs and the uncertainties tend to decrease over time, implying that the prognostics model improves over time.
5. CONCLUSION AND FUTURE WORK
In this paper, an attempt to develop a health (performance) assessment and prognostics methodology for wet friction clutches has been presented and discussed. For health assessment purposes, all the extracted features are fused into a single variable called the health index $h$, which is restricted between 0 and 1, based on a logistic regression solved with the maximum likelihood estimation technique. In this way, a logistic model can be built that allows a direct judgment of the health of wet friction clutches. In terms of prognostics, the logarithm of the odds-of-success, i.e. $\log(h/(1 - h))$, is assigned as the predicted variable. The weighted mean slope (WMS) method, which is simple and easy to implement, is used to predict the trajectory of the predicted variable and consecutively to predict the remaining useful life (RUL) of the clutches. The proposed methodology has been experimentally evaluated on two commercially available clutch packs with different friction materials. The experimental results confirm that the methodology proposed in this paper is promising for aiding the development of a maintenance strategy for wet friction clutches.
The experiments carried out in this study were performed under a controlled environment. More data will be collected in the future under conditions in which the loading, operational temperature and applied pressure vary during the duty cycles. Furthermore, several candidate algorithms need to be evaluated in future work in order to determine the optimal one with regard to accuracy, convergence rate and practical implementation.
ACKNOWLEDGMENT
All the authors are grateful for the experimental support byDr. Mark Versteyhe of Dana Spicer Off Highway Belgium.
NOMENCLATURE
t          time
W          instantaneous concentration of cellulose fibers
W0         initial concentration of cellulose fibers
nrel       relative velocity
p          pressure
tf         reference time instant
tl         lockup time instant
t1p        time instant for first prediction
tp         arbitrary time instant for prediction
τ          time record length
X          vector denoting a discrete relative velocity signal measured in an initial (healthy) condition
Y          vector denoting a discrete relative velocity signal measured in an arbitrary condition
τ̄e         normalized engagement duration
D̄E         normalized Euclidean distance
DSAM       normalized SAM distance
F          a set of features
h          health index
g          logarithm of the odds-of-success
glimit     RUL threshold
bn         local slope
ωn         weighting factor
bw         weighted mean slope
σb         weighted standard deviation
r          remaining useful life (RUL)
Ncycle     number of duty (engagement) cycles
REFERENCES
Bansal, D., Evans, D. J., & Jones, B. (2004). A real-timepredictive maintenance system for machine systems.International Journal of Machine Tools and Manufac-ture, 44(7-8), 759 - 766.
Basseville, M., Benveniste, A., Gach-Devauchelle, B., Gour-sat, M., Bonnecase, D., Dorey, P., et al. (1993). In situdamage monitoring in vibration mechanics: diagnos-tics and predictive maintenance.Mechanical Systemsand Signal Processing, 7(5), 401 - 423.
Bey-Temsamani, A., Engels, M., Motten, A., Vandenplas,S., & Ompusunggu, A. P. (2009a). Condition-BasedMaintenance for OEM’s by application of data miningand prediction techniques. InProceedings of the 4thWorld Congress on Engineering Asset Management.
Bey-Temsamani, A., Engels, M., Motten, A., Vandenplas, S.,& Ompusunggu, A. P. (2009b). A Practical Approachto Combine Data Mining and Prognostics for ImprovedPredictive Maintenance. InThe 15th ACM SIGKDDConference on Knowledge Discovery and Data Min-ing.
Czepiel, S. (n.d.). Maximum likelihood esti-mation of logistic regression models: the-ory and implementation. Available from
http://czep.net/stat/mlelr.pdfFei, J., Li, H.-J., Qi, L.-H., Fu, Y.-W., & Li, X.-T. (2008).
Carbon-Fiber Reinforced Paper-Based Friction Mate-rial: Study on Friction Stability as a Function of Oper-ating Variables.Journal of Tribology, 130(4), 041605.
Gao, H., & Barber, G. C. (2002). Microcontact Model forPaper-Based Wet Friction Materials.Journal of Tribol-ogy, 124(2), 414 - 419.
First European Conference of the Prognostics and Health Management Society, 2012
BIOGRAPHIES
Agusmian Partogi Ompusunggu is a project engineer at Flanders' MECHATRONICS Technology Centre (FMTC), Belgium. His research focuses on condition monitoring, prognostics, vibration testing and analysis, and tribology. He earned his bachelor degree in mechanical engineering (B.Eng) in 2004 from Institut Teknologi Bandung (ITB), Indonesia, and his master degree in mechanical engineering (M.Eng) in 2006 from the same institute. He is currently pursuing his PhD degree in mechanical engineering at Katholieke Universiteit Leuven (K.U.Leuven), Belgium.
Steve Vandenplas is a program leader at Flanders' MECHATRONICS Technology Centre (FMTC), Belgium. He received his Master's Degree in Electrotechnical Engineering in 1996 from the Vrije Universiteit Brussel (VUB), Belgium. In 2001, he received a PhD in Applied Science and started to work as an R&D Engineer at Agilent Technologies for one year. Thereafter, he worked as a Postdoctoral Fellow at the K.U.Leuven Department of Metallurgy and Materials Engineering, in the research group on material performance and non-destructive testing (NDT). He has been working at Flanders' MECHATRONICS Technology Centre (FMTC) since 2005, where he is currently leading FMTC's research program on "Monitoring and Diagnostics". His main interests are machine diagnostics and condition based maintenance (CBM).
Paul Sas is a full professor at the Department of Mechanical Engineering of Katholieke Universiteit Leuven (K.U.Leuven), Belgium. He received his master and doctoral degrees in mechanical engineering from K.U.Leuven. His research interests comprise numerical and experimental techniques in vibro-acoustics, active noise and vibration control, noise control of machinery and vehicles, structural dynamics, and vehicle dynamics. He is currently leading the noise and vibration research group of the Department of Mechanical Engineering at K.U.Leuven.
Hendrik Van Brussel is an emeritus professor at the Department of Mechanical Engineering of Katholieke Universiteit Leuven (K.U.Leuven), Belgium. He was born in Ieper, Belgium, on 24 October 1944, obtained the degree of Technical Engineer in mechanical engineering from the Hoger Technisch Instituut in Ostend, Belgium, in 1965, and an engineering degree in electrical engineering at M.Sc level from K.U.Leuven. In 1971 he got his PhD degree in mechanical engineering, also from K.U.Leuven. From 1971 until 1973 he established a Metal Industries Development Center in Bandung, Indonesia, and was an associate professor at Institut Teknologi Bandung (ITB), Indonesia. He was a pioneer in robotics research in Europe and an active promoter of the mechatronics idea as a new paradigm in machine design. He has published more than 200 papers on different aspects of robotics, mechatronics and flexible automation. His research interests have shifted towards holonic manufacturing systems and precision engineering, including microrobotics. He is a Fellow of SME and IEEE, and in 1994 he received an honorary doctoral degree from the 'Politehnica' University in Bucharest, Romania, and from RWTH Aachen, Germany. He is also a Member of the Royal Academy of Sciences, Literature and Fine Arts of Belgium and an Active Member of CIRP (International Institution for Production Engineering Research).
Health management system for the pantographs of tilting trains
Giovanni Jacazio, Massimo Sorli, Danilo Bolognese, Davide Ferrara

Politecnico di Torino, Department of Mechanical and Aerospace Engineering, Turin, 10129, Italy
ABSTRACT
Tilting trains can rotate their carbodies by several degrees with respect to the bogies, about the longitudinal axis of the train. This permits a train to travel at high speed while maintaining an acceptable passenger ride quality, by limiting the lateral acceleration, and the consequent lateral force, experienced by the passengers when the train runs on a curved track at a speed in excess of the balance speed built into the curve geometry. When the carbody is tilted with respect to the bogie, the train pantograph needs to remain centered with respect to the overhead catenary, which is aligned with the track. The conventional solution is to mechanically link the pantograph to the bogie, but recent tilting trains have the pantograph connected to the carbody roof, while a position servoloop continuously controls the pantograph position so as to keep it centered with the catenary. The merit of this design is that it increases the useful volume inside the carbody. The pantograph position servoloop uses two position sensors, which provide redundant position information to close the pantograph feedback loop and to perform system monitoring.
The monitoring functions presently implemented in pantograph position controls are able to detect servocontrol failures, but in case of conflicting information from the two position transducers they are not always able to sort out which of the two has failed, because some transducer failures cannot be detected by simply looking at the transducer output signals. As a result, if a difference between the output signals of the two position transducers is detected, the tilting function is disabled and the train speed is reduced. Moreover, the entire pantograph is then removed and replaced, because the functionality of each individual transducer can only be checked at shop level.
Train operating companies have encouraged the development of better diagnostic techniques for the pantograph position control system, but no work on this subject had so far been performed. The authors therefore conducted a research activity aimed at developing an advanced diagnostic system that can both identify the presence of a failure and recognize which of the two position transducers is the failed one. In case of a transducer failure it is thus possible to isolate the failed transducer and keep the pantograph position control operational, thereby retaining the train tilting function. A further merit of the advanced diagnostic system is a reduction of maintenance time and cost, because the failed transducer can be replaced without removing the entire pantograph from the train.
The general architecture of this innovative diagnostic
system, the associated algorithms, the mathematical models
for the system simulation and validation, the simulation
results and the possible future developments of this health
management system are presented in the paper.
1. THE PANTOGRAPHS OF TILTING TRAINS
Tilting trains tilt their carbodies towards the inner side of a curve to reduce the centrifugal force perceived at passenger level and thus maintain equivalent or better passenger comfort, in terms of lateral acceleration (and the consequent lateral force), on the same curve geometry at an enhanced service speed (Figure 1). By tilting the carbody of a rail passenger vehicle relative to the track plane during curve negotiation, it is therefore possible to operate at speeds higher than would be acceptable to passengers in a non-tilting vehicle, and thus reduce the overall trip time.
_____________________
G. Jacazio, M. Sorli, D. Bolognese and D. Ferrara. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0
United States License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are credited.
Figure 1. Tilting train concept
The recognized advantage of tilting trains is that they increase the achievable service speed of passenger trains on existing tracks without the very large investment needed to build a dedicated new track or to alter the geometry of the existing curves (Boon & Hayes, 1992).
Both hydraulic and electromechanical actuation systems
have been used to provide the controlled force necessary to
tilt the carbodies of the train vehicles, though the majority of
tilting trains in revenue service use hydraulic actuation
systems.
A critical design issue associated with carbody tilting is the need to maintain the train pantograph centered with respect to the overhead catenary, which runs at the midpoint between the two track rails. Most tilting trains have implemented the solution of rigidly connecting the pantograph structure to the bogie by means of a truss passing through the carbody. This is a simple design concept, but it reduces the useful volume within the carbody because enough empty space must be left around the vertical beams of the truss to accommodate carbody tilting. Most of the tilting trains developed in the last 10 years, however, use a different design in which the pantograph supporting structure is directly connected to the carbody roof, while the pantograph itself can be moved relative to its supporting structure in a direction opposite to the carbody tilting. By appropriately controlling the pantograph lateral position with respect to the carbody roof it is then possible to keep the pantograph aligned with the catenary even when the carbody tilts. This is accomplished by an actuation system receiving its commands from the train electronics, as outlined in the next section. The advantage of this solution is that it increases the useful space within the carbody.
The actuation technology used for the pantograph control of tilting trains following this design concept is the same as that of the carbody tilting system. The research activity presented in this paper focused on the latest tilting train developed by Alstom (the so-called "Nuovo Pendolino"), which makes use of hydraulic actuation, and the health management system developed in this research specifically refers to a hydraulically actuated pantograph control system. However, the same health management philosophy can be followed to develop effective diagnostic algorithms for an electrically actuated pantograph control system.
2. PANTOGRAPH POSITION CONTROL SYSTEM
The control of the lateral position of the pantograph with
respect to the tilting carbody is performed by a closed loop
system using two single-acting hydraulic actuators mounted
as an opposite pair and controlled by an electrohydraulic
servovalve (Figure 2). The pantograph is mounted on a
carriage that can be moved along two tracks perpendicular
to the longitudinal axis of the vehicle. The rod end of each
of the two hydraulic actuators is connected to the carriage,
while the head end is connected to a structure fixed to the
carbody roof. Two springs mounted between the carriage
and the frame maintain the pantograph centered in its mid
position when the pantograph control system is not active.
Each of the two single-acting hydraulic actuators accepts the controlled flow from one of the two control ports of an electrohydraulic servovalve; the combination of the servovalve and the two single-acting actuators is therefore equivalent to a hydraulic servocontrol comprising a servovalve and a double-acting hydraulic actuator. The hydraulic power
supply is provided by a constant pressure hydraulic power
generation and control unit (HPGCU) located in the train
vehicle undercarriage. The pantograph position command is generated by the train electronics simultaneously with the carbody tilt command, as a function of the lateral acceleration, and a position servoloop is created for the pantograph in which the command is compared to the actual lateral position in order to close the position feedback loop.
The servoloop position errors are processed by an
appropriate control law that eventually generates the input
signal to the flow control servovalve. The pantograph
lateral position is measured by two position sensors, with
each sensor placed inside one of the two hydraulic actuators.
The pantograph position control loop is single-hydraulic,
dual-electrical and uses a single electrohydraulic servovalve
with independent electrical coils accepting the control
currents from the two independent control computers. Each
computer interfaces with one of the two position sensors and
mutually exchanges with the other computer the information
of pantograph lateral position and servovalve current as well
as the computer health status. Each computer thus generates the same consolidated position feedback, based on the average of the two pantograph position sensor signals.
The control law (Figure 3) is based on a PID controller with a relatively low value of the integrator gain and a saturation on the integrator output. The function of the integrator is in fact to compensate for the steady state, or slowly varying, servovalve offsets, while the dynamic performance depends on the proportional and derivative gains of the control law.
A comparison of the signals of the two sensors is
continuously performed during the train ride and if the
difference between these signals is greater than a given
threshold, an alert is generated and the tilting system
operation is disabled. Both the carbody and the pantograph
actuators are set in a bypass mode connecting the actuators
lines to return. The tilting carbody recenters under its own
weight while the pantograph recenters under the action of its
springs. As the train tilting is disabled, the train speed is
reduced to maintain an acceptable comfort level for the
passengers and train safety, but with the consequence of a
travel delay.
The rationale for disabling the train tilting in case of a discrepancy between the signals of the two pantograph position sensors is the concern of not always being able to detect the failure of each individual sensor. Failures such as a broken wire or a short circuit lead to an out-of-scale signal that can be easily detected, but other failures, such as degradations causing variations of the sensor scale factor or increased offsets, are failure cases that cannot be detected by the existing monitoring logic. It may therefore well happen that a difference between the signals of the two sensors is detected, but it is not possible to understand which of the two has failed. Moreover, even if the existing monitoring system recognized and isolated the failed sensor, a risk would exist that a subsequent failure of the remaining active sensor might go undetected, which could lead to a safety-critical condition. The end result is that a single transducer failure leads to a reduction of the train speed even though the remaining transducer could still be able to control the pantograph position.
Figure 2. Concept schematics of the pantograph position control system
Figure 3. Block diagram of the pantograph control law
A research activity was then conducted to develop a more sophisticated diagnostic procedure allowing the degradation of each individual transducer to be detected by appropriately processing all available signals by means of dedicated algorithms. This new diagnostic procedure brings two benefits: it sorts out which of the two transducers has failed in case of a discrepancy between the sensor signals, and it allows a failure of the remaining active sensor to be detected after the other sensor has already failed. This allows the train to keep the tilting system active, and thus a high train speed, after the loss of one of the two pantograph sensors, thereby improving the tilting system availability.
A further advantage brought about by the improved diagnostics is a simpler maintenance operation. Presently, when a difference between the sensor outputs is signalled, the maintenance crew removes and replaces the entire pantograph, which is a time-consuming and costly operation. The implementation of a health monitoring system able to specifically detect the failed transducer not only improves the tilting system availability but also reduces the maintenance costs.
3. ADVANCED HEALTH MANAGEMENT SYSTEM
The health management system presented herein was devised to be applied to legacy systems. It does not require any hardware modification, but makes better use of the available signals to enhance the ability to detect an anomalous behaviour of the pantograph position control system, allowing the tilting operation to continue also after a sensor failure.
The health management system is based on real-time
modeling of the pantograph control system and consists of
three separate functions:
- Coherence check
- Learning process
- Monitoring process
These three functions are continuously performed during the train ride; however, when a sensor failure is detected the learning process is permanently stopped. If a failure of the servovalve electrical section, or of its servoamplifier, is detected, the learning process is temporarily stopped, and it resumes after the train electronics has switched the servovalve control from the failed lane to the previously standby lane. The purpose of the learning process is in fact to continuously tune the values of the parameters used by the pantograph real-time model, so it can be effective only as long as all system components are operating correctly. If any component fails, the learning process loses its significance, and the monitoring process continues using the last values of the system parameters determined by the learning process before the failure occurred.
The outputs generated by the coherence check and the monitoring process are then routed to a decision maker, which fuses all information and provides the train electronics with an indication of the health of the pantograph position control system. Figure 4 shows the flow chart of the processes performed by the health management system. The three functions are described in the following sections.
4. COHERENCE CHECK
The coherence check is performed on the signals of the two position sensors and on the servovalve current. The coherence check for the signals of the two position sensors consists of two operations:

- Verification that the output signal of each sensor is within a valid range
- Comparison between the output signals of the two redundant sensors
The signals A and B provided by the two position sensors are first checked to verify that they are within their valid range of 4 to 20 mA. If the electrical output signal is outside this range, a failure of that sensor is recognized, its signal is discarded, and the pantograph control continues using the remaining sensor to close the position feedback loop. If both signals A and B pass the valid range check, they are compared to each other. If their difference is below an
acceptable threshold, the signals are coherent and a good health status is recognized; however, if a difference above the threshold appears and lasts more than a given time, a lack of signal coherence is detected. In this case the position feedback, which is obtained by averaging the two sensor output signals, is obviously corrupted. When such a condition occurs, the ensuing monitoring process sorts out which sensor is good and which has failed, thereby allowing the pantograph position control system to continue to operate. Based on an analysis of operational data, the threshold for signaling a lack of coherence was set at a value corresponding to 6 % of the full actuator travel.
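The two checks above can be sketched as follows. The 4 to 20 mA range and the 6 % threshold are the values quoted in the text; the full-travel value and the function names are hypothetical placeholders:

```python
# Thresholds: 4-20 mA valid range and 6 % of full travel (from the text);
# the full-travel value itself is a hypothetical placeholder.
FULL_TRAVEL_MM = 100.0
COHERENCE_THRESHOLD_MM = 0.06 * FULL_TRAVEL_MM

def valid_range(signal_ma):
    """A sensor output is valid when it lies within 4 to 20 mA."""
    return 4.0 <= signal_ma <= 20.0

def sensors_coherent(pos_a_mm, pos_b_mm):
    """The two redundant position readings are coherent when they agree
    within 6 % of the full actuator travel."""
    return abs(pos_a_mm - pos_b_mm) <= COHERENCE_THRESHOLD_MM
```

In the real system the coherence decision additionally requires the discrepancy to persist for a given time before a lack of coherence is declared.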
The servovalve coherence check is a monitor that is already performed in the pantograph actuation systems of current tilting trains. It is implemented as a current wrap-around, which consists of measuring the actual current circulating through the servovalve coils and comparing it with the current command. Each of the two servovalve coils interfaces with one of the two sections of the control electronics, with the two coils operated in an active/standby mode: only one coil is active, and the other coil is activated after a failure of the first coil is detected. When the coherence check detects a discrepancy of more than 15 % of the rated current, and such discrepancy lasts more than 100 ms, a failure of the electrical section of the servovalve is recognized. That section is then switched off and the previously standby section is activated. If a second failure occurs, the entire system is shut down and the train tilting is disabled.
It must also be noted that the servovalve coherence check detects not only the failures of the electrical section of the servovalve, but also those of its electrical driver.
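A minimal sketch of such a current wrap-around monitor, using the 15 % and 100 ms thresholds from the text (the class and parameter names are illustrative):

```python
class CurrentWrapAround:
    """Flags a servovalve electrical-section failure when the measured coil
    current deviates from the commanded current by more than 15 % of the
    rated current for longer than 100 ms (thresholds from the text)."""

    def __init__(self, rated_current_ma, dt_s):
        self.threshold = 0.15 * rated_current_ma
        self.persistence_s = 0.100
        self.dt = dt_s
        self.timer = 0.0

    def step(self, commanded_ma, measured_ma):
        if abs(commanded_ma - measured_ma) > self.threshold:
            self.timer += self.dt            # discrepancy persists
        else:
            self.timer = 0.0                 # must persist continuously
        return self.timer > self.persistence_s   # True = failure detected
```

Resetting the timer whenever the discrepancy falls back below threshold makes the monitor insensitive to short transients, e.g. during fast servovalve commands.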
5. LEARNING PROCESS
The learning process and the monitoring process both make use of a mathematical model of the pantograph position control system to perform their tasks. The basic concept for the learning and monitoring processes is that, for a servovalve-controlled hydraulic actuator, servovalve current, flow rate and pressure differential across the servovalve control ports are three mutually related variables: for a given servovalve, if two of these variables are defined, the third one can be derived. Models of servovalve-controlled electrohydraulic systems are available in the literature (Borello, Dalla Vedova, Jacazio & Sorli, 2009; Byington, Watson, Edwards & Stoelting, 2004). For the pantograph hydraulic actuation system the three variables referenced above are either known or can be determined from the available information without additional sensors, as discussed in the following.
The servovalve current i is a known variable at any instant in time, since it is generated by the electronic controller; the fundamental issue is therefore to compute in real time the values of flow rate and pressure differential from the signals provided by the actuator position sensors.

The calculation of the flow rate Q is relatively simple, because the flow rate is the product of the actuators area A times their speed. The area A is a known design parameter, while the speed can be determined by taking the time derivative of the actuator position x provided by the position sensors. The pressure differential Δp across the actuators can thus be determined from the well known servovalve pressure/flow relationship:

Q = Kv i √(Ps − Pr − Δp sgn(i)) (1)

where Kv is a known parameter defined by the servovalve characteristics, and Ps and Pr are the supply and return pressures of the hydraulic system. These pressures are approximately constant, because the train hydraulic power generation is a constant pressure system (should the supply pressure decrease below normal, a hydraulic system failure is recognized by the relevant monitoring logic), while the return pressure is constant because the reservoir is open to the ambient.
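As an illustration, assuming Eq. (1) takes the common square-root form Q = Kv·i·√(Ps − Pr − Δp·sgn(i)), the two quantities can be estimated from the available signals as follows (all names, values and units are illustrative):

```python
import math

def flow_rate(x_prev_m, x_curr_m, dt_s, area_m2):
    """Q = A * dx/dt: actuator speed from a finite difference of the
    position-sensor signal, multiplied by the known actuator area."""
    return area_m2 * (x_curr_m - x_prev_m) / dt_s

def load_pressure_diff(q, i, k_sv, p_supply, p_return):
    """Invert the assumed Eq. (1), Q = Kv*i*sqrt(Ps - Pr - dp*sgn(i)),
    for the pressure differential dp across the actuators."""
    return math.copysign(1.0, i) * (p_supply - p_return - (q / (k_sv * i)) ** 2)
```

Note that the inversion is undefined at i = 0; in practice the learning process only runs above a minimum actuation rate, which keeps the current away from zero.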
It is important to notice that the control law of the pantograph position servoloop consists essentially of a proportional controller plus a low-gain, saturated integrator whose purpose is to cancel out the steady-state errors originated by the servovalve offsets. In this way, the effects of the servovalve offsets are eliminated and the servovalve sits at its hydraulic null when the servoloop error is zero. The current i in all equations of this paper is thus the current determined by the proportional gain, which actually determines the servovalve opening, while the contribution to the current given by the integral term exactly matches the servovalve offset.
Equation (1) describes the steady-state relationship between flow, pressures and servovalve current; it does not include the servovalve dynamics. For the pantograph hydraulic control system the servovalve dynamics is about two orders of magnitude faster than that of the overall pantograph position servoloop; therefore, neglecting the servovalve dynamics in the real-time model of the pantograph position control system does not introduce any appreciable error.
The pressure differential Δp across the actuators can also be determined from the balance of the forces acting on them: this pressure differential is in fact equal to the force globally developed by the two actuators divided by their area.
Figure 4. Flow chart for the health management of the pantograph position control system
The forces acting on the pantograph when it is moved away from its centered position are:

- Forces developed by the centering springs
- Friction forces
- Lateral component of the aerodynamic force acting on the pantograph
- Inertia force associated with the mass of the translating pantograph
For the pantograph position control system, the prevailing force acting on the actuators is by far the force developed by the recentering springs. The springs are preloaded, and the force they develop is a function of the actuator position, as shown in Figure 5.

The spring forces are in theory a known quantity, since the spring stiffness Ks is a design value. However, the construction tolerances of the mechanical structure accommodating the pantograph on the carbody roof, and some variations of the dimensions associated with temperature changes, lead to some uncertainty on the value of the spring preload F0. While the spring rate can reasonably be considered a well defined parameter, the actual installed length of the springs, and hence their preload, can exhibit some variation that must be properly assessed.
The friction forces Ff are lower than the spring forces, but still give a significant contribution to the overall force acting on the actuators. The friction forces can exhibit a large variation, depending on the environmental conditions, on the condition of the tracks along which the pantograph carriage moves, and on the progressive wear of the pantograph moving components over their life.

The aerodynamic forces in the lateral direction and the inertia forces are of little significance for this application and can be neglected by the health monitoring system. They act as potential disturbances, which were properly addressed in the assessment of the health management system robustness.
Figure 5. Diagram of actuators displacement versus spring
forces
An important fact to be considered is that the force developed by the springs is always directed towards centering the pantograph. The spring force thus acts as an opposing load when the pantograph carriage moves away from the centered position, and as an aiding load when the carriage moves towards it. The friction forces, on the contrary, always oppose the carriage movement.
Based on the above considerations, after having defined a positive direction for the actuator travel x, the following simple equations for the balance of the forces acting on the actuators can be written (note that F0 is a positive quantity, because it is the absolute value of the spring preload).

When x > 0:

A Δp = F0 + Ks x + Ff for positive actuator speed (opposing load) (2)

A Δp = F0 + Ks x − Ff for negative actuator speed (aiding load) (3)

When x < 0:

A Δp = −(F0 − Ks x) − Ff for negative actuator speed (opposing load) (4)

A Δp = −(F0 − Ks x) + Ff for positive actuator speed (aiding load) (5)
When the train negotiates a curve, the pantograph is commanded to move laterally in one direction to counteract the carbody tilting in the other direction; this is followed by a command back to zero when the train exits the curve. Over this period of time the learning process is activated. While the pantograph is moving away from center, the opposing-load condition of Eq. (2) or Eq. (4) prevails, while the aiding-load condition of Eq. (3) or Eq. (5) prevails when the pantograph travels back to center. The learning algorithm therefore works in the following way.
When the train enters a curve and the pantograph travels
away from center, the algorithm uses Eq. (1) to compute the
value of the actuator force F, which is then used by Eq. (2)
or Eq. (4) to compute the value of (F0 + Ff) based on the
value of the actuator position x and on the known design
parameters k and A. This calculation is performed for
predetermined values of the actuators position x. When the
train exits the curve and the pantograph moves back to the
centered position, the same calculations are performed for
the same values of actuators position x, but using Eq. (3) or
Eq. (5), thereby determining the values of (F0 - Ff). Since no
changes of springs preload and frictional losses occur in the
short time interval between entering and leaving a curve, by
knowing (F0 + Ff) and (F0 - Ff) for the same value of x it
is possible to find out the values of F0 and Ff.
The computed values of F0 and Ff are stored in memory for
each value of actuator travel x and a moving average is then
performed which adapts the values of F0 and Ff to the
variations that can occur in service. However, if a sudden
large reduction of spring preload F0 is detected by the
learning process, this would be the result of a broken spring;
an alert is then generated and sent to the decision maker.
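For a given position x, Eqs. (2)-(3) (or (4)-(5) for x < 0) form a pair of linear equations in F0 and Ff that the learning process inverts. The sketch below illustrates that inversion together with a simplified moving-average update and broken-spring alert; the function names, the exponential averaging and the alert threshold are illustrative assumptions, not the authors' implementation.

```python
def identify_spring_params(F_away, F_back, x, k):
    """Invert Eqs. (2)-(3) (x > 0) or (4)-(5) (x < 0): given the actuator
    force measured at the same position x while moving away from center
    (opposing load, F_away) and while moving back toward center (aiding
    load, F_back), return the spring preload F0 and friction force Ff."""
    s = 1.0 if x >= 0 else -1.0         # sign flip selects Eqs. (4)-(5)
    F0 = s * (0.5 * (F_away + F_back) - k * x)
    Ff = s * 0.5 * (F_away - F_back)
    return F0, Ff

class SpringParamLearner:
    """Moving-average update of F0 and Ff per predetermined position value,
    with a broken-spring alert on a sudden large preload drop (the averaging
    factor and drop threshold are illustrative, not from the paper)."""
    def __init__(self, k, alpha=0.1, preload_drop_alert=0.3):
        self.k, self.alpha, self.drop = k, alpha, preload_drop_alert
        self.F0, self.Ff = {}, {}       # keyed by actuator position x

    def update(self, x, F_away, F_back):
        F0_new, Ff_new = identify_spring_params(F_away, F_back, x, self.k)
        # sudden large preload reduction -> broken spring alert
        alert = x in self.F0 and F0_new < (1.0 - self.drop) * self.F0[x]
        a = self.alpha
        self.F0[x] = (1 - a) * self.F0.get(x, F0_new) + a * F0_new
        self.Ff[x] = (1 - a) * self.Ff.get(x, Ff_new) + a * Ff_new
        return alert                    # True -> alert sent to decision maker
```

With k = 40000 N/m, x = 0.1 m, a true preload of 500 N and friction of 200 N, the opposing-load force is 4700 N and the aiding-load force is 4300 N, from which the routine recovers F0 = 500 N and Ff = 200 N.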
The above described learning process occurs only when the
absolute value of the actuation speed u is above a minimum
threshold uT, since very small actuation rates could lead to
less accurate results. The learning process concept block
diagram is shown in Figure 6.
The learning process continues as long as the coherence
checks provide a positive output. If a sensor failure is
recognized, or if a difference between the signals of the two
position sensors above the established threshold δxT is
detected and that difference lasts more than a given time tT,
then the learning process is discontinued and the system
reverts to the monitoring process described in the next
section.
6. MONITORING PROCESS
The logic for the monitoring mode is described by the block
diagram of Figure 7. The monitoring process performs two
basic functions:
Detects uncommanded movements or lack of response
of the pantograph actuators (Figure 7 – a)
Detects sensors failures that were not identified by the
coherence check (Figure 7 – b)
Detection of uncommanded movements or lack of response
is a relatively straightforward operation: the actuators rate
computed from the time derivative of the position signals is
compared with the rate of change of the position command.
If a discrepancy exists and lasts more than a given amount
of time, a failure is recognized. This monitor is continuously
performed, but in case one of the two position sensors is
Figure 6. Concept block diagram of the learning process
failed, the uncommanded movement / lack of response
monitor is temporarily stopped and it is resumed after the
other monitors have identified which of the two position
sensors is the good one. This temporary pause of about 100
ms for the uncommanded movement / lack of response
monitor is instrumental in avoiding a false indication of
wrong system operation. Detection of sensors failure not
identified by the coherence check is a more challenging
task, which is described hereunder.
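The rate-comparison check described above can be sketched as follows; the names and thresholds (rate_tol, t_tol) are illustrative assumptions.

```python
def rate_discrepancy_monitor(position, command, dt, rate_tol, t_tol):
    """Compare the actuator rate (time derivative of the position signal)
    with the rate of change of the position command; declare a failure
    (uncommanded movement or lack of response) only if the discrepancy
    persists for longer than t_tol."""
    persist = 0.0
    for i in range(1, len(position)):
        u_actual = (position[i] - position[i - 1]) / dt
        u_cmd = (command[i] - command[i - 1]) / dt
        if abs(u_actual - u_cmd) > rate_tol:
            persist += dt
            if persist > t_tol:
                return True             # discrepancy lasted long enough
        else:
            persist = 0.0               # transient discrepancy: reset
    return False
```

A position signal that tracks the command never trips the monitor, while a stuck actuator (flat position against a ramping command) does once the persistence time is exceeded.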
The actuator speed is computed by performing the time
derivative of the signals xA and xB received from the two
position sensors; two values uA and uB are then obtained
for the actuator speed. These values are compared with the
actuator speeds uMA and uMB computed from the system
model described in the previous section by using the last
values of F0 and Ff determined in the course of the learning
process. The absolute value |δu| of the difference between
actual and computed actuator speed is processed by a
filtering element whose purpose is to eliminate undesired
noise in the monitoring process. The filtering element sets
its output e equal to |δu| only when |δu| is greater than a
minimum value uMIN. This prevents differences resulting
from the inaccuracies of the modeling process from being
counted as errors. The resulting errors eA and eB for the two
position sensors are divided by the actuator speeds uMA and
uMB in order to obtain two non-dimensional quantities, eA'
and eB'. These non-dimensional errors are then integrated
with time and the integrators outputs IA and IB are used for
recognizing a sensor failure. If the coherence check
signalled a difference between the two sensors but was
unable to decide which of the two was the failed one, a cross
monitoring logic of the monitoring process is able to sort
out the failed sensor. If a sensor is malfunctioning, its
relevant integrator output (IA or IB) grows faster than the
other, and by looking at which of the two outputs (IA or IB)
is greater, it is possible to sort out which is the failed sensor.
It must be emphasized that for this condition the monitor
does not compare the computed value of a certain quantity
against an acceptable limit and has to decide whether a
failure has occurred or not. The monitor already knows from
the coherence check that a failure exists and simply
compares two quantities (IA and IB) to realize which of the
two sensors is failed. In this condition, there is an extremely
low probability of error: the quantity relevant to the failed
sensor will definitely be greater than that for the healthy one
and the failed sensor can be positively identified with
practically zero error probability.
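A minimal sketch of one lane's error integration and of the cross monitor; the variable names follow the text (uMIN, IA, IB), while the class structure itself is an illustrative assumption.

```python
class LaneMonitor:
    """Integrate the non-dimensional speed error e' = e / uM for one
    position sensor lane (Figure 7 b); u_min is the dead band that
    filters out modeling inaccuracies."""
    def __init__(self, u_min):
        self.u_min = u_min
        self.I = 0.0                        # integrator output (IA or IB)

    def step(self, u_actual, u_model, dt):
        du = abs(u_actual - u_model)        # |δu|
        e = du if du > self.u_min else 0.0  # filtering element
        if u_model != 0.0:
            self.I += (e / abs(u_model)) * dt  # integrate e' = e / uM
        return self.I

    def reset(self):
        self.I = 0.0                        # reset when pantograph is centered

def cross_monitor(I_A, I_B):
    """Once the coherence check has flagged a failure, the lane with the
    greater error integral is the failed sensor."""
    return "A" if I_A > I_B else "B"
```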
Figure 7. Concept block diagrams of the monitoring process
When only one sensor is active because the other one was
recognized failed, the monitoring process continues for the
remaining healthy one using the last values of F0 and Ff
determined in the course of the learning process. Obviously,
in this case it is not possible to compare the signals of the
two sensors. Therefore, the monitoring logic relies on
comparing the time integral I of the absolute value of the
error resulting from the filtered difference between the
actual and computed actuator speed with a limit threshold
IMAX. When the integrator output becomes greater than
IMAX, a failure is recognized.
Since the monitoring process is meaningful only when the
pantograph is commanded to move, the integrators outputs
(IA and IB) are reset to zero when the pantograph is
centered. This prevents occasional disturbances, not related
to sensor malfunctions, from being progressively
accumulated by the integrator and possibly generating a
false alarm.
Since the monitoring process implemented when only a
single sensor is active is less accurate than the one for the
case of two active sensors, the limit IMAX beyond which a
sensor failure is recognized cannot be set too low, in order
to minimize the risk of false alarms. A comprehensive
simulation campaign was thus performed to establish an
optimum value of IMAX, such as to obtain the fastest
possible recognition of a failure while minimizing the risk
of false alarms.
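The single-sensor individual monitor, with its reset at the centered position and its IMAX limit, can be sketched as follows (an illustrative simplification; the sample format and threshold values are assumed).

```python
def monitor_single_lane(samples, u_min, I_max, dt):
    """Run the individual monitor over (x, u_actual, u_model) samples:
    the filtered, normalized speed error is integrated, the integral is
    reset whenever the pantograph is centered (x == 0), and a failure is
    declared as soon as the integral exceeds I_max."""
    I = 0.0
    for x, u_actual, u_model in samples:
        if x == 0.0:
            I = 0.0                     # reset at center: discard disturbances
        du = abs(u_actual - u_model)
        e = du if du > u_min else 0.0   # dead band against model inaccuracy
        if u_model != 0.0:
            I += (e / abs(u_model)) * dt
        if I > I_max:
            return True                 # sensor failure recognized
    return False
```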
7. DECISION MAKER
The decision maker consists of a logic routine accepting the
output signals from the coherence check and the monitoring
process to provide the train electronics with the information
of the health status of the pantograph control system.
The decision maker issues the warning of a position sensor
failure (lane A or lane B) if such a failure has been detected
either by the coherence check or by the monitoring process.
In case a failure of the remaining active sensor is detected
after the other sensor had already failed, an alarm is issued
signaling the loss of pantograph position information.
If the current wrap around performed by the coherence
check detects a failure of the servovalve electrical section, a
warning is issued such that the train electronics can activate
the other servovalve electrical channel.
If a subsequent failure of this other section of the servovalve
occurs, then an alarm is issued indicating loss of pantograph
control.
If an uncommanded movement, or a lack of response is
detected by the monitoring process, the decision maker
issues again an alarm indicating loss of pantograph control.
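The decision rules above can be condensed into a small routine; the flag names and message strings are illustrative assumptions.

```python
def decision_maker(sensor_a_failed, sensor_b_failed,
                   servovalve_ch1_failed, servovalve_ch2_failed,
                   uncommanded_or_no_response):
    """Combine coherence-check and monitoring-process outputs into the
    warnings and alarms sent to the train electronics."""
    msgs = []
    if sensor_a_failed or sensor_b_failed:
        msgs.append("warning: position sensor failure")
    if sensor_a_failed and sensor_b_failed:
        msgs.append("alarm: loss of pantograph position information")
    if servovalve_ch1_failed:
        # the train electronics can switch to the other electrical channel
        msgs.append("warning: activate other servovalve electrical channel")
    if servovalve_ch1_failed and servovalve_ch2_failed:
        msgs.append("alarm: loss of pantograph control")
    if uncommanded_or_no_response:
        msgs.append("alarm: loss of pantograph control")
    return msgs
```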
8. PERFORMANCE ASSESSMENT OF THE HEALTH
MANAGEMENT SYSTEM
The merits of the health management system presented in
this paper were assessed running several simulations of a
model representing the dynamic response of a train
pantograph. In particular, the mathematical model
specifically referred to the Alstom Ferroviaria "Nuovo
Pendolino" train.
In order to assess the merits of the diagnostic system, a
comprehensive complex mathematical model representing
both tilting and pantograph actuation systems was
developed. This model is physics-based, built on the
mathematical relationships among the state variables and
the physical parameters. The model proved to be very
accurate when later compared with the data measured
during revenue service operations. Several time histories of
tilt angle commands and actual responses were available,
and the same sequences of commands were injected into the
model and the relevant responses were computed. An
example of comparison is shown in Figure 8, and similar
accuracies were found for all types of tilt commands, and the
validity of the system model was thus positively verified.
This mathematical model, acting as virtual hardware, was
then used in place of the actual hardware to verify the
performance of the diagnostic system. Several simulations
were performed, both in normal and in failed conditions, in
order to assess the ability of the health management system
to properly identify a failure of one or both position
transducers and to avoid false alarms. Failures of the
servovalve and of the actuators leading to uncommanded
movements or lack of response were also simulated, but do
not represent a specific advance since the relevant
monitoring logics are normally implemented in hydraulic
servocontrols. The remainder of this paper thus focuses on
the failure cases of the position sensors. Several simulations
were also performed changing the system physical
parameters in order to check the ability of the learning
process to properly adapt the model parameters to the
varying conditions so as to avoid false failure indications.
Simulations were initially run with the nominal values of
the system parameters to test the health management system
under normal operating conditions. Several pantograph
movements were commanded so as to simulate different
rides, changing both the amplitude and the velocity of the
pantograph movements.
Figure 8. Example of comparison between virtual hardware
model results and test data
An example of the simulations for operation with nominal
values of the pantograph control system parameters is
shown in Figure 9. Since the commands to the pantograph
actuators are synchronized with the commands to the tilting
actuators, the position commands reported in the y axis of
the upper diagram of Figure 9 are indicated as tilt angle
commands that are comprised from 0° to 8° (maximum tilt
angle). The steady conditions, which are representative of a
travel either in the middle of a curve or in a straight track,
last 5 s. The simulation of Figure 9 was conducted assuming
that both position transducers are initially operating
correctly and that transducer 1 fails at time t = 40 s. In this
case the integral of the error for the remaining active sensor
was computed as defined in section 5 of this paper.
Looking at Figure 9 it can be seen how the error integral
always remains below the fault indication limit and no false
alarm indication is then generated by the monitoring
process.
Simulations were then run changing the values of the
parameters of the pantograph control system, which were
varied over a range that can reasonably be expected for
trains in regular revenue service. The purpose of these
simulations was to test the ability of the learning process to
progressively tune the model parameters so as to avoid false
failure indications. The physical parameters that were varied
in order to simulate the whole range of operating and
environmental conditions were:
External load
Spring rate
Spring preload
Friction force
Supply pressure
Figure 9. Health management system assessment: one
sensor active - nominal values of the system parameters
Examples of the health management system performance
are reported in Figure 10 and Figure 11. In particular, Figure
10 refers to the case in which the pantograph is subjected to
a cross wind load of 3000 N, while Figure 11 corresponds to
an operation with a supply pressure reduced from 31.5 to 25
MPa. As for the previous simulation case shown in Figure 9,
it was assumed that a sensor failure occurred at time 40 s in
order to check the ability of the monitoring process to
correctly detect the failure. For the case in Figure 10 it can
be seen that as the cross wind load is applied at time zero, as
long as the sensors are operating correctly the error integral
remains well below the warning threshold, and the value
that is built up at the end of each curve progressively
decreases because the learning process adapts the values of
the system parameters to the changed conditions. No false
alarm is generated, demonstrating the ability of the learning
process to properly adapt the values of the model
parameters.
For the case in Figure 11, the sudden drop of the supply
pressure from 31.5 to 25 MPa causes the integral of the
error to exceed the threshold for a very small amount of
time for both sensors. Since the two position sensors are
actually operating correctly and passed the coherence check,
the simultaneous overcoming of the threshold for the two
error integrals does not trigger a failure indication. When a
transducer 1 failure is actually injected at time t = 40 s, a loss
of coherence between the two sensors is recognized and the
error integral grows much above the threshold, thus
indicating the sensor failure. From that time on the sensor
signal is discarded and the operation continues relying
only on the signal of the other transducer.
After having verified that no undue false alarms were
generated by the health management system for any
combination of environmental and operating conditions of
the pantograph control system, simulations were then run
injecting different types of failures, and again this was done
over a wide range of service conditions for the train.
Figure 10. Health management system assessment: one
sensor active - presence of a cross wind load of 3000 N
Failures such as an internal short or a broken wire of one of
the position sensors are immediately picked up by the
coherence check; the simulations were thus focused on
those types of failures that are not easily detected by the
monitoring logic presently implemented in the trains in
service.
Sensors failures addressed by the simulations were:
Step change of the sensor offset
Step change of the sensor gain
Slow change with time of the sensor offset
Slow change with time of the sensor gain
A few typical examples of the simulations results are shown
in Figure 12 through Figure 15. Figure 12 shows the case in
which the sensor 1 offset is subjected to a step change of 6
%, which could be the result of an electrical degradation, or
of a permanent mechanical realignment determined by an
occasional large jerk during the train ride. After the offset
change the output signals of the two sensors are different
and the coherence check will thus alert of a failure. The
ensuing monitoring process then looks at the error integrals
and easily identifies the failed sensor because its error
integral is much greater than that of the healthy sensor. The
same happens for the case of a step change of the gain of
one of the two sensors (Figure 13). When the pantograph is
commanded to move away from center a difference between
the output signals of the two sensors greater than the
threshold is detected by the coherence check, which thus
issues a failure alert. The ensuing comparison between the
error integrals performed by the monitoring process
identifies the failed sensor because its error integral is much
larger than that of the good sensor. Results similar to those
shown in Figure 12 and Figure 13 are obtained for the cases
of a progressive variation of a sensor offset or of a sensor
gain. When the difference between the output signals of the
two sensors is large enough to activate the lack of coherence
alert, the difference between the error integrals computed by
the monitoring process is large and the identification of
which of the two sensors is the failed one can be performed
without error.
Figure 11. Health management system assessment: one
sensor active - system supply pressure reduced from 31.5
MPa to 25 MPa
Progressive variations of one sensor offset and gain were
simulated and are shown in the diagrams of Figure 14 and
Figure 15 to assess which was the maximum error attained
in the pantograph position measurement before the
monitoring process recognizes the sensor failure. The
simulations were performed using a heavy duty track as
pantograph position command sequence. It can be seen from
the simulations results that in both cases the error integrals
tend to increase until they reach a point for which a
pantograph position command greater than a minimum
value makes the error integral to overcome the threshold,
thereby triggering the failure alert. In particular, it can be
observed from Figure 14 how the error integral overcomes
the threshold a few times between approximately 350 s and
550 s before the failure indication is eventually activated.
This is due to the fact that, because of the progressive offset
variation, the transducer indicates an incorrect pantograph
position, but the pantograph position error is not large
enough to activate the lack of coherence check. The
transducer is eventually declared failed at time 550 s, when
the position error of the degrading transducer leads to a
difference from the healthy transducer signal such to signal
a lack of coherence. As this alert is generated, the ensuing
monitor is enabled, which recognizes as failed the transducer
with the higher error integral. The signal of the failed position
sensor is ignored from then on and is no longer taken
into account in the pantograph position servoloop. For the
cases of Figure 14 and Figure 15, the failure indication
occurs when the maximum error of the position transducer
is 6 % and 9 %, respectively.
Figure 12. Failure simulation scenario #a: The two position
sensors are initially good, then position sensor #2 is
subjected to a step change of its offset
Figure 13. Failure simulation scenario #b: The two position
sensors are initially good, then position sensor #2 is
subjected to a step change of its gain
9. CONCLUSION
The work herein presented was carried out in order to define
a technique able to recognize the failure of the sensors used
to measure the lateral position of the pantograph of high
speed tilting trains equipped with laterally translating
pantographs with minimum risk of missed failures and false
alarms. This would allow an unabated operation of the train
tilting system after a failure of one of the two lateral
position sensors of the pantograph, while the present
monitoring system disables the tilting operation and reduces
the train speed after a single sensor failure.
Figure 14. Failure simulation scenario #c: The two position
sensors are initially good; then sensor #1 undergoes a
progressive variation of its offset
Figure 15. Failure simulation scenario #d: The two position
sensors are initially good; then sensor #1 undergoes a
progressive variation of its gain
The health management system described in this paper was
first tested simulating train rides over different tracks and
for the entire range of operating and environmental
conditions, and appropriate limits for the failure detection
were established to prevent false alarms. Then, all types of
sensors failures and malfunctionings were injected and the
ability of the health management system to recognize them
was positively assessed.
The results of the entire simulation campaign proved the
robustness of the proposed health management system, and
confidence was hence gained in its ability to detect a sensor
failure or malfunctioning with minimum risk of false alarms
or missed failures. The implementation of such a health
management system on a tilting train will thus enable the
tilting operation to continue after a failure of a pantograph
lateral position sensor, hence allowing the train to maintain
its high speed travel for the remainder of the ride.
Furthermore, the positive recognition of a sensor failure
would greatly ease the maintenance operation, since the
failed sensor can be replaced without the need of removing
the entire pantograph assembly from the train roof.
ACKNOWLEDGEMENT
The authors wish to thank the tilting trains manufacturer
Alstom Ferroviaria for their support in the preparation of
this paper.
NOMENCLATURE
i	servovalve current
Q	servovalve flow rate
c	servovalve deflux coefficient
A	actuators area
x	actuators displacement
u	actuators speed
p1	pressure at servovalve control port 1
p2	pressure at servovalve control port 2
ps	hydraulic system supply pressure
pr	hydraulic system return pressure
k	springs stiffness
F0	springs preload
Ff	friction forces
tT	learning process time threshold
e	filtered actuator speed error
I	integral of the actuator speed error
BIOGRAPHIES
G. Jacazio is professor of applied mechanics and of
mechanical control systems. His main research activity is in
the area of aerospace control and actuation systems and of
prognostics and health management. He is a member of the
SAE A-6 Committee on Aerospace Actuation Control and
Fluid Power Systems, and a member of the international
society of prognostics and health management.
M. Sorli is professor of applied mechanics and of
mechatronics. His research interests are in the areas of
mechatronics, mechanical and fluid servosystems, spatial
moving simulators, smart systems for automotive and
aerospace applications. He is a member of the TC
Mechatronics of IFToMM, ASME and IEEE.
D. Bolognese is a research engineer. His research interests
are in the area of simulations of mechanical and fluid
systems
D. Ferrara is a PhD student in Mechanical Engineering.
His research interests are in the areas of aerospace actuation
and control systems and of prognostics and health
management.
Lifetime models for remaining useful life estimation with randomly
distributed failure thresholds
Bent Helge Nystad1,2, Giulio Gola1,2 and John Einar Hulsund1
1Institute for Energy Technology, Halden, Norway
2IO Center for Integrated Operations, Trondheim, Norway
[email protected], [email protected], [email protected]
ABSTRACT
In order to predict in advance and with the smallest possible
uncertainty when a component needs to be fixed or
replaced, lifetime models are developed based on the
information of the component deterioration trend and its
failure threshold to estimate the stochastic distribution of the
hitting time (the first time the deterioration exceeds the
failure threshold) and the remaining useful life. A primary
issue is how to effectively handle the uncertainties related to
the component deterioration trend and failure threshold.
This problem is here investigated considering a non-
stationary gamma process to model the component
deterioration and a gamma-distributed failure threshold.
Two lifetime models are proposed for comparison on an
application concerning deterioration of choke valves used in
offshore oil platforms.
1. INTRODUCTION
The capability of predicting when maintenance actions are
required is a primary issue for every industry and bears the
advantages of enhancing operational safety and maximizing
plant reliability. In this respect, to estimate in advance and
with an acceptable level of uncertainty the component
remaining useful life, one can either define a failure time
probability based on the failure times records of a large
number of similar components, or exploit the information
on the component deterioration trend during operation
(Nystad, 2008; Gola & Nystad, 2011a). The latter approach
is less conservative and allows tailoring maintenance
planning to the specific case and, as a consequence,
maximizing the usage of the component.
In practice, lifetime estimation models (van Noortwijk,
2009; Lu & Meeker, 1993) are devised to combine the
knowledge of the past deterioration trend and the current
degradation state with the failure threshold and to estimate
the hitting time (Abdel-Hameed, 1975; Frenk & Nicolai,
2007) and the remaining useful life (van Noortwijk, 2009;
Rausand & Høyland, 2004).
The uncertainty associated with the deterioration trend is here
modelled by a non-stationary gamma process (Gola &
Nystad, 2011a; van Noortwijk, 2009). A gamma process is a
stochastic process with independent, non-negative gamma-
distributed increments and represents a valuable option to
model monotonic processes, i.e. with gradual damage
monotonically accumulating over time in a sequence of
increments such as wear, fatigue, erosion/corrosion, crack
growth, erosion, creep and swell.
The specification of the failure threshold is a critical issue
(Nystad, 2008; van Noortwijk, 2009). In fact, using a
deterministic threshold is problematic since the same
component can fail at different degradation levels.
Typically, an unbiased estimate of the threshold mean value,
or a conservative lower-bound threshold estimate are
supplied. Nevertheless, if the threshold value is set too high,
the risk of actual component failure will increase. On the
contrary, a conservative low threshold value reduces the risk
of failure, but increases the failure probability to a point in
which the component can be prematurely put off operation.
For some applications, e.g. cable aging due to thermal and
mechanic damage (Fantoni & Nordlund, 2009), the designer
may not know with certainty what explicit level of
degradation causes a failure. If threshold failure data are
scarce, an alternative source of information is the judgment
of engineers with expertise in the relevant field. Such
experts can provide useful information about the threshold
probability distribution in the form of best estimates of
percentiles.
This problem is here tackled by considering the threshold as
a random variable with a gamma probability distribution. A
likelihood function can then be established based on the
expert judgment in terms of percentiles (Welte & Eggen,
2008).
_____________________
Bent H. Nystad et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
141
A practical application concerning erosion in choke valves
used in the oil and gas industry is considered (Gola &
Nystad, 2011a; Bringedal, Hovda, Ujang, With &
Kjørrefjord, 2010) and two lifetime models for estimating
the remaining useful life are proposed and compared.
2. THE HITTING TIME AND REMAINING USEFUL LIFE
Since the failure threshold variability does not depend on
the temporal uncertainty associated with the deterioration trend
but only on the historical failure records of the component,
it is reasonable to assume that the threshold distribution is
independent of the deterioration distribution (Abdel-Hameed, 1975).
In this view, the cumulative distribution function of the hitting
time is defined in Abdel-Hameed (1975) and can be written
for each time $t \ge 0$ as:
$$H(t) = \Pr\big(X(t) \ge Y\big) = \int_{x=0}^{\infty}\!\int_{y=0}^{x} f_{X(t)}(x)\, f_Y(y)\, dy\, dx = \int_{x=0}^{\infty} F_Y(x)\, f_{X(t)}(x)\, dx \qquad (1)$$
where $f_{X(t)}$ is the probability density function (pdf) of the
deterioration trend $X(t) \ge 0$, and $f_Y$ is the pdf and $F_Y$ the
cumulative distribution function (cdf) of the failure threshold
$Y \ge 0$ (satisfying $F_Y(0) = 0$).
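As a minimal numerical sketch, Eq. (1) can be evaluated by quadrature; the parameter values below (power-law shape function, gamma threshold) are illustrative, not fitted to any data in this paper.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative parameters:
# deterioration X(t) ~ Gamma(shape=v(t), rate=u) with power-law v(t) = c*t**b
c, b, u = 0.05, 1.2, 1.0
# failure threshold Y ~ Gamma(shape=alpha, rate=beta)
alpha, beta = 64.0, 4.0

def hitting_time_cdf(t):
    """H(t) = Pr(X(t) >= Y) = integral of F_Y(x) f_{X(t)}(x) dx, per Eq. (1)."""
    v_t = c * t**b
    f_x = stats.gamma(a=v_t, scale=1.0 / u)
    integrand = lambda x: stats.gamma.cdf(x, a=alpha, scale=1.0 / beta) * f_x.pdf(x)
    H, _ = integrate.quad(integrand, 0.0, np.inf)
    return H

# H(t) is non-decreasing in t, as expected of a hitting-time cdf
print([round(hitting_time_cdf(t), 4) for t in (50, 150, 300)])
```

Since $F_Y$ is non-decreasing and $X(t)$ stochastically grows with $t$, the computed values increase towards one as $t$ grows.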
The meaning of Eq. (1) is illustrated in Figure 1 using the
choke valve case study data (see Section 3). Based on the
erosion data for the operational time interval $t \in [0, 280]$
(diamonds in Figure 1), the expected value (solid line) and
5th and 95th percentiles (dashed lines) of the fitted gamma
process with assumed power-law shape are shown. Notice
that the functional shape of the erosion process at time
$t = 280$ is convex. The failure threshold is here defined as a
gamma distribution, i.e. the hazard zone (red contour plot in
the figure). The probability of failure in the operational
time is illustrated by the hitting time pdf (blue line).
Figure 1. The hitting time probability density function (blue
line); fitted gamma process with power-law shape (black
solid and dashed lines) and a gamma distributed hazard zone
(red).
The remaining useful life at time $t_0 = s$ is derived from
Eq. (1) (Rausand & Høyland, 2004) and is here calculated
by resorting to a state-based approach (Gola & Nystad,
2011b) which accounts for the knowledge of the
deterioration state $x_s$ at time $t = s$:
$$\mathrm{RUL}(t) = \int_{x_s}^{\infty} f_{X(t)\mid X(s)}(x \mid x_s)\, \frac{F_Y(x) - F_Y(x_s)}{1 - F_Y(x_s)}\, dx \qquad (2)$$
Recalling that in a time-based perspective the deterioration
$x$ is a function of $t$, the pdf $f_{X(t)\mid X(s)}(x \mid x_s)$ represents
the probability of having at time $t$ a deterioration increment
$x - x_s$, and the term $\big(F_Y(x) - F_Y(x_s)\big)/\big(1 - F_Y(x_s)\big)$ is the
left-truncated cdf of the failure threshold, providing the
probability that the failure threshold $y$ lies between the
current deterioration state $x_s$ and infinity.
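A numerical sketch of the state-based calculation follows, reading Eq. (2) as the conditional probability of threshold exceedance by time $t$ given the state $x_s$ at time $s$; all parameter values are illustrative stand-ins, not the case-study fit.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative parameters (gamma process with v(t) = c*t**b, rate u;
# threshold Y ~ Gamma(alpha, rate=beta))
c, b, u = 0.05, 1.2, 1.0
alpha, beta = 64.0, 4.0

def conditional_failure_prob(t, s, x_s):
    """Eq. (2) read as Pr(X(t) >= Y | X(s) = x_s, Y > x_s): the increment
    X(t)-X(s) is gamma with shape v(t)-v(s), and the threshold cdf is
    left-truncated at the current state x_s."""
    dv = c * (t**b - s**b)                        # shape of the increment
    F_Y = lambda x: stats.gamma.cdf(x, a=alpha, scale=1.0 / beta)
    trunc = lambda x: (F_Y(x) - F_Y(x_s)) / (1.0 - F_Y(x_s))
    f_inc = lambda x: stats.gamma.pdf(x - x_s, a=dv, scale=1.0 / u)
    val, _ = integrate.quad(lambda x: f_inc(x) * trunc(x), x_s, np.inf)
    return val
```

Evaluating this over a grid of $t > s$ yields the conditional hitting-time distribution, from which RUL percentiles such as those in Figures 6 and 7 can be extracted.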
The meaning of Eq. (2) is illustrated in Figure 2. The fitted
gamma process is the same with the exception that here
there is no uncertainty in the erosion in the operational time
interval 0,280t . The expected value (solid line)
remains unchanged; the 5th
and 95th
percentiles (dashed
lines) are instead calculated based on the erosion increment
sx x . The left-truncated failure threshold (red contour plot)
and the pdf of the RUL (blue line) are finally shown.
Figure 2. RUL probability density function (blue line); fitted
gamma process with power-law shape (black solid and
dashed lines); left-truncated gamma distributed hazard zone
(red).
Nevertheless, for a distribution without memory (e.g. the
exponential distribution) there is no advantage in left-truncating
the cdf of the threshold, and the expression of the
remaining useful life therefore becomes the same as
the left-truncated version of the hitting time.
Notice that since the hitting time model in Eq. (1) considers
uncertainty in the whole deterioration trend from $t = 0$ to
infinity, the associated uncertainty calculated at $t_0 = s$ is
higher than that of the pdf depending only on the prediction
from $t = s$ to infinity.
2.1. The deterioration model
The deterioration $X(t)$ is here modelled as a non-stationary
gamma process (van Noortwijk, 2009) with a time-dependent
pdf written as:

$$f_{X(t)}(x) = \frac{u^{v(t)}}{\Gamma(v(t))}\, x^{v(t)-1} e^{-ux} \qquad (3)$$
where $\Gamma(v(t)) = \int_0^{\infty} z^{v(t)-1} e^{-z}\, dz$ is the gamma function,
with shape parameter $v(t) > 0$ and scale parameter $u > 0$;
$X(0) = 0$ with probability one; the deterioration increment
$X(t) - X(s)$ is gamma-distributed with shape parameter
$v(t) - v(s)$ and scale parameter $u$ for any $t > s \ge 0$; and the
stochastic process $\{X(t),\, t \ge 0\}$ has independent
increments. The shape function $v(t)$ must be non-decreasing,
right-continuous and real-valued for $t \ge 0$, with
$v(0) \equiv 0$ and $v(\infty) = \infty$. When $v(t)$ is linear the gamma
process is stationary, and when $v(t)$ is non-linear it is
non-stationary.
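The independent-increments property lends itself to direct simulation; the following sketch samples paths of a non-stationary gamma process with an illustrative power-law shape function (not fitted to the case study).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gamma_process(times, v, u, n_paths=1000):
    """Sample paths of a (possibly non-stationary) gamma process:
    increments X(t_i) - X(t_{i-1}) ~ Gamma(shape=v(t_i)-v(t_{i-1}), rate=u),
    independent, starting from X(0) = 0."""
    times = np.asarray(times, dtype=float)
    dv = np.diff(np.concatenate(([0.0], v(times))))   # shape increments
    incs = rng.gamma(shape=dv, scale=1.0 / u, size=(n_paths, len(times)))
    return np.cumsum(incs, axis=1)

# power-law shape function with illustrative parameters
c, b, u = 0.05, 1.2, 1.0
v = lambda t: c * t**b
t_grid = np.arange(1, 301)
paths = simulate_gamma_process(t_grid, v, u)

# sample mean at the last time should track E[X(t)] = v(t)/u
print(paths[:, -1].mean(), v(t_grid[-1]) / u)
```

Every simulated path is monotonically non-decreasing, which is exactly the property that makes the gamma process attractive for wear-type degradation.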
2.2. The threshold model
The hitting time (Eq. 1) and remaining useful life
(Eq. 2) models are well suited to handle different types of
uncertainty in the failure threshold, related for example to
the estimate of the initial deterioration (due to imperfect
maintenance or production defects), to manufacturing
variability and to the historical measurements.
In this paper, a gamma-distributed failure threshold
$Y \sim \mathrm{Ga}(y \mid \alpha, \beta)$ with shape parameter $\alpha > 0$ and scale
parameter $\beta > 0$ is considered, with pdf and cdf given for
any $y \ge 0$ as:

$$f_Y(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\beta y}, \qquad F_Y(y) = \frac{\gamma(\alpha, \beta y)}{\Gamma(\alpha)}, \qquad y \ge 0 \qquad (4)$$
where $\gamma(\alpha, y) = \int_0^{y} z^{\alpha-1} e^{-z}\, dz$ is the lower incomplete
gamma function. Notice that the shape parameter $\alpha$ is in
this case a time-independent constant.
2.3. Expected deteriorations
In general, the expected deterioration $E\big(X(t)\big)$ can be
linear, concave, convex or any combination of these. As
discussed in van Noortwijk (2009), the power-law function
is a flexible candidate for linear, concave and convex
deterioration:

$$E\big(X(t)\big) = \frac{v(t)}{u} = \frac{c\, t^{b}}{u} \qquad (5)$$
(5)
In this case, the gamma process is linear and stationary if
$b = 1$, and non-stationary concave or convex if $b < 1$ or
$b > 1$, respectively.
However, the process in Eq. (5) cannot describe a
deterioration trend that is both concave and convex. Given the
restrictions on $v(t)$, a candidate process which describes
an expected degradation that is first concave and then
convex (i.e., z-shaped) is:
$$E\big(X(t)\big) = \frac{v(t)}{u} = \frac{c\left[\sinh\!\big(a(t - b)\big) + \sinh(ab)\right]}{u} \qquad (6)$$

where the shape parameter $b > 0$ is the timestamp of the
inflection and $a > 0$ is related to the size of the derivative at
the inflection point.
An example of an expected deterioration as in Eq. (6) is the
impact of external stress on materials/devices (McPherson,
2010). The net reaction rate for material/device degradation
becomes concave (near-linear) with low stress and convex with
high stress.
2.4. Inference of the model parameters
In practice, the application of the gamma process requires
statistical methods for estimating the parameters from
the available measurements. For the gamma process, a
typical data set consists of inspection times $t_i$, $i = 1, \ldots, n$,
where $0 = t_0 < t_1 < t_2 < \cdots < t_n$, and the corresponding
observations of the cumulative amounts of deterioration $x_i$,
$i = 1, \ldots, n$, where $0 = x_0 \le x_1 \le x_2 \le \cdots \le x_n$. The
estimators for the scale parameters $u$ and $c$ of the power-law
(Eq. 5) and z-shaped (Eq. 6) degradations can be
derived by the method of moments or the method of
maximum likelihood (van Noortwijk, 2009). The method of
moments leads to attractive and simple formulae for the
parameters, but it requires knowledge of the values of the shape
parameters of the power-law ($b$) and z-shaped
($a$, $b$) degradations, which are either given based on
expert opinion (Welte & Eggen, 2008) or can be inferred
numerically by least-squares optimization. On the other hand,
the method of maximum likelihood, explained in van
Noortwijk (2009), allows estimating the shape and
scale parameters directly, at the expense of larger computational
cost. In the application that follows, least-squares
optimization is first used to determine the shape parameters and
then the method of moments is applied to calculate the scale
parameters ($u$, $c$).
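The two-step procedure (least squares for the shape, moments for the scale) can be sketched as below for the power-law case. This is a simplified stand-in for the estimators in van Noortwijk (2009): the log-linear regression and the pooled-variance moment step are assumptions of this sketch, not the paper's exact formulae.

```python
import numpy as np

def fit_power_law_gamma(t, x):
    """Two-step sketch: (1) least squares in log space for the shape
    exponent b and the mean scale m = c/u of E[X(t)] = (c/u) t**b;
    (2) method of moments on the increments for the rate u, using
    E[dX_i] = m*(t_i**b - t_{i-1}**b) and Var[dX_i] = E[dX_i]/u."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    # step 1: log-linear regression  log x = log m + b log t
    b, log_m = np.polyfit(np.log(t), np.log(x), 1)
    m = np.exp(log_m)
    # step 2: pooled moment estimator of u from increment dispersion
    mu = m * np.diff(np.concatenate(([0.0], t**b)))   # expected increments
    dx = np.diff(np.concatenate(([0.0], x)))
    u = mu.sum() / ((dx - mu) ** 2).sum()
    return b, m * u, u    # shape exponent b, scale c, rate u
```

On data simulated from a known gamma process, the routine recovers the exponent $b$ closely and the rate $u$ to within sampling error.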
Concerning the failure threshold, historical values for a
number of similar components can be used to calculate the
mean and standard deviation of the failure threshold
distribution (Nystad, 2008). For highly reliable components
for which failures are rare, one can use a few deterioration
samples with their associated parameters and Monte Carlo
simulation to generate a large number of deterioration
paths. Different threshold values randomly selected from a
threshold distribution can then be used to estimate the
hitting time (Lu & Meeker, 1993). Finally, a further source of
information is field experts (Welte & Eggen,
2008). Since the meaning of many probability distribution
parameters is rather abstract, experts usually have problems
estimating them directly. Instead, experts can provide useful
information about the threshold distribution in terms of best
estimates (mean, median, mode) or percentiles (e.g. a 10th
percentile corresponding to early failures), which can be
used to estimate the parameters of two-parameter
probability distributions like the gamma distribution.
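As a sketch of such elicitation, the gamma parameters can be solved from an expert's best estimate (taken here as the mean) and one stated percentile; the specific numbers below are hypothetical, not taken from this paper.

```python
from scipy import stats, optimize

def gamma_from_mean_and_percentile(mean, p, q):
    """Find (alpha, beta) of a Gamma(alpha, rate=beta) threshold such that
    E[Y] = alpha/beta = mean and the p-quantile equals q.
    'mean', 'p' and 'q' come from expert judgment (illustrative here)."""
    def gap(alpha):
        beta = alpha / mean                       # enforce the stated mean
        return stats.gamma.ppf(p, a=alpha, scale=1.0 / beta) - q
    alpha = optimize.brentq(gap, 1e-3, 1e5)       # root in alpha
    return alpha, alpha / mean

# e.g. expert: mean threshold 16, 10th percentile at 13.5 (hypothetical)
alpha, beta = gamma_from_mean_and_percentile(16.0, 0.10, 13.5)
```

The bracketing works because, with the mean fixed, the p-quantile of the gamma distribution increases monotonically with the shape parameter (less spread, less skew).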
3. DETERIORATION OF CHOKE VALVES
The application proposed in this paper concerns
deterioration of choke valves undergoing erosion (Bringedal
et al., 2010; Andrews, Kjørholt & Jøranson, 2005). In
offshore oil platforms, choke valves are used on the surface
to control the flow of hydrocarbons and protect the
equipment from unusual pressure fluctuations. Production
experience has shown that choke valves are prone to sand
erosion in the disks and in the outlet sleeve (Andrews et al.,
2005). The main parameters determining erosion are the
impact velocity and the impact angle of the sand grains
through the choke discs.
Figure 3. Damage caused by sand erosion. In the picture, the
originally circular holes in the disks show major wear on the
upper side of the left hole and the lower side of the right hole.
From the mathematical point of view, the flow characteristic
$C_V$ is defined so that, at a constant pressure differential $\Delta p$
across the choke valve, the total mass flow rate $w$ through
the valve is proportional to the valve flow coefficient $C_V$,
which is related to the effective flow cross-section of the
valve and therefore depends on the valve opening:

$$w = C_V \sqrt{\Delta p\, \rho} \qquad (7)$$
where $\rho$ is the average mixture density. The $C_V$ curve is
specific to the valve type and size, and for a given valve
opening $C_V$ is expected to be constant (Kirmanen, Niemelä,
Pyötsiä, Simula, Hauhia & Riihilahti, 2005). The $C_V$
characteristic curve is the baseline for a good-as-new valve
and is often provided by the valve manufacturer. When
erosion occurs, a gradual increase of the effective flow
cross-section is observed even at constant pressure drop.
This phenomenon is therefore related to an abnormal
increase of the valve flow coefficient with respect to its
expected baseline value, hereby denoted as $C_V^b$. For this
reason, for a given valve opening the difference $\Delta C_V$
between the actual flow coefficient and its baseline is
retained as an indicator of the valve erosion. The difference
$\Delta C_V = C_V - C_V^b$ is expected to be monotonically increasing
throughout the life of the valve, thus reflecting the physical
behavior of the erosion process. When $\Delta C_V$ eventually
reaches an established erosion threshold, the valve must be
replaced (Gola & Nystad, 2011a).
The valve flow coefficient $C_V$ in a multiphase environment
cannot be directly measured, but it can be calculated from
the following analytical expression, which accounts for the
physical parameters involved in the process:

$$C_V = \frac{(w_o + w_w + w_g)\sqrt{\dfrac{f_o}{\rho_o} + \dfrac{f_w}{\rho_w} + \dfrac{f_g}{\rho_g}}}{J\, N_6\, F_p\, \sqrt{\Delta p}} \qquad (8)$$
where $w_o$, $w_w$ and $w_g$ are the flow rates of oil, water and
gas, $f_o$, $f_w$ and $f_g$ the corresponding fractions with respect
to the total flow rate, and $\rho_o$, $\rho_w$ and $\rho_g$ the corresponding
densities. $J$ is the gas expansion factor, $F_p$ is the piping
geometry factor and $N_6$ is a constant equal to 27.3
(Kirmanen, Niemelä, Pyötsiä, Simula, Hauhia & Riihilahti,
2005).
2005). The quality of the available data of the physical
parameters in Eq. (8) differs because p is directly
measured, whereas oil, water and gas flow rates are
calculated based on daily production rates of other wells of
the same field. Improvement of the valve erosion indicator
VC based on additional information from well tests carried
out throughout the valve life is discussed in Gola and
Nystad (2011a). Therefore, in this paper, a single choke
valve undergoing erosion is considered and hitting time
models and new RUL models based on Eq. (2) are applied
to the VC trend obtained in Gola and Nystad (2011a) as a
function of the operational days. The valve was opened and
checked to be found in a failed state at operational time
307nt days.
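The indicator computation can be sketched as follows, assuming the mixture form of Eq. (8) as reconstructed here; all numerical inputs are illustrative, not field data from this study.

```python
import math

def flow_coefficient(w_o, w_w, w_g, rho_o, rho_w, rho_g,
                     dp, J=1.0, F_p=1.0, N6=27.3):
    """Valve flow coefficient per Eq. (8): mass flow rates w_*, phase
    densities rho_*, pressure drop dp; J and F_p are correction factors
    and N6 = 27.3 the sizing constant. Illustrative sketch only."""
    w = w_o + w_w + w_g
    f_o, f_w, f_g = w_o / w, w_w / w, w_g / w     # mass fractions
    inv_rho_mix = f_o / rho_o + f_w / rho_w + f_g / rho_g
    return w * math.sqrt(inv_rho_mix / dp) / (J * N6 * F_p)

def delta_cv(cv_actual, cv_baseline):
    """Erosion indicator: deviation from the baseline coefficient C_V^b
    at the same valve opening."""
    return cv_actual - cv_baseline
```

As a sanity check, with a single phase the expression collapses to $w\sqrt{1/(\rho\,\Delta p)}/(J N_6 F_p)$, i.e. the inverse of Eq. (7) up to the correction factors.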
Expert judgment is here used to define the failure threshold
probability distribution (Welte & Eggen, 2008). For a
gamma-distributed threshold, the experts provide the best
central estimate which, in this case study, is the
mean value of the threshold set by the experts ($y = 16$), and
they are also asked to assess the boundaries of the interval in
which the true value of the threshold falls. A measure of the
uncertainty of the expert opinion is the standard deviation of
$Y$. With the expert claiming that, e.g., the true threshold
lies between the values 14 and 18 and is most likely equal to
16, one can calculate the shape parameter $\alpha$ and the scale
parameter $\beta$ of Eq. (4) from $E(Y) = 16$ and
$\sigma_Y = 2$, which yields $\alpha = 64$ and $\beta = 4$.
This hazard zone distribution is shown as a red contour plot
in Figure 1. The skewness of the gamma distribution is
$2/\sqrt{\alpha} = 0.25$, a value which implies a good fit to the
expert's claim.
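The moment matching above can be reproduced in a few lines, using the standard gamma relations $\alpha = \mu^2/\sigma^2$, $\beta = \mu/\sigma^2$ for the shape/rate parameterization of Eq. (4):

```python
import math

mean, sigma = 16.0, 2.0            # expert judgment: E[Y] = 16, sigma_Y = 2
alpha = (mean / sigma) ** 2        # shape: mean^2 / variance
beta = mean / sigma**2             # rate:  mean / variance
skew = 2.0 / math.sqrt(alpha)

print(alpha, beta, skew)           # 64.0 4.0 0.25
```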
Figures 4 and 5 show the $\Delta C_V$ trend and its estimation
provided by the power-law (Eq. 5) and z-shaped (Eq. 6)
models obtained at different operational days, namely
$t_n = 100$, 200, 250 and 307. Because cumulative amounts of
deterioration are measured, the last inspection contains the
most information. For the gamma process, the expected
deterioration at the last inspection time $t_n$ equals
$x_n$; that is, $E\big(X(t_n)\big) = x_n$ (van Noortwijk, 2009). Figures
6 and 7 illustrate the remaining useful life and the associated
uncertainty (5th and 95th percentiles) obtained for each
$t_n \ge 239$ when using the power-law and z-shaped models,
respectively.
Notice that the power-law shape becomes convex only after
250 operational days (Fig. 4), thus leading to an
overestimation of the component remaining useful life (Fig.
6) with respect to its theoretical value (red dashed line). On
the other hand, using the z-shaped model at 240 operational
days one can already identify the final z-shape of the
degradation (Fig. 5), with the consequence of obtaining
better estimations of the remaining useful life, i.e. closer to
its theoretical value and with a reduced uncertainty (Fig. 7).
Figure 4. $\Delta C_V$ trend (thick line) and corresponding
estimations provided by the power-law model.
Figure 5. $\Delta C_V$ trend (thick line) and corresponding
estimations provided by the z-shaped model.
Figure 6. Remaining useful life estimation with the power-law
model (95% confidence interval) and theoretical value
(red dashed line).
Figure 7. Remaining useful life estimation with the z-shaped
model (95% confidence interval) and theoretical value (red
dashed line).
4. CONCLUSION
This paper has investigated the problem of estimating the
remaining useful life of components using stochastic
lifetime models and considering randomly-distributed
failure thresholds. In particular, gamma processes with
power-law and z-shaped shape functions (i.e. first concave,
then convex) have been proposed to predict the
deterioration; a gamma distribution has been considered to
model the failure threshold for it is frequently used as a
probability model in life testing and it is a flexible
distribution for modeling the uncertainty in experts’
opinions. The failure threshold distribution is also known to
contain only positive real values, i.e. 0,x .
A case study of erosion of choke valves used in offshore oil
platforms has been considered and the results of the
expected deterioration calculation and the remaining useful
life estimation given by the power-law and z-shaped models
have been compared. A priori knowledge of the overall
shape of the deterioration is valuable. In this respect, with
some effort the shape of the expected erosion can be
assumed beforehand based on engineering expertise.
However, the model is general and can be applied also to
other cases where the distribution of the parameters for a
maintenance model must be estimated.
REFERENCES
Abdel-Hameed, M. (1975). A gamma wear process. IEEE
Transactions on Reliability, 24(2), 152-153.
Andrews, J., Kjørholt, H., & Jøranson, H. (2005).
Production enhancement from sand management
philosophy: a case study from Statfjord and Gullfaks
(SPE 94511). SPE European Formation Damage
Conference, May 25-27, Sheveningen, The
Netherlands.
Bringedal, B., Hovda, K., Ujang, P., With, H.M., &
Kjørrefjord, G. (2010). Using online dynamic virtual
flow metering and sand erosion monitoring for integrity
management and production optimization. Deep
Offshore Technology Conference, May 3-6, Houston,
Texas.
Fantoni, P.F., & Nordlund, A. (2009). Wire system aging
assessment and condition monitoring (WASCO). NKS-
130. ISBN 87-7893-192-4
Frenk, J.B.G., & Nicolai, R.P. (2007). Approximating the
randomized hitting time distribution of a non-stationary
gamma process. Rotterdam: Econometric Institute and
Erasmus Research Institute of Management.
Gola, G., & Nystad, B.H. (2011a). From measurement
collection to remaining useful life estimation: defining
a diagnostic-prognostic frame for optimal maintenance
scheduling of choke valves undergoing erosion. Annual
Conference of the Prognostics and Health Management
Society, September 26-29, Montreal, Canada.
Gola, G., & Nystad, B.H. (2011b). Comparison of time- and
state-space non-stationary gamma processes for
estimating the remaining useful life of choke valves
undergoing erosion. 24th International COMADEM
Conference, May 30 - June 1, Stavanger, Norway.
Kirmanen, J., Niemelä, I., Pyötsiä, J., Simula, M., Hauhia,
M., & Riihilahti, J. (2005). Flow control manual.
Helsinki: Metso Automation.
Lu, J.C., & Meeker, W.Q. (1993). Using degradation
measures to estimate a time-to-failure distribution.
Technometrics, 35(2), 161-173.
McPherson, J.W. (2010). Reliability physics and
engineering. London: Springer.
Nystad, B.H. (2008). Technical condition indexes and
remaining useful life of aggregated systems. Doctoral
dissertation. Norwegian University of Science and
Technology (NTNU), Trondheim, Norway. ISBN: 978-
82-471-1256-4
Rausand, M., & Høyland, A. (2004). System reliability
theory. Models, statistical methods, and applications.
New Jersey: Wiley & Sons.
van Noortwijk, J.M. (2009). A survey of the application of
gamma processes in maintenance. Reliability
Engineering and System Safety, 94(1), 2-21.
Welte, T.M., & Eggen, A.O. (2008). Estimation of sojourn
time distribution parameters based on expert opinion
and condition monitoring data. International
Conference on Probability Methods Applied to Power
Systems, May 25-29, Rincòn, Puerto Rico.
BIOGRAPHIES
Bent H. Nystad MSc in Cybernetics, RWTH Aachen,
Germany, and PhD in Marine Technology, NTNU
Trondheim, Norway. He has work experience as a condition
monitoring expert from Raufoss ASA (Norwegian missile
and ammunition producer), and has been a Principal Research
Scientist at the Institute for Energy Technology (IFE)
OECD Halden Reactor Project (HRP) since 1998. His
research interests range from data-driven algorithms
and first-principles models for prognostics to algorithm
performance evaluation, requirement specification, technical
health assessment and control applications.
Giulio Gola MSc in Nuclear Engineering, PhD in Nuclear
Engineering, Polytechnic of Milan, Italy. He is currently
working as a Research Scientist at the Institute for Energy
Technology (IFE) and OECD Halden Reactor Project (HRP)
within the Computerized Operations and Support Systems
department. His research topics deal with the development
of qualitative models and artificial intelligence-based
methods for on-line, large-scale signal validation, condition
monitoring and instrument calibration, system diagnostics
and prognostics.
John E. Hulsund is a graduate in Experimental Particle
Physics from the University of Bergen, Norway. He has
been working as a Research Scientist at the Institute for
Energy Technology (IFE) and OECD Halden Reactor
Project (HRP) since 1997 within the Computerized
Operations and Support Systems department. His main area
of work has been the development of a computerized
procedure tool for control room operations.
Major Challenges in Prognostics: Study on Benchmarking
Prognostics Datasets
O. F. Eker¹, F. Camci¹, and I. K. Jennions¹
¹ IVHM Centre, Cranfield University, UK
ABSTRACT
Even though prognostics has been defined to be one of the
most difficult tasks in Condition Based Maintenance
(CBM), many studies have reported promising results in
recent years. The nature of the prognostics problem is
different from diagnostics with its own challenges. There
exist two major approaches to prognostics: data-driven and
physics-based models. This paper aims to present the major
challenges in both of these approaches by examining a
number of published datasets for their suitability for
analysis. Data-driven methods require sufficient samples
run until failure, whereas physics-based methods need
knowledge of the physics of failure progression.
1. INTRODUCTION
Condition based maintenance (CBM) is a preventive
maintenance strategy in which maintenance tasks are
performed when the need arises. The need is determined by
tracking the health status of the system or component
(Camci and Chinnam, 2010; Eker et al., 2011). CBM is a
proactive process involving two major tasks: diagnostics
and prognostics. Diagnostics is the process of identification
of faults, whereas prognostics is the process of forecasting
the time to failure. Time left before observing a failure is
described as remaining useful life (RUL) also called
remaining service or residual life (Jardine et al., 2006).
An example of degradation in the health level of an asset is
shown in Figure 1. The P-F interval is the time interval
between a potential failure, which is identified by health
indicators, and an eventual functional failure. For CBM
it is necessary that the P-F interval is long enough to enable
corrective maintenance action to be taken (Jennions, 2011).
Figure 1. P-F curve of an asset
Diagnostics is a more mature field than prognostics. Once
degradation is detected, unscheduled maintenance should be
performed to prevent the failure consequences. It is not
uncommon to spend more time in maintenance preparation
than in performing the actual maintenance, due to lack of
resources. With prognostics, on the other hand, maintenance
preparation can be performed while the system is up and
running, since the time to failure is known early enough.
Thus, the actual maintenance duration becomes the major
contributor to the downtime. Figure 2 illustrates the
comparison of diagnostics and prognostics.
Performing maintenance preparation while the system is up
and running has a great effect on reducing operation and
support costs. In addition to the reduced downtime, the
inventory cost is reduced since more time is
available for obtaining required parts. Efficiency in
logistics and the supply chain is increased due to better
preparation for maintenance. The life cycle cost of the
equipment is reduced, since components are used until the end of
their lives.
_____________________
Omer Faruk Eker et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 2. Comparison of failure diagnostics and prognostics
maintenance scenarios
Despite the potential value in prognostics, it is considered to
be one of the most challenging tasks in CBM (Zhang et al.,
2006; Peng et al., 2010). Prognostics involves two phases, as
shown in Figure 3. The first phase of prognostics aims to
assess the current health status. Severity detection, health
assessment, or degradation detection are the terms used for
describing this phase in the literature. This phase could also
be considered under diagnostics. Pattern recognition
techniques such as classification or clustering can be
utilized in this phase. The second phase aims to predict the
failure time by forecasting the degradation trend and by
identifying remaining useful life (RUL). Time series
analysis, trending, projection or tracking techniques are used
for this phase.
Figure 3. Phases of prognostics and diagnostics
Many academic papers with prognostics titles only consider
the first phase (Qiu et al., 2003; Ocak et al., 2007).
However, prognostics without the second phase will not be
complete and will not lead to RUL estimation. This paper
focuses on the second phase of prognostics.
Prognostics methods can be analyzed in two major
categories: Data-driven and physics-based models. Data-
driven models utilize past condition monitoring data, current
health status of the system, and degradation of similar
systems. Physics-based models employ system specific
mechanistic knowledge, defect growth formulas, and
condition monitoring data to predict the RUL of systems
(Heng et al., 2009).
This paper aims to discuss the challenges for data-driven
and physics-based prognostics and presents several case
studies. Section 2 reports the requirement analysis and
challenges of data-driven and physics-based prognostics
models. Section 3 discusses several prognostic case studies.
Finally section 4 concludes the paper with an emphasis on
future research tasks.
2. CHALLENGES IN PROGNOSTICS MODELING
Both data-driven and physics-based models have different
requirements to model the degradation and predict the RUL
of a system. Challenges and requirements of both
approaches are given in distinct sub-sections below.
2.1. Data-Driven Models
Data-driven models aim to model system behavior using
regularly collected condition monitoring data instead of
comprehensive system physics or human expertise
(Heng et al., 2009). Data-driven approaches are generally
classified into two categories: statistical and
machine learning approaches. Statistical approaches
construct models by fitting a probabilistic model to the
available data. Machine learning approaches attempt to
recognize complex patterns and make intelligent decisions
based on empirical data.
Both statistical and machine learning methods use the
degradation patterns of a sufficient number of samples
representing equipment failure progression. This requirement is the
major challenge in data-driven prognostics, since it is often
not possible to obtain samples of failure progressions:
industrial systems are not allowed to run until failure because of
the consequences, especially for critical systems and failure
modes. Moreover, the quality and quantity (sample size) of
system monitoring data have a high influence on data-driven
methods. Sample sizes of prognostic datasets in the
literature range from 10 to 40 (Camci and Chinnam, 2010;
Baruah and Chinnam, 2005; Huang et al., 2007; Gebraeel et
al., 2005; Eker et al., 2011). In this paper, datasets will be
compared against the sample sizes provided in the references
above as a quantitative analysis.
Most electro-mechanical failures occur slowly and
follow a degradation path (Gebraeel et al., 2009). The failure
degradation of such a system might take months or even
years. This challenge has been addressed in the literature in
the following ways:
1. Accelerated aging: equipment is run in a lab environment
with extreme loads and/or increased speed to bring about faster
failure. Structural health monitoring applications are a good
example of this type of failure progression: test specimens
are subjected to cyclic loading experiments so that cracks
propagate faster than in the normal degradation process
(Camci et al., 2012; Diamanti & Soutis, 2010; Papazian et
al., 2009). Camci and Chinnam (2010) used imitations of
real components made from vulnerable materials so
that failure progresses faster than normal.
2. Unnatural failure progression: a predefined degradation
formula is used to define the discrete failure states and the
duration to be spent in each state. Failure progression in a
railway turnout has been modeled using exponential
degradation (Eker et al., 2011).
Each solution has its own strengths and weaknesses in how
faithfully it represents the failure degradation.
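The second workaround can be sketched as mapping an assumed degradation curve onto discrete failure states; the exponential form and all parameter values below are illustrative, not taken from the cited study.

```python
import math

def state_boundaries(n_states, horizon, rate):
    """Map an assumed exponential degradation d(t) = exp(rate*t) - 1,
    growing from 0 at t=0 to its maximum at t=horizon, onto n_states
    equally spaced degradation levels; returns the time at which each
    discrete state is entered. Purely illustrative parameters."""
    d_max = math.exp(rate * horizon) - 1.0
    times = []
    for k in range(1, n_states + 1):
        level = k / n_states * d_max             # target degradation level
        times.append(math.log(level + 1.0) / rate)   # invert d(t)
    return times

times = state_boundaries(n_states=5, horizon=100.0, rate=0.05)
```

Because the curve accelerates, equal degradation increments are crossed in progressively shorter times, so later states have shorter durations, mimicking accelerating failure progression.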
2.2. Physics-Based Models
Physics-based models employ a physical understanding of
the system in order to estimate the remaining useful life of
an asset. Even though samples of failure degradation are not
essential in physics-based prognostics, the physical rules
governing the system must be known in detail. The first phase
in physics-based prognostics is to compute residuals that
represent the dispersion of sensed measurements from their
expected values for healthy systems (Namburu et al., 2003).
The second phase in physics-based prognostics requires
mathematical modeling of the failure degradation.
There exist two major challenges in physics-based
prognostics: 1) the lack of sufficient knowledge of the physics
of failure degradation, and 2) the inability to obtain the
values of the parameters in the formulations. Thus,
sufficient component/system information and a good
understanding of failure mechanisms are essential, and
skilled personnel are also required for physics-based models
(Zhang et al., 2009). Environmental and operating
conditions might be used as inputs and constitute added
dimensions to be considered.
3. BENCHMARKING DATASETS
Several publicly available datasets are analyzed in this
section for their suitability for testing prognostic approaches.
As mentioned in Section 2, a prognostic dataset is expected
to have a minimum sample size of around 10 in order to
support data-driven modeling effectively. On the
physics-based side, datasets will be examined
with regard to: 1) whether a mathematical degradation model
exists for the specific application, and 2) whether the parameters
of the model are provided with the datasets. The
applicability of data-driven and physics-based prognostics
methods has been studied, and the results are presented in the
following subsections.
3.1. NASA Data Repository (5 datasets)
NASA Ames prognostics data repository (2012) is a
growing source covering several sets of prognostic data
contributed by universities, companies, or agencies.
Datasets in the repository consist of run-to-failure time
series data representing the case study under examination.
Seven prognostic datasets are available. In this section, an
analysis of five of them for data-driven or physics-based
modeling is presented.
3.1.1. Milling Dataset
Sixteen milling inserts were degraded by running them at
different operating conditions (Agogino and Goebel, 2007).
Once the flank wear on a milling insert exceeded a standard
threshold level, the tool was considered to have failed. Flank
wear, which is caused by the abrasion of hard constituents of
the workpiece material and is commonly observed during the
machining of steels or cast irons, was measured with a
microscope on the flank face of the cutting tool.
Measurements of acoustic emission, vibration, and current
were collected as indirect health indicators. There are eight
different operating conditions, leading to only two samples
per operating condition.
Effective data-driven modeling is very difficult, if not
impossible, using only two samples of failure degradation.
Several tool life and tool-wear rate models, mostly based on
Taylor's formula (Yen et al., 2004), have been selected for
physics-based prognostics and are displayed in Table 1.
Table 1. Tool life models (Eqs. 1-3) and tool wear rate models (Eqs. 4-5)
First European Conference of the Prognostics and Health Management Society, 2012
150
European Conference of Prognostics and Health Management Society 2012
4
On the physics-based prognostics side, the Taylor tool life
equation (Eq. 1, of the form V·T^n = C) and its extended
versions in Eqs. 2-3 are well-known life models employed in
machining applications. Each of them can be applied to tool
degradation scenarios separately. Tool life is the duration for
which a tool can be operated properly before it starts to fail;
in machining applications a predetermined upper level of
flank wear is used as the failure criterion. Tool life and rate
of wear are sensitive to changes in cutting conditions. These
equations describe the relationship between tool life and
machining parameters (e.g. cutting speed, feed, and depth of
cut). Cutting speed is the difference in speed between the
cutting tool and the workpiece. Feed rate is the velocity at
which the tool moves laterally across the workpiece,
perpendicular to the cutting speed. The depth of cut is how
deeply the workpiece is penetrated. Takeyama and Murata's
tool wear rate model (Eq. 5) describes the relationship
between the rate of volume loss on the tool insert, the cutting
distance, and the diffusive wear per cycle. Even though
parameters specific to the tool material or workpiece (e.g.
cutting tool hardness) can be found in machining handbooks,
operating or environmental condition parameters such as
cutting temperature and sliding speed are not provided with
the dataset.
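As a sketch of how the Taylor relationship is used, tool life can be computed from its standard form V·T^n = C. The constants below are hypothetical illustration values, not parameters taken from the milling dataset:

```python
def taylor_tool_life(V, C, n):
    """Taylor tool life equation, V * T**n = C, solved for life T.

    V: cutting speed, C: Taylor constant, n: Taylor exponent
    (all in consistent, user-chosen units)."""
    return (C / V) ** (1.0 / n)

# hypothetical constants for illustration only
T = taylor_tool_life(V=200.0, C=400.0, n=0.25)  # life in minutes
```

Halving the cutting speed with n = 0.25 increases life sixteen-fold, which is why tool life models are so sensitive to cutting conditions.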
For the above reasons this dataset was found to be unsuitable
for data-driven and physics-based prognostic models.
3.1.2. Bearing Dataset
Three sets (each consisting of four bearings) of tapered
rolling element bearings were run to failure under the same
operating conditions (Lee et al., 2007). The accumulated
mass of debris was collected for each experiment, the amount
of debris being considered a direct indicator of bearing
health (Dempsey et al., 2006). In contrast to the milling
dataset, this direct health indicator (amount of debris
collected) was not provided with the dataset. Vibration data
was collected regularly as an indirect health indicator. After
exceeding 100 million revolutions the bearings failed due to
a crack or outer race failure (Qiu et al., 2006).
The Yu-Harris (Y-H) and Kotzalas-Harris (K-H) models were
selected for use in a physics-based prognostic approach.
Both the bearing spall initiation and spall progression models,
found in (Orsagh et al., 2003; Yu and Harris, 2001), are
shown in Table 2. Yu and Harris' stress-based spall
initiation formula expresses fatigue life as a function of the
dynamic capacity and the applied load (Eq. 6). Dynamic
capacity is in turn a function of bearing geometry and stress.
Once initiated, a spall grows very quickly, and a bearing then
has only 3% to 20% of its useful life remaining (Kotzalas and
Harris, 2001). The Kotzalas-Harris spall progression rate
model is a function of the spall progression region width,
and is described in terms of the maximum stress, the average
shearing stress, and the spall length.
Similar to the previous dataset, some parameters needed for
physics-based modeling are not provided with the dataset.
The challenges emerging from this dataset are:
- Three run-to-failure sets of samples are considered
insufficient for data-driven modeling when compared to the
dataset sample sizes found in the literature.
- Parameters needed for physics-based modeling are lacking.
Table 2. Bearing fatigue life models: spall initiation (Eqs. 6-8) and spall progression (Eqs. 9-10)
3.1.3. Li-ion Battery Dataset
Electric unmanned aerial vehicle (eUAV) Li-ion batteries
were used in this prognostic approach (Saha and Goebel,
2007). The batteries were charged and discharged at
different ambient temperatures and different load currents.
There are 4 samples under the same operating conditions
and in total 36 samples are provided. Battery capacity fade
was chosen as the failure indicator for these experiments: a
30% fade in battery capacity (for example, a reduction from
2000 mAh to 1400 mAh) was considered as failure.
Voltage, current, and battery temperature measurements are
provided with the dataset as indirect health indicators.
Impedance and capacity measurements, which are direct
health indicators, were also given with the dataset as damage
criteria. Only four sets of batteries under the same operating
and environmental conditions are not enough to apply data-
driven prognostics effectively.
Typically, battery capacity or end-of-life (EOL) modeling is
done for physics-based prognostics purposes. A remaining
battery capacity model can be found in the literature (Rong
and Pedram, 2006). All parameters in their model, other than
constant coefficients determined from experimental testing
by curve fitting, are available. This dataset was therefore
found to be eligible for physics-based modeling.
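The 30% capacity-fade failure criterion described above can be applied directly to a capacity-versus-cycle series; the capacity values below are hypothetical:

```python
def end_of_life_cycle(capacities_mAh, rated_mAh, fade_fraction=0.30):
    """First cycle at which measured capacity drops below
    (1 - fade_fraction) * rated capacity; None if no failure observed."""
    threshold = (1.0 - fade_fraction) * rated_mAh
    for cycle, capacity in enumerate(capacities_mAh):
        if capacity < threshold:
            return cycle
    return None

# hypothetical fade history: failure threshold is 1400 mAh for a 2000 mAh cell
eol = end_of_life_cycle([2000, 1900, 1750, 1500, 1350], rated_mAh=2000)
```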
3.1.4. Turbofan Engine Degradation Simulation Dataset
This dataset contains 4 sets of data each of which is a
combination of 2 failure modes and 2 operating conditions.
Each set has at least 200 engine degradation simulations
carried out using C-MAPSS which are divided into training
and test subsets (Saxena and Goebel, 2008). Twenty one
different sensor measurements as well as RUL values for
test subsets are given (Saxena et al., 2008). However, health
indicators were not provided with the dataset.
Degradation in the high-pressure compressor (HPC) and fan
of the turbofan engine is simulated, and the dataset consists
of multiple multivariate time series. The simulations employ
several operating conditions. The model the dataset owners
applied is the exponential degradation model shown in
Eq. 11, whose parameters are an initial degradation level, a
scaling factor, a time-varying exponent, and an upper wear
threshold. The model is a generalized form of common
damage propagation models (e.g. the Arrhenius, Coffin-
Manson, and Eyring models).
The dataset is eligible for a data-driven approach, since
sufficient data and RUL values are available with the dataset.
Either statistical or machine learning data-driven models can
be employed to predict the RUL of turbofan engines. On the
other hand, it is not appropriate for physics-based modeling,
since the health index parameters are not given and no
physics-based model is available for whole-engine
degradation.
3.1.5. IGBT Accelerated Aging Dataset
The dataset involves thermal overstress aging experiments
of Insulated Gate Bipolar Transistors (IGBTs). IGBTs are
power semiconductor devices used in switching applications
such as traction motor control, and switched-mode power
supplies (SMPS). Five IGBTs were aged with a square wave
signal at the gate and one was aged with a DC waveform
(Celaya et al., 2009). The experiments were stopped after
thermal runaway or latch-up failures were detected.
Collector current, gate voltage, collector-emitter voltage,
and package temperature measurements are given as indirect
health indicators.
There are five run-to-failure samples under the same
conditions. The dataset owners also reported several
problems with the aging systems (Sonnenfeld et al., 2008).
Thus, it is difficult to claim that the dataset could be
employed effectively for data-driven prognostics.
The Coffin-Manson model (Eq. 12) is used as a physics-
based model for thermal cycling applications (Cui, 2005). It
is a function of temperature parameters and an Arrhenius
term, which is evaluated at the maximum temperature
reached in each cycle. The temperature parameters to be
used in the model are given with the dataset. The dataset
was therefore found to be eligible for a physics-based
approach.
Table 3. Physics-based models for temperature cycling: the Coffin-Manson model (Eqs. 12-13)
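The standard Coffin-Manson form with an Arrhenius acceleration term evaluated at the maximum cycle temperature can be sketched as follows; the constants C0, q, and Ea below are illustrative assumptions, not values from the dataset or from Cui (2005):

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def cycles_to_failure(delta_T, T_max, C0, q, Ea):
    """Coffin-Manson cycles-to-failure with an Arrhenius term at the
    maximum cycle temperature T_max (K); delta_T is the cycle
    temperature swing (K). Constants C0, q, Ea are illustrative."""
    return C0 * delta_T ** (-q) * math.exp(Ea / (K_B * T_max))

N = cycles_to_failure(delta_T=80.0, T_max=398.0, C0=1e5, q=2.0, Ea=0.5)
```

The model captures the key qualitative behavior: larger temperature swings yield fewer cycles to failure.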
3.2. Virkler Fatigue Crack Growth Dataset
Structural health monitoring (SHM) is the process of
implementing a damage identification strategy for civil,
aerospace, or mechanical engineering infrastructure (Farrar
and Worden, 2007). In the SHM field, fatigue cracks are
considered one of the primary structural damage
mechanisms caused by cyclic loading. Cracks at the
structure surface grow gradually. Once a crack has reached
the critical length (determined by standards), the structure
will suddenly fracture, which may cause the system to fail
catastrophically. Prediction of fatigue life or fatigue crack
growth in structures is therefore necessary.
The Virkler fatigue crack growth dataset (Virkler et al.,
1979) contains 68 run-to-failure specimens. The specimens
are center-cracked sheets of 2024-T3 aluminum. Each
specimen had a notch giving a 9 mm initial crack length, and
the experiments were stopped once the crack length reached
about 50 mm. Crack length is provided in the dataset as a
direct health indicator of the specimens. Each specimen has
164 crack length observation points, as shown in Figure 4.
However, indirect sensory measurements such as vibration,
acoustic emission, etc. are not provided.
Figure 4. Crack length propagation samples under the same
loading conditions
The Virkler dataset is eligible for both data-driven and
physics-based prognostics: compared to the prognostic
dataset sample sizes mentioned in section 2, its 68 run-to-
failure samples are sufficient to develop data-driven
methods, and crack growth equations are available. The
crack growth formulation shown in Eqs. 14 and 15 can
readily be used in physics-based prognostics (Paris and
Erdogan, 1963; Cross et al., 2006). The Paris-Erdogan model
expresses the crack growth rate (da/dN) in terms of material-
specific constants (C and m) and the range of the stress
intensity factor (ΔK), where ΔK is a function of the range of
the cyclic stress amplitude (Δσ), a geometric constant, and
the crack length (a).
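A simple numerical integration of the Paris-Erdogan law from a 9 mm initial to a 50 mm critical crack length can be sketched as follows; the material constants C and m, the stress range, and the geometric factor beta are illustrative assumptions, not values fitted to the Virkler data:

```python
import math

def cycles_to_critical(a0, ac, C, m, dsigma, beta=1.0, da=1e-5):
    """Integrate Paris-Erdogan da/dN = C * dK**m, with
    dK = beta * dsigma * sqrt(pi * a), from initial crack length a0
    to critical length ac (simple forward sum over crack increments da).
    Units must be chosen consistently with C (illustrative here)."""
    N, a = 0.0, a0
    while a < ac:
        dK = beta * dsigma * math.sqrt(math.pi * a)
        dadN = C * dK ** m
        N += da / dadN  # cycles consumed growing the crack by da
        a += da
    return N

# Virkler-like scenario: 9 mm initial, 50 mm critical crack length
N = cycles_to_critical(a0=0.009, ac=0.050, C=1e-11, m=3.0, dsigma=50.0)
```

Because da/dN increases with ΔK, most of the life is consumed while the crack is still short, and doubling the stress range sharply reduces the predicted life.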
A challenge and requirement analysis of six different datasets
has been performed considering both data-driven and
physics-based modeling demands. Four of the six datasets
can readily be modeled with a physics-based approach, while
only two are applicable to a data-driven prognostics
approach.
A summary table of all datasets is shown in Table 4.
Compared to other datasets, the Virkler dataset was found to
be the most applicable considering the requirements of both
data-driven and physics-based approaches.
Dataset           Data-Driven Modeling   Physics-Based Modeling
Milling Dataset   Hard                   Applicable
Bearing Dataset   Hard                   Hard
Battery Dataset   Hard                   Applicable
Engine Dataset    Applicable             Hard
IGBT Dataset      Hard                   Applicable
Virkler Dataset   Applicable             Applicable
Table 4. Prognostic approach applicability table
4. CONCLUSION
Physics-based and data-driven models are the two major
prognostic approaches employed in the case studies found in
the literature. This paper conducts a requirement analysis for
prognostic methods and reports the challenges of applying
the two major approaches to different datasets. In general,
physics-based models require a mathematical representation
of the physics of failure degradation and the parameters used
in degradation modeling, whereas data-driven models require
statistically sufficient run-to-failure samples. Several
datasets were examined considering both physics-based and
data-driven approaches, and their eligibility is summarized.
The Virkler dataset was found to be the most suitable for
both data-driven and physics-based models and has therefore
been selected for use in a future hybrid prognostic approach.
ACKNOWLEDGEMENT
This research was supported by the IVHM Centre, Cranfield
University, UK and its industrial partners.
REFERENCES
Agogino, A., & Goebel, K. (2007). “Mill Data Set”, BEST
lab, UC Berkeley. NASA Ames Prognostics Data
Repository, [http://ti.arc.nasa.gov/project/prognostic-
data-repository], NASA Ames, Moffett Field, CA.
Baruah, P., & Chinnam, R. B. (2005). HMMs for
diagnostics and prognostics in machining processes.
International Journal of Production Research, 43(6),
1275-1293. doi:10.1080/00207540412331327727
Camci, F., & Chinnam, R. (2010). Health-state estimation
and prognostics in machining processes. IEEE
Transactions on Automation Science and Engineering,
7(3), 581-597. Retrieved from
http://dx.doi.org/10.1109/TASE.2009.2038170
Cross, R. J., Makeev, A., & Armanios, E. (2006). A
Comparison of Predictions From Probabilistic Crack
Growth Models Inferred From Virkler’s Data. Journal
of ASTM International, 3(10), 1-11. Retrieved
February 21, 2012, from
http://soliton.ae.gatech.edu/people/andrew.makeev/ASTM
Cui, H. H. (2005). Accelerated Temperature Cycle Test and
Coffin-Manson Model for Electronic Packaging.
Proceedings of the Annual Reliability and
Maintainability Symposium, 556-560.
Dempsey, P. J., Kreider, G., & Fichter, T. (2006).
Investigation of tapered roller bearing damage detection
using oil debris analysis. 2006 IEEE Aerospace
Conference, 11 pp. doi:10.1109/AERO.2006.1656082
Diamanti, K., & Soutis, C. (2010). Structural health
monitoring techniques for aircraft composite structures.
Progress in Aerospace Sciences, 46(8), 342-352.
Elsevier. doi:10.1016/j.paerosci.2010.05.001
Eker O. F., (2011). Development of a New State Based
Prognostics Method. MSc Thesis. Graduate Institute of
Sciences and Engineering, Fatih University, Istanbul,
Turkey.
Eker, O. F., Camci, F., Guclu, A., Yilboga, H., Sevkli, M.,
& Baskan, S. (2011). A Simple State-Based Prognostic
Model for Railway Turnout Systems. Industrial
Electronics, IEEE Transactions on, 58(5), 1718–1726.
Camci, F., Medjaher, K., Zerhouni, N., & Nectoux, P.
Feature Evaluation for Effective Bearing Prognostics.
Quality and Reliability Engineering International.
doi:10.1002/qre.1396
Farrar, C. R., & Worden, K. (2007). An introduction to
structural health monitoring. Philosophical
transactions. Series A, Mathematical, physical, and
engineering sciences, 365(1851), 303-15.
doi:10.1098/rsta.2006.1928
Gebraeel, N., Elwany, A., & Jing, P. (2009). Residual Life
Predictions in the Absence of Prior Degradation
Knowledge. IEEE Transactions on Reliability, 58(1),
106-117. doi:10.1109/TR.2008.2011659
Gebraeel, N. Z., Lawley, M. a., Li, R., & Ryan, J. K. (2005).
Residual-life distributions from component degradation
signals: A Bayesian approach. IIE Transactions, 37(6),
543-557. doi:10.1080/07408170590929018
Qiu, H., Lee, J., & Lin, J. (2006). Wavelet Filter-based
Weak Signature Detection Method and its Application
on Roller Bearing Prognostics. Journal of Sound and
Vibration, 289, 1066-1090.
doi:10.1016/j.jsv.2005.03.007
Heng, A., Zhang, S., Tan, A., & Mathew, J. (2009). Rotating
machinery prognostics: State of the art, challenges and
opportunities. Mechanical Systems and Signal
Processing, 23(3), 724-739.
Huang, R., Xi, L., Li, X., Richard Liu, C., Qiu, H., & Lee, J.
(2007). Residual life predictions for ball bearings based
on self-organizing map and back propagation neural
network methods. Mechanical Systems and Signal
Processing, 21(1), 193-207.
doi:10.1016/j.ymssp.2005.11.008
Celaya, J., Wysocki, P., & Goebel, K. (2009). “IGBT
accelerated aging data set”, NASA Ames Prognostics
Data Repository, [/tech/dash/pcoe/prognostic-data-
repository/], NASA Ames, Moffett Field, CA.
Lee, J., Qiu, H., Yu, G., Lin, J., & Rexnord Technical
Services (2007). “Bearing Data Set”, IMS, University of
Cincinnati. NASA Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA.
Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review
on machinery diagnostics and prognostics implementing
condition-based maintenance. Mechanical Systems and
Signal Processing, 20(7), 1483-1510.
doi:10.1016/j.ymssp.2005.09.012
Jennions I. K. (2011). Integrated vehicle health
management: Perspectives on an emerging field.
Pennsylvania, USA: SAE International.
Kotzalas, M., & Harris, T. (2001). Fatigue failure
progression in ball bearings. Journal of Tribology,
Transactions of the ASME, 123(2), 238-242.
Namburu, M., Pattipati, K., Kawamoto, M., & Chigusa, S.
(2003). Model-based prognostic techniques
[maintenance applications]. Proceedings
AUTOTESTCON 2003. IEEE Systems Readiness
Technology Conference., 330-340.
NASA Ames Prognostics data repository, retrieved Oct.
2011, from:
http://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-
repository/
Ocak, H., Loparo, K., & Discenzo, F. (2007). Online
tracking of bearing wear using wavelet packet
decomposition and probabilistic modeling: A method
for bearing prognostics. Journal of Sound and
Vibration, 302(4-5), 951-961.
doi:10.1016/j.jsv.2007.01.001
Orsagh, R. F., Sheldon, J. & Klenke, C. J. (2003),
"Prognostics/diagnostics for gas turbine engine
bearings", ASME Turbo Expo, Vol. 1, 16-19 June 2003,
Atlanta, GA, pp. 159.
Papazian, J. M., Anagnostou, E. L., Engel, S. J., Hoitsma,
D., Madsen, J., Silberstein, R. P., Welsh, G. (2009). A
structural integrity prognosis system. Engineering
Fracture Mechanics, 76(5), 620-632.
doi:10.1016/j.engfracmech.2008.09.007
Paris, P.C. & Erdogan, F. (1963). A critical analysis of
crack propagation laws. Journal of Basic Engineering,
Trans. ASME, Ser. D, 85, 528–534.
Peng, Y., Dong, M., & Zuo, M. J. (2010). Current status of
machine prognostics in condition-based maintenance: a
review. The International Journal of Advanced
Manufacturing Technology, 50(1-4), 297-313.
Qiu, H., Lee, J., Lin, J., & Yu, G. (2003). Robust
performance degradation assessment methods for
enhanced rolling element bearing prognostics.
Advanced Engineering Informatics, 17(3-4), 127-140.
doi:10.1016/j.aei.2004.08.001
Rong, P., & Pedram, M. (2006). An Analytical Model for
Predicting the Remaining Battery Capacity of Lithium-
Ion Batteries. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 14(5), 441-451.
Saha, B., & Goebel K. (2007). “Battery Data Set”, NASA
Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA
Saxena, A., & Goebel K. (2008). “C-MAPSS Data Set”,
NASA Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA
Saxena, A., Goebel, K., Simon, D., & Eklund, N. (2008).
Damage propagation modeling for aircraft engine run-
to-failure simulation. Prognostics and Health
Management, 2008. PHM 2008. International
Conference on (pp. 1–9). IEEE.
Sonnenfeld, G., Goebel, K., & Celaya, J. (2008). An agile
accelerated aging, characterization and scenario
simulation system for gate controlled power transistors.
AUTOTESTCON, 2008.
Virkler, D. A., Hillberry, B. M., & Goel, P. K. (1979). The
Statistical Nature of Fatigue Crack Propagation.
Journal of Engineering Materials and Technology,
101(2), pp. 148–153.
Yen, Y., Sohner, J., Lilly, B., Altan, T., (2004). Estimation
of tool wear in orthogonal cutting using the finite
element analysis. Journal of Materials Processing
Technology, 146(1), 82-91.
Yu, K. W., & Harris, T. A. (2001). A New Stress-Based
Fatigue Life Model for Ball Bearings. Tribology
Transactions, 44(1), 11-18.
Zhang, H., Kang, R., & Pecht, M. (2009). A hybrid
prognostics and health management approach for
condition-based maintenance. 2009 IEEE International
Conference on Industrial Engineering and Engineering
Management, Dec 8-11, Hong Kong, pp. 1165-1169.
doi:10.1109/IEEM.2009.5372976
Zhang, L. L., Li, X. X., & Yu, J. J. (2006). A review of fault
prognostics in condition based maintenance.
Proceedings of SPIE, The International Society for
Optical Engineering, 6357, 635752.
BIOGRAPHIES
Omer Faruk Eker is a PhD student in the
School of Applied Sciences and works
as a researcher at the IVHM Centre, Cranfield
University, UK. He received his B.Sc.
degree in Mathematics from Marmara University and his
M.Sc. in Computer Engineering from Fatih University,
Istanbul, Turkey. He has been involved in a project funded
by TUBITAK and Turkish State Railways. His research
interests include failure diagnostics and prognostics,
condition based maintenance, pattern recognition, and data
mining.
Dr. Fatih Camci has been a faculty
member at the IVHM Centre, Cranfield
University, since 2010. He has worked on
many research projects related to
Prognostics and Health Management
(PHM) in the USA, Turkey, and the UK.
His PhD work, supported by the National Science
Foundation in the USA and Ford Motor Company,
concerned the development of novelty detection,
diagnostics, and prognostics methods. He worked for two
years as a senior researcher at Impact Technologies, a world-
leading SME in PHM, and has been involved in many
projects funded by the US Navy and the US Air Force
Research Lab on the development of maintenance planning
and logistics with PHM. He then worked as an Assistant
Professor in Turkey. He has led a research project, funded by
TUBITAK (The Scientific and Technological Research
Council of Turkey) and Turkish State Railways, on the
development of prognostics and maintenance planning
systems for railway switches. In addition to PHM, his
research interests include decision support systems and
energy.
Ian Jennions Ian’s career spans over 30
years, working mostly for a variety of gas
turbine companies. He has a Mechanical
Engineering degree and a PhD in CFD both
from Imperial College, London. He has
worked for Rolls-Royce (twice), General
Electric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heat
transfer, fluid systems, mechanical design, combustion,
services and IVHM. He moved to Cranfield in July 2008 as
Professor and Director of the newly formed IVHM Centre.
The Centre is funded by a number of industrial companies,
including Boeing, BAE Systems, Rolls-Royce, Thales,
Meggitt, MOD and Alstom Transport. He has led the
development and growth of the Centre, in research and
education, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
begun in 2011.
Ian is on the Editorial Board for the International Journal of
Condition Monitoring, a Director of the PHM Society,
contributing member of the SAE IVHM Steering Group and
HM-1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM –
Perspectives on an Emerging Field.
Physics Based Electrolytic Capacitor Degradation Models for Prognostic Studies under Thermal Overstress
Chetan S. Kulkarni1, Jose R. Celaya2, Kai Goebel3, and Gautam Biswas4
1,4 Vanderbilt University, Nashville, TN, 37235, [email protected]@vanderbilt.edu
2 SGT Inc. NASA Ames Research Center, Moffett Field, CA, 94035, [email protected]
3 NASA Ames Research Center, Moffett Field, CA, 94035, [email protected]
ABSTRACT
Electrolytic capacitors are used in several applications ranging from power supplies on safety critical avionics equipment to power drivers for electro-mechanical actuators. This makes them good candidates for prognostics and health management research. Prognostics provides a way to assess the remaining useful life of components or systems based on their current state of health and their anticipated future use and operational conditions. Past experience shows that capacitors tend to degrade and fail faster under the high electrical and thermal stress conditions that they are often subjected to during operations. In this work, we study the effects of accelerated aging due to thermal stress on different sets of capacitors under different conditions. Our focus is on deriving first-principles degradation models for thermal stress conditions. Data collected from simultaneous experiments are used to validate the desired models. Our overall goal is to derive accurate models of capacitor degradation, and use them to predict performance changes in DC-DC converters.
1. INTRODUCTION
Most devices and systems today contain embedded electronic modules for monitoring, control and enhanced functionality. In spite of the electronic modules being used to enhance system performance and capabilities, these modules are often the first elements in the system to fail (Saha et al., 2009; Goebel et al., 2008; Saxena et al., 2008). These failures can be attributed to adverse operating conditions, such as high temperatures, voltage surges and current spikes. Studying and
Chetan S. Kulkarni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
analyzing the degradation of these systems (i.e., degradation in performance) provides data that can be used to meet critical goals like advance failure warnings (Goebel et al., 2008; Saxena et al., 2008) and reduced unscheduled maintenance (Saha et al., 2009), which play an important role in aviation safety.
The term “diagnostics” relates to the ability to detect and isolate faults or failures in a system. “Prognostics”, on the other hand, is the process of predicting health condition and remaining useful life based on current state and previous conditions. Prognostics and health management (PHM) is a method that permits the assessment of the reliability of a system under its actual application conditions. PHM methods combine sensing, data collection, and interpretation of environmental, operational, and performance related parameters to indicate system health. PHM methodologies can be implemented through the use of various techniques that study parameter variations, which indicate changes in parameter degradation and operation performance based on variations in a life-cycle profile.
Prognostics and Health Management (PHM) methodologies have emerged as one of the key enablers for achieving efficient system-level maintenance and safety in military systems (Saha et al., 2009). Prognostics and health management for electronic systems aims to detect, isolate, and predict the onset and source of system degradation as well as the time to system failure. The goal is to make intelligent decisions about the system health and to arrive at strategic and business case decisions. As electronics become increasingly complex, performing PHM efficiently and cost-effectively is becoming more demanding (Saha et al., 2009; J. R. Celaya et al., 2010).
In the aerospace domain, flight and ground staff need to acquire information regarding the current health state for all
subsystems of the aircraft, such as structures, propulsion, control, guidance and navigation systems, on a regular basis to maintain safe operation. This has given rise to research projects that focus on accurate diagnosis of faults, developing precursors to failure, and predicting remaining component life (Balaban et al., 2010; J. R. Celaya et al., 2010). Most of the avionics systems and subsystems in today's modern aircraft contain significant electronic components which perform a critical role in on-board, autonomous functions for vehicle controls, communications, navigation and radar systems. Future aircraft systems will rely on more electric and electronic components. Therefore, this may also increase the rate of electronics-related faults that occur in these systems, with perhaps unanticipated fault modes that will be hard to detect and isolate. It is very important to provide system health awareness for digital electronics systems on-board, to improve aircraft reliability, assure in-flight performance, and reduce maintenance cost. An understanding of how components degrade is needed, as well as the capability to anticipate failures and predict the remaining useful life of electronic components (Balaban et al., 2010; Saha et al., 2009).
1.1. Related Work
The output filter capacitor has been identified as one of the elements of a switched mode power supply that fails more frequently, and therefore has a critical impact on performance (Vohnout et al., 2008; Goodman et al., 2007; Orsagh et al., 2006). A prognostics and health management (PHM) approach for power supplies of avionics systems is presented in (Orsagh et al., 2006).
A health management approach for multilayer ceramic capacitors is presented in the work by (Nie et al., 2007). This approach focuses on the temperature-humidity bias accelerated test to replicate failures. The approach to fault detection uses data trending algorithms in conjunction with multivariate decision-making. The Mahalanobis distance (MD) is used to detect abnormalities within the data and classify the data into “normal” and “abnormal” groups. The abnormal data are then further classified into severity levels of abnormality, based on which predictions of RUL are made.
In the study done by (Gu et al., 2008), 96 multilayer ceramic capacitors (MLCC) were selected for in-situ monitoring and life testing under elevated temperature (85 °C) and humidity (85% RH) conditions with one of 3 DC voltage bias levels: rated voltage (50 V), low voltage (1.5 V), and no voltage (0 V). This method uses data from accelerated aging tests to detect potential failures and to make an estimation of the time of failure. A data driven fault detection algorithm for multilayer ceramic capacitor failures is presented in (Gu & Pecht, 2008). The approach used in this study combines regression analysis, residual, detection and prediction analysis (RRDP). A method based on the Mahalanobis distance is used to detect abnormalities in the test data; there is no prediction of RUL.
In the work done by (Wereszczak et al., 1998) the failure probability of the Barium Titanate used for the manufacturing of MLCCs was studied. Dielectric ceramics in multilayer capacitors are subjected to thermo-mechanical stresses, which may cause mechanical failure and lead to loss of electrical function. Probabilistic life design, or failure probability analysis, of a ceramic component combines the strength distribution of the monolithic ceramic material comprising the component, finite element analysis of the component under the mechanical loading conditions of interest, and a multiaxial fracture criterion.
The work by Buiatti et al. (2010) looked at degradation in metalized polypropylene film (MPPF) capacitors, studying a noninvasive technique for capacitor diagnostics in boost converters. The technique is based on double estimation of the ESR and the capacitance, improving diagnostic reliability and allowing predictive maintenance using a low-cost digital signal processor (DSP).
We adopt a physics-based modeling (PBM) approach to predict the dynamic behavior of the system under nominal and degraded conditions. Faults and degradations appear as parameter-value changes in the model, and this provides the mechanism for tracking system behavior under degraded conditions (Kulkarni et al., 2009).
In DC-DC power supplies used as subsystems in avionics systems (Bharadwaj et al., 2010; Kulkarni et al., 2009), electrolytic capacitors and MOSFET switches are known to have the highest degradation and failure rates among all of the components (Goodman et al., 2007; Kulkarni et al., 2009). Degraded electrolytic capacitors affect the performance and efficiency of DC-DC converters in a significant way. We implement the PHM methodology to predict degradation in electrolytic capacitors, combining physics-of-failure models with data collected from experiments on the capacitors under different simulated operating conditions. In (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011) we discuss degradation related to thermal overstress conditions and the qualitative degradation mechanisms. In this paper we discuss the derived physics-based model and degradation related to thermal overstress (TOS) conditions, along with the experiments conducted.
2. ELECTROLYTIC CAPACITORS
Electrolytic capacitor performance is strongly affected by operating conditions, such as voltage, current, frequency, and ambient temperature. When capacitors are used in power supplies and signal filters, degradation in the capacitors increases the impedance in the path of the AC current, and the decrease in capacitance introduces ripple voltage on top of the desired DC voltage. Continued degradation of the capacitor leads the
converter output voltage to drop below specifications, affecting downstream components. In some cases, the combined effects of the voltage drop and the ripple may damage the converter and downstream components, leading to cascading failures in systems and subsystems.
A primary cause of wear-out in aluminum electrolytic capacitors is vaporization of the electrolyte (Goodman et al., 2007) and degradation of the electrolyte due to ion exchange during charging/discharging (Gomez-Aleixandre et al., 1986; Ikonopisov, 1977), which in turn leads to a drift in the two main electrical parameters of the capacitor: (1) the equivalent series resistance (ESR), and (2) the capacitance (C). The ESR of a capacitor is the sum of the resistances due to the aluminum oxide, electrolyte, spacer, and electrodes (foil, tabbing, leads, and ohmic contacts) (Hayatee, 1975; Gasperi, 1996). The health of a capacitor is often indicated by the values of these two parameters. There are industry-standard thresholds for these parameter values; once these thresholds are crossed, the component is considered unhealthy for use in a system, i.e., the component has reached its end of life and should be replaced before further operation (Lahyani et al., 1998; Eliasson, 2007; Imam et al., 2005).
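A threshold check of this kind is straightforward to code. The sketch below uses illustrative end-of-life limits (a 10% capacitance loss and an ESR reaching roughly 2.8× its pristine value, in line with the storage-condition thresholds cited later in this paper); actual limits vary by standard and manufacturer:

```python
def capacitor_health(esr, cap, esr_0, cap_0,
                     esr_limit=2.8, cap_loss_limit=0.10):
    """Flag end-of-life based on drift of ESR and capacitance.

    esr_0, cap_0   -- pristine (initial) parameter values
    esr_limit      -- ESR multiple of pristine value treated as EOL
    cap_loss_limit -- fractional capacitance loss treated as EOL
    (Threshold values are illustrative; standards and datasheets differ.)
    """
    esr_ratio = esr / esr_0
    cap_loss = (cap_0 - cap) / cap_0
    healthy = esr_ratio < esr_limit and cap_loss < cap_loss_limit
    return {"esr_ratio": esr_ratio, "cap_loss": cap_loss, "healthy": healthy}

# A capacitor whose ESR doubled and whose capacitance dropped 5%
# has not yet crossed either illustrative threshold
status = capacitor_health(esr=0.2, cap=2090e-6, esr_0=0.1, cap_0=2200e-6)
```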
As illustrated in Fig. 1, an aluminum electrolytic capacitor consists of a cathode aluminum foil, electrolytic paper, electrolyte, and an aluminum oxide layer on the anode foil surface, which acts as the dielectric. When in contact with the electrolyte, the oxide layer possesses an excellent forward-direction insulation property (Gasperi, 1996). Together with the magnified effective surface area attained by etching the foil, a high capacitance value is obtained in a small volume (Fife, 2006). Since the oxide layer has rectifying properties, a capacitor has polarity. If both the anode and cathode foils have an oxide layer, the capacitor is bipolar. In this work, we analyze "non-solid" aluminum electrolytic capacitors, in which the electrolytic paper is impregnated with liquid electrolyte. The other type of aluminum electrolytic capacitor, which uses a solid electrolyte (Bengt, 1995), is not discussed in this work.

[Figure: cross-section showing the anode foil, cathode foil, separator paper, aluminum tab, and connecting lead.]

Figure 1. Physical Model of Electrolytic Capacitor
2.1. Overview of Degradation Mechanisms
The flow of current during the charge/discharge cycle of the capacitor causes the internal temperature to rise. The heat generated is transmitted from the core to the surface of the capacitor body, but not all of it can escape. The excess heat results in a rise in the internal temperature of the capacitor, which causes the electrolyte to evaporate and gradually deplete (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011). Similarly, in situations where the capacitor operates under high ambient temperature, the capacitor body is at a higher temperature than its core, and heat travels in the opposite direction, from the body surface to the core of the capacitor, again increasing the internal temperature and causing the electrolyte to evaporate. This is explained using a first-principles thermal model of heat conduction (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011).
Degradation in the oxide layer can be attributed to crystal defects that occur because of the periodic heating and cooling during the capacitor's duty cycle, as well as to stress, cracks, and installation-related damage. High electrical stress is known to accentuate the degradation of the oxide layer through localized dielectric breakdowns on the oxide layer (Ikonopisov, 1977; Wit & Crevecoeur, 1974). These breakdowns, which accelerate the degradation, have been attributed to the duty cycle, i.e., the charge/discharge cycle during operation (Ikonopisov, 1977). Another simultaneous phenomenon is the increase in internal pressure (Gomez-Aleixandre et al., 1986) due to an increased rate of chemical reactions, which can again be attributed to the internal temperature increase in the capacitor. This pressure increase can ultimately lead to the capacitor popping.
All of the failure/degradation phenomena mentioned may act simultaneously, depending on the operating conditions of the capacitors. We first study the phenomena qualitatively, and then discuss the steps to derive first-principles analytic degradation models for the different thermal stress conditions. Electrolyte evaporation is caused either by an increase in the internal core temperature or by the external surrounding temperature. Both lead to the same degradation mode, caused by high electrical stress or thermal stress, respectively.
3. THERMAL OVERSTRESS EXPERIMENT
In this setup we emulated conditions similar to high-temperature storage (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011), in which capacitors are placed in a controlled chamber and the temperature is raised above their rated specification (60068-1, 1988). Pristine capacitors were taken from the same lot, rated for 10 V and a maximum storage temperature of 85°C.
The chamber temperature was gradually increased in steps of 25°C until the predetermined temperature limit was reached. The capacitors were allowed to settle at each set temperature for 15 min before the next step increase was applied. This stepwise procedure was followed to decrease the possibility of thermal shock from sudden temperature changes.
Experiments were done with 2200 µF capacitors at a TOS temperature of 105°C and a humidity factor of 3.4%. At the end of each specified time interval, the temperature was lowered in steps of 25°C until room temperature was reached. Before being characterized, the capacitors were kept at room temperature for 15 min. All the capacitors were characterized by measuring their impedance using an SP-150 Biologic impedance measurement instrument (Biologic, 2010) with Electrochemical Impedance Spectroscopy (EIS). The ESR value is the real part of the impedance measured through the terminal software of the instrument; similarly, the capacitance value is computed from the imaginary part of the impedance.
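Extracting ESR and C from one complex impedance point can be sketched as below, assuming a simple series R-C model, Z = ESR − j/(2πfC). The measurement frequency and the synthetic values are assumptions for illustration, not values from the paper:

```python
import math

def esr_and_capacitance(z, freq_hz):
    """ESR and C from a single complex impedance measurement,
    assuming a series R-C model: Z = ESR - j/(2*pi*f*C)."""
    esr = z.real
    cap = -1.0 / (2.0 * math.pi * freq_hz * z.imag)
    return esr, cap

# Synthetic measurement at 100 Hz for a 2200 uF capacitor with 50 mOhm ESR
f = 100.0
z = complex(0.050, -1.0 / (2.0 * math.pi * f * 2200e-6))
esr, cap = esr_and_capacitance(z, f)
```

In practice EIS sweeps a range of frequencies and fits an equivalent circuit (as the Biologic Zfit note describes); the single-point extraction above is only the simplest case.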
[Figure: "Capacitance vs. Time for 2200 µF @ 105°C" — capacitance (µF) vs. time (hrs) over 0-3000 h, one trace per device (Cap1-Cap15), in the 1600-1950 µF range and decreasing over time.]

Figure 2. Capacitance Plot for all the devices under TOS
4. PHYSICS BASED MODELING OF CAPACITOR DEGRADATION
Based on the above discussion of degradation and the experiments conducted, in this section we derive the first-principles models for thermal overstress conditions. Under thermal overstress, since the device was subjected only to high temperature with no charge applied, we observe degradation due only to electrolyte evaporation. The models are derived based on these observations and on the measurements from the experimental data.
For deriving the physics-based models it is also necessary to know the structural details of the component under study. The models defined use this information for making effective degradation/failure predictions. A detailed structural study of the electrolytic capacitor under test is discussed in this section.
During modeling it is not possible to know the exact amount of electrolyte present in a capacitor, but using the structural details we can approximately calculate it. The electrolyte volume varies with the type and configuration of the capacitor, and can be updated in the model parameters. The equation for the approximate electrolyte volume is derived from the volume of the total capacitor capsule, given by:
Vc = π rc² hc (1)
The amount of electrolyte present depends on the type of paper used as a separator between the anode and cathode foils. A highly porous paper is used in the construction of the capacitor so that the maximum amount of electrolyte can be soaked into it. Since the electrolyte is completely soaked into the paper spacer, the electrolyte volume can be approximated as:
Ve ≈ Vpaper (2)
The approximate volume of electrolyte Ve, based on the geometry of the capacitor, is then:
Ve = π rc² hc − Asurface (dA + dC) (3)
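Eqs. (1)-(3) can be computed directly from the capsule geometry. The dimensions below are placeholders chosen only for illustration, not measurements of the capacitors under test:

```python
import math

def electrolyte_volume(r_c, h_c, a_surface, d_a, d_c):
    """Approximate electrolyte volume from capacitor geometry (Eqs. 1-3):
    Ve = pi*rc^2*hc - Asurface*(dA + dC).
    All dimensions in consistent units (here mm, giving mm^3)."""
    v_capsule = math.pi * r_c**2 * h_c          # Eq. (1): capsule volume
    return v_capsule - a_surface * (d_a + d_c)  # Eq. (3)

# Placeholder geometry: 5 mm radius, 11 mm high capsule, 700 mm^2 foil
# area, 0.10 mm anode and 0.05 mm cathode strip thickness
v_e = electrolyte_volume(r_c=5.0, h_c=11.0, a_surface=700.0,
                         d_a=0.10, d_c=0.05)
```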
A simplified electrical lumped-parameter model of impedance, M1, defined for an electrolytic capacitor, is shown in Fig. 3. The ESR dissipates some of the energy stored in the capacitor. In spite of the dielectric insulation layer between a capacitor's plates, a small amount of 'leakage' current flows between the plates. For a good capacitor operating nominally this current is not significant, but it becomes larger as the oxide layer degrades during operation. An ideal capacitor would offer no resistance to the flow of current at its leads. However, the electrolyte, aluminum oxide, space between the plates, and the electrodes combine to produce a small equivalent internal series resistance.
[Figure: equivalent circuit with anode foil electrode resistance R1 (2 mΩ), cathode foil electrode resistance R2 (1 mΩ), electrolyte resistance RE, leakage resistances R3, R4 ≥ 10 kΩ, and oxide-layer capacitances C1, C2, which combine into the ESR and C.]

Figure 3. Lumped Parameter Model (M1)
From the literature (Rusdi et al., 2005; Bengt, 1995; Roederstein, 2007) and the experiments conducted under thermal overstress, it has been observed that the capacitance and ESR values depend on the electrolyte resistance RE. A more detailed lumped-parameter model for an electrolytic capacitor under thermal overstress, M2, can be derived from M1, as shown in Fig. 4. R1 is the combined series
and parallel resistance in the model. RE is the electrolyte resistance. The combined resistance of R1 and RE is the equivalent series resistance of the capacitor. C is the total capacitance of the capacitor, as discussed earlier.
[Figure: R1 and RE in series, together forming the ESR, followed by the capacitance C.]

Figure 4. Updated Lumped Parameter Model (M2)
4.1. First Principles Models
The input impedance of the capacitor network is defined interms of the total lumped series and parallel impedance ofthe simplified network. The total lumped capacitance of thestructure is given by
C = 2 εR ε0 Asurface / dC (4)
From the literature (Rusdi et al., 2005; Bengt, 1995), for modeling ESR degradation it was observed that the electrolyte resistance RE, as discussed above, forms a major part of the combined ESR shown in Fig. 3. Being the dominant parameter, any change in RE leads to a change in the ESR value. We studied the relationship between RE and Asurface, which gives us a degradation model for ESR. The equation for RE is:
RE = ρE dC PE / (2 L H) (5)
Since RE is the dominant parameter in the ESR and any change in RE affects the ESR value, from Eq. (5) we express ESR in terms of the oxide surface area Asurface as:
ESR = ρE dC PE / (2 Asurface) (6)
Exposure of the capacitors to temperatures Tapplied > Trated results in accelerated aging of the devices (Kulkarni, Celaya, et al., 2011; Kulkarni, Biswas, et al., 2011a; 60068-1, 1988). Higher ambient storage temperature accelerates the rate of electrolyte evaporation, leading to degradation of the capacitance (Kulkarni, Celaya, et al., 2011; Bengt, 1995). The depletion in volume, and thus in effective surface area, is given by Eq. (7).
V = V0 − (Asurface jeo we) t (7)
Details of the derivation of this equation can be found in (Kulkarni, Biswas, et al., 2011b; Rusdi et al., 2005). Evaporation also leads to an increase in the internal pressure of the capacitor, which decreases the electrolyte evaporation rate. Eq. (8) and Eq. (9) give the decrease in active surface area due to evaporation of the electrolyte, which results in a decrease in C and an increase in ESR, respectively (Bengt, 1995; Roederstein, 2007).
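Eq. (7) codes directly as a linear depletion in time. The parameter values below are placeholders chosen only to show the behavior (they deplete about 50 mm³ over 3400 h), not measured quantities:

```python
def electrolyte_volume_at(t_min, v0, a_surface, j_eo, w_e):
    """Eq. (7): V(t) = V0 - (Asurface * jeo * we) * t.
    v0 in mm^3, a_surface in mm^2, j_eo the evaporation rate, w_e the
    volume of an ethyl glycol molecule (consistent units assumed)."""
    return v0 - a_surface * j_eo * w_e * t_min

# Placeholder parameters: deplete ~50 mm^3 over 3400 h (204000 min);
# jeo*we is folded into one number here for simplicity (w_e = 1.0)
V0, A = 523.6, 700.0
JEO = 50.0 / (700.0 * 204000.0)
v_end = electrolyte_volume_at(204000.0, V0, A, JEO, 1.0)
```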
4.1.1. Capacitance Degradation Model
From Eq. (4) and Eq. (7) we can derive the first-principles capacitance degradation model, D1, given by:
D1 : C(t) = [2 εR ε0 / dC] × [(V0 − V(t)) / (jeo we t)] (8)
The degradation in capacitance is directly proportional to the damage parameter V. As discussed earlier, an increase in the core temperature evaporates the electrolyte, decreasing the electrolyte volume and leading to degradation in capacitance. The resultant decrease in capacitance can be computed using Eq. (8).
4.1.2. ESR Degradation Model
From Eq. (6) and Eq. (7), the ESR degradation model, D2, is given as:

D2 : ESR(t) = [ρE dC PE / 2] × [jeo we t / (V0 − V(t))] (9)
In this model there are two parameters that change with time: the rate of evaporation jeo, and the correlation factor PE, related to the electrolyte spacer porosity and average liquid pathway. As the electrolyte evaporates due to high temperature, the correlation factor PE increases because the average pathway of the liquid decreases. Electrolyte evaporation under thermal-stress storage conditions results from the increase in the surrounding atmospheric temperature. Under this operating condition, when the surrounding temperature is high, the temperature of the capacitor capsule also increases, and heat travels from the surface of the body to the core of the capacitor; this phenomenon is described through the thermal model (Kulkarni et al., 2009; Kulkarni, Celaya, et al., 2011).
The decrease in the capacitance parameter value results from the decrease in electrolyte volume due to evaporation under thermal overstress; this relationship is expressed by Eq. (8). Similarly, the increase in ESR is given by the increase in the electrolyte resistance RE, as expressed by Eq. (9). With the decrease in electrolyte due to thermal overstress, the average liquid path length is reduced, which increases PE. Under normal circumstances, when the capacitors are stored at room temperature or below rated temperature, no damage or decrease in life expectancy is observed. But where the capacitors are stored under thermal stress conditions, permanent damage is observed.
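Eqs. (8)-(9) translate directly into code. This is a sketch of D1 and D2 with symbol names following the Nomenclature; all inputs must be in consistent units, and any values passed in a call are placeholders, not measured parameters:

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m

def capacitance_d1(t, v0, v_t, eps_r, d_c, j_eo, w_e):
    """Capacitance degradation model D1, Eq. (8):
    C(t) = (2*epsR*eps0/dC) * (V0 - V(t)) / (jeo * we * t)."""
    return (2.0 * eps_r * EPS0 / d_c) * (v0 - v_t) / (j_eo * w_e * t)

def esr_d2(t, v0, v_t, rho_e, d_c, p_e, j_eo, w_e):
    """ESR degradation model D2, Eq. (9):
    ESR(t) = (rhoE*dC*PE/2) * (jeo * we * t) / (V0 - V(t))."""
    return (rho_e * d_c * p_e / 2.0) * (j_eo * w_e * t) / (v0 - v_t)
```

Note the built-in consistency check: since the volume-loss factors cancel, the product C(t) × ESR(t) always equals εR ε0 ρE PE, as multiplying Eq. (8) by Eq. (9) shows.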
In the thermal overstress experiments, the capacitors were characterized periodically; after 3400 hours of operation it was observed that the average capacitance (C) value had decreased by 8-9%, while the ESR value had increased by around 20-22%. From the literature (60068-1, 1988),
under thermal overstress conditions higher capacitance degradation and only minor ESR degradation are expected, which correlates with the data collected. The failure threshold under storage conditions for capacitance (C) is a 10% decrease, while that for ESR is around 280-300% of the pristine-condition value (60384-4-1, 2007; Kulkarni, Biswas, et al., 2011b). Based on the degradation observed in the experiments, capacitance degradation was considered a precursor to failure for estimating the current health condition of the device.
5. DEGRADATION MODELING
In our earlier work (J. Celaya et al., 2011b, 2011a; Kulkarni et al., 2012) we presented an implementation of a model-based prognostics algorithm based on a Kalman filter and a physics-inspired empirical degradation model. The physics-inspired degradation model was derived from the capacitance degradation data from electrical overstress experiments. This model relates aging time to the percentage loss in capacitance and has the following form,
C(t) = e^(α t) + β, (10)
where the model constants α and β were estimated from the experimental data. Here the exponential model was linked to the degradation data and its parameters were derived from these data. The exponential empirical model of Eq. (10) was further updated: as discussed in Section 4.1.1, we developed a first-principles-based generalized model that can be implemented for different capacitor types and operating conditions. In this work we look at the degradation data under thermal overstress, as discussed earlier, and use Eq. (8) to build the physics-based model.
In this section we discuss the parameter estimation work and study how well the developed degradation model, D1, behaves based on the estimated static parameters. As an initial step we implement a nonlinear least-squares regression algorithm to estimate the model parameters.
[Figure: electrolyte volume (mm³) vs. aging time (hours) over 0-3500 h; measured values (Cap. #1-15) and estimated fit, decreasing from about 525 to 475 mm³.]

Figure 5. Estimation results for Volume decrease
The decrease in the capacitance parameter is used as a precursor of failure. Based on the experiments, capacitance parameter values are computed by characterizing the capacitors, as shown in the plots of Fig. 2. In the degradation model D1, given a certain type of capacitor, all the values in Eq. (8) can be computed except the dispersion volume V. Therefore the dispersion volume V is computed from the available data and used to build the physics-based model, D1, of the degradation phenomenon. The initial electrolyte volume V0 at pristine condition is approximately computed from the physics and geometry of the capacitor. From the experimental data, the estimated volume decreases almost linearly through the initial phase of degradation. Hence in this work we propose a simple dynamic model that relates aging time to loss of electrolyte volume. The loss in electrolyte is linked to the decrease in capacitance through Eq. (8) and has the following form,
Vk = θ1 + θ2 tk + θ3 tk² (11)
where θ1, θ2, and θ3 are model constants for the decrease in volume V, estimated from the experimental data of the accelerated thermal aging experiments. To estimate the model parameters, 14 of the 15 capacitors (labeled #1 through #15) were used, with the remaining capacitor used to validate the model against experimental data. A nonlinear least-squares regression algorithm is used to estimate the model parameters.
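The estimation step can be sketched as below on synthetic data (the coefficients are chosen near the magnitudes reported in Table 1, but the data points themselves are simulated, not the paper's measurements). Because Eq. (11) is linear in its parameters, a polynomial least-squares fit suffices here:

```python
import numpy as np

# Synthetic volume-vs-time data standing in for the training capacitors
t = np.linspace(0.0, 3400.0, 35)                 # aging time, hours
true_theta = (523.61, -0.0161, 3.8e-7)           # near Table 1 magnitudes
v = true_theta[0] + true_theta[1] * t + true_theta[2] * t**2
rng = np.random.default_rng(0)
v_noisy = v + rng.normal(0.0, 0.5, size=t.shape)

# Least-squares fit of V_k = theta1 + theta2*t + theta3*t^2 (Eq. 11)
theta3, theta2, theta1 = np.polyfit(t, v_noisy, deg=2)
residuals = v_noisy - (theta1 + theta2 * t + theta3 * t**2)
mse = float(np.mean(residuals**2))
```

The leave-one-out scheme in the text would repeat this fit 15 times, each time holding one capacitor's data out for validation, which is how the per-case rows of Table 1 arise.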
[Figure: residuals vs. time (hours) over 0-3500 h, ranging from about −3 to 2.]

Figure 6. Residuals
The experimental data are presented together with the results from the fit of Eq. (11) and Eq. (8), as shown in Fig. 5. It can be observed from the residuals in Fig. 6 that the estimation error increases with time. This is to be expected, since the data take a concave path after approximately 2500 hours of operation; a dip is observed relative to the linear degradation trend, and hence we observe higher residual values. This indicates that the phenomenon of volume decrease is not linear, and we are working towards updating the model in Eq. (11). The updated model will include additional degradation phenomena beyond the current model, which
will take into consideration the dip in the volume parameter observed during the later stages of aging.
[Figure: two panels over 0-3500 hours for capacitor #4; top, measured vs. estimated capacitance (µF, about 1700-1850); bottom, computed vs. estimated electrolyte volume (mm³, about 450-550).]

Figure 7. Volume and Capacitance Estimation (Cap # 4)
The updated degradation model is used to estimate the capacitance based on the estimated decrease in volume. In Fig. 7, the volume parameters were estimated from the data of all capacitors other than capacitor #4. The result was validated against the change in volume of capacitor #4, and the model D1 was validated for the decrease in capacitance.
Case    θ1        θ2         θ3           MSE
1       523.6123  -0.01613   3.7100×10⁻⁷  685.6593
2       523.6122  -0.01613   3.7099×10⁻⁷  685.0815
3       523.6159  -0.01614   3.9403×10⁻⁷  684.3579
4       523.6109  -0.01609   3.8072×10⁻⁷  687.3755
5       523.6128  -0.01614   3.8428×10⁻⁷  688.3824
6       523.6100  -0.01613   3.7867×10⁻⁷  690.6146
7       523.6081  -0.01614   3.7269×10⁻⁷  688.1003
8       523.6089  -0.01613   3.7988×10⁻⁷  691.7173
9       523.6111  -0.01616   3.7447×10⁻⁷  686.0799
10      523.6122  -0.01613   3.8470×10⁻⁷  687.8928
11      523.6076  -0.01611   3.7350×10⁻⁷  690.5650
12      523.6065  -0.01614   3.7313×10⁻⁷  683.0697
13      523.6147  -0.01609   3.8906×10⁻⁷  686.4739
14      523.6120  -0.01612   3.8276×10⁻⁷  689.6318
15      523.6113  -0.01616   3.8317×10⁻⁷  689.8948
Mean    523.6112  -0.0161    3.8077×10⁻⁷  687.6598
Median  523.6113  -0.0161    3.8072×10⁻⁷  687.8928
S.D.    0.0026    1.8748×10⁻⁵ 6.9373×10⁻⁹ 2.5339
C.I.    523.6098  -0.01614   0.3769×10⁻⁶  686.2565
        523.6127  -0.01611   0.3846×10⁻⁶  689.0630

Table 1. Parameter Estimation Results
Table 1 shows the estimated values of the parameters for each capacitor, along with the mean square error observed for the estimated values.
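The summary rows of Table 1 can be reproduced from the per-case estimates; a sketch using the θ1 column follows. The interval method is an assumption (a normal-approximation 95% interval for the mean); the table's C.I. row may be computed differently:

```python
import statistics

# theta1 estimates for the 15 leave-one-out cases (values from Table 1)
theta1 = [523.6123, 523.6122, 523.6159, 523.6109, 523.6128,
          523.6100, 523.6081, 523.6089, 523.6111, 523.6122,
          523.6076, 523.6065, 523.6147, 523.6120, 523.6113]

mean = statistics.mean(theta1)
sd = statistics.stdev(theta1)     # sample standard deviation
# 95% confidence interval for the mean, normal approximation
half_width = 1.96 * sd / len(theta1) ** 0.5
ci = (mean - half_width, mean + half_width)
```

These recover the tabulated mean (523.6112) and standard deviation (0.0026) for θ1, which is a useful cross-check on the reconstructed table.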
6. CONCLUSION AND DISCUSSION
This paper presents a first-principles-based electrolytic capacitor degradation model and a parameter estimation algorithm to validate the derived model against experimental data. The major contributions of the work presented in this paper are:
1. Identification of the lumped-parameter models M1 and M2 (Fig. 3 and Fig. 4), based on the equivalent electrical circuit of a real capacitor, as viable reduced-order models for prognostics-algorithm development;
2. Identification of capacitance (C) as a failure precursor in the lumped-parameter model M1, as shown in Fig. 3;
3. Estimation of the electrolyte volume from the structural model of the capacitor, to be used in the first-principles degradation model D1;
4. Development of the first-principles degradation model based on accelerated life test aging data, which includes the decrease in capacitance as a function of time and an evaporation rate linked to temperature conditions;
5. Implementation of a parameter estimation algorithm to cross-validate the derived first-principles degradation model D1.
The degradation model D1, based on first principles, gives an indication of how a specific device degrades based on its geometric structure, operating conditions, etc. The derived model can be updated and developed at a finer level of granularity for detailed prognostic implementation. The results presented here are based on accelerated aging experimental data and on the accelerated-life timescale. In our earlier work, physics-inspired degradation models based on the observed data were discussed (J. Celaya et al., 2011b, 2011a). The work discussed in this paper is the next step towards generalizing the degradation model, and it has been tested on the current data from capacitors under a constant temperature of 105°C. As discussed in Section 5, as a first step a simple model has been implemented for the decrease in volume V; it needs to be updated to include the operating-condition variables.
The performance of the proposed first-principles degradation model D1 is satisfactory for this study, based on the quality of the model fit to the experimental data and on the cross-validation performance of the parameter estimates. In this work, our main emphasis was on deriving the first-principles model for degradation and validating it with a basic nonlinear regression model. Our future work will focus on a detailed implementation of the physics-based model within a Bayesian approach, which can then be used for making more accurate degradation and failure predictions. Another focus will be on using the physics-based model to validate capacitor data under different thermal conditions and capacitor geometries. This will greatly enhance the quality and effectiveness of the degradation models in prognostics, where the operating and environmental conditions, along with the structural conditions, are also accounted for in the degradation dynamics.
NOMENCLATURE
εR        relative dielectric constant
ε0        permittivity of free space
to        oxide thickness
V         dispersion volume at time t
V0        initial electrolyte volume
jeo       evaporation rate (mg min⁻¹ area⁻¹)
t         time in minutes
ρE        electrolyte resistivity
PE        correlation factor related to electrolyte spacer porosity and average liquid pathway
rc        radius of capacitor capsule
hc        height of capacitor capsule
Vpaper    volume of paper
L         length of the anode oxide surface
H         height of the anode oxide surface
Asurface  effective oxide surface area (L × H)
we        volume of ethyl glycol molecule
Vc        total capacitor capsule volume
dA        thickness of anode strip
dC        thickness of cathode strip
C         capacitance
M1        electrical lumped-parameter model
M2        updated lumped-parameter model
D1        capacitance degradation model
D2        ESR degradation model
REFERENCES
60068-1, I. (1988). Environmental testing, Part 1: General and guidance. IEC Standards.
60384-4-1, I. (2007). Fixed capacitors for use in electronic equipment. IEC Standards.
Balaban, E., Saxena, A., Narasimhan, S., Roychoudhury, I., Goebel, K., & Koopmans, M. (2010). Airborne Electro-Mechanical Actuator Test Stand for Development of Prognostic Health Management Systems. Proceedings of Annual Conference of the PHM Society 2010, October 10-16, Portland, OR.
Bengt, A. (1995). Electrolytic Capacitors Theory and Applications. RIFA Electrolytic Capacitors.
Bharadwaj, R., Kulkarni, C., Biswas, G., & Kim, K. (2010, April). Model-Based Avionics Systems Fault Simulation and Detection. American Institute of Aeronautics and Astronautics, AIAA Infotech@Aerospace 2010, AIAA-2010-3328.
Biologic. (2010). Application note 14-Zfit and equivalentelectrical circuits [Computer software manual].
Buiatti, G., Martin-Ramos, J., Garcia, C., Amaral, A., & Cardoso, A. (2010). An Online and Noninvasive Technique for the Condition Monitoring of Capacitors in Boost Converters. IEEE Transactions on Instrumentation and Measurement, 59, 2134-2143.
Celaya, J., Kulkarni, C., Biswas, G., & Goebel, K. (2011a). A Model-based Prognostics Methodology for Electrolytic Capacitors Based on Electrical Overstress Accelerated Aging. Proceedings of Annual Conference of the PHM Society, September 25-29, Montreal, Canada.
Celaya, J., Kulkarni, C., Biswas, G., & Goebel, K. (2011b). Towards Prognostics of Electrolytic Capacitors. American Institute of Aeronautics and Astronautics, AIAA Infotech@Aerospace 2011, March 2011, St. Louis, Missouri.
Celaya, J. R., Wysocki, P., Vashchenko, V., Saha, S., & Goebel, K. (2010). Accelerated aging system for prognostics of power semiconductor devices. In IEEE AUTOTESTCON, 2010 (pp. 1-6). Orlando, FL.
Eliasson, L. (2007, October-November). Aluminium Electrolytic Capacitor's performance in Very High Ripple Current and Temperature Applications. CARTS Europe 2007 Symposium, Spain.
Fife, J. (2006, Aug). Wet Electrolytic Capacitors (Patent No: 7,099 No. 1). Myrtle Beach, SC: AVX Corporation.
Gasperi, M. L. (1996, October). Life Prediction Model for Aluminum Electrolytic Capacitors. 31st Annual Meeting of the IEEE-IAS, 4(1), 1347-1351.
Goebel, K., Saha, B., & Saxena, A. (2008). A Comparison of Three Data-Driven Techniques for Prognostics. 62nd Meeting of the Society For Machinery Failure Prevention Technology (MFPT), Virginia Beach, VA, 119-131.
Gomez-Aleixandre, C., Albella, J. M., & Martinez-Duart, J. M. (1986). Pressure build-up in aluminum electrolytic capacitors under stressed voltage conditions. Journal of Applied Electrochemistry, 16(1), 109-115.
Goodman, D., Hofmeister, J., & Judkins, J. (2007). Electronic prognostics for switched mode power supplies. Microelectronics Reliability, 47(12), 1902-1906.
Gu, J., Azarian, M. H., & Pecht, M. G. (2008). Failure Prognostics of Multilayer Ceramic Capacitors in Temperature-Humidity-Bias Conditions. International Conference on Prognostics and Health Management.
Gu, J., & Pecht, M. (2008). Prognostics and Health Management Using Physics-of-Failure. 54th Annual Reliability and Maintainability Symposium (RAMS).
Hayatee, F. G. (1975). Heat Dissipation and Ripple Current rating in Electrolytic Capacitors. Electrocomponent Science and Technology, 2, 109-114.
Ikonopisov, S. (1977). Theory of electrical breakdown during formation of barrier anodic films. Electrochimica Acta, 22(10), 1077-1082.
Imam, A., Habetler, T., Harley, R., & Divan, D. (2005, June). Condition Monitoring of Electrolytic Capacitor in Power Electronic Circuits using Adaptive Filter Modeling. IEEE 36th Power Electronics Specialists Conference, 2005, PESC '05, 601-607.
Kulkarni, C., Biswas, G., Celaya, J., & Goebel, K. (2011a). A Case Study for Capacitor Prognostics under Accelerated Degradation. IEEE 2011 Workshop on Accelerated Stress Testing & Reliability (ASTR), September 28-30, San Francisco, CA.
Kulkarni, C., Biswas, G., Celaya, J., & Goebel, K. (2011b). Prognostic Techniques for Capacitor Degradation and Health Monitoring. The Maintenance & Reliability Conference, MARCON 2011, Knoxville, TN.
Kulkarni, C., Biswas, G., & Koutsoukos, X. (2009). A prognosis case study for electrolytic capacitor degradation in DC-DC converters. Proceedings of Annual Conference of the PHM Society, September 27 - October 1, San Diego, CA.
Kulkarni, C., Celaya, J., Biswas, G., & Goebel, K. (2011). Prognostic Modeling and Experimental Techniques for Electrolytic Capacitor Health Monitoring. The 8th International Workshop on Structural Health Monitoring 2011 (IWSHM), September 13-15, Stanford University, Stanford, CA.
Kulkarni, C., Celaya, J., Biswas, G., & Goebel, K. (2012). Prognostic and Experimental Techniques for Electrolytic Capacitor Health Monitoring. The Annual Reliability and Maintainability Symposium (RAMS), January 23-26, Reno, Nevada.
Lahyani, A., Venet, P., Grellet, G., & Viverge, P. (1998, Nov). Failure prediction of electrolytic capacitors during operation of a switch mode power supply. IEEE Transactions on Power Electronics, 13, 1199-1207.
Nie, L., Azarian, M., Keimasi, M., & Pecht, M. (2007). Prog-nostics of ceramic capacitor temperature humidity biasreliability using mahalanobis distance. Circuit World,33(3), 21 - 28.
Orsagh, R., & et’al. (2006, March). Prognostic HealthManagement for Avionics System Power Supplies.Aerospace Conference, 2006 IEEE, 1-7.
Roederstein, V. (2007). Aluminum Capacitors - General In-formation. Document - 25001 January 2007.
Rusdi, M., Moroi, Y., Nakahara, H., & Shibata, O. (2005).Evaporation from Water Ethylene Glycol Liquid Mix-ture. Langmuir - American Chemical Society, 21 (16),7308 - 7310.
Saha, B., Celaya, J. R., Wysocki, P. F., & Goebel, K. F.(2009). Towards prognostics for electronics compo-nents. In IEEE Aerospace conference 2009 (p. 1-7).Big Sky, MT.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,Saha, S., et al. (2008). Metrics for evaluating perfor-mance of prognostic techniques. In International Con-
ference on Prognostics and Health Management 2008.Vohnout, S., Kozak, M., Goodman, D., Harris, K., & Jud-
kins, J. (2008). Electronic Prognostics System Imple-mentation on Power Actuator Components. AerospaceConference, 2008 IEEE, 1 - 11.
Wereszczak, A., Breder, K., & Ferber, M. K. (1998). FailureProbability Prediction of Dielectric Ceramics in Mul-tilayer Capacitors. Annual Meeting of the AmericanCeramic Society, Cincinnati, OH (United States).
Wit, H. D., & Crevecoeur, C. (1974). The dielectric break-down of anodic aluminum oxide. Physics Letters A,Volume 50, Issue 5, 365 - 366.
Chetan S. Kulkarni is a Research Assistant at ISIS, Vanderbilt University. He received the M.S. degree in EECS from Vanderbilt University, Nashville, TN, in 2009, where he is currently a Ph.D. candidate, and received a B.E. degree in Electronics and Electrical Engineering from the University of Pune, India, in 2002.
Jose R. Celaya is a research scientist with SGT Inc. at the Prognostics Center of Excellence, NASA Ames Research Center. He received a Ph.D. degree in Decision Sciences and Engineering Systems in 2008, an M.E. degree in Operations Research and Statistics in 2008, and an M.S. degree in Electrical Engineering in 2003, all from Rensselaer Polytechnic Institute, Troy, New York; and a B.S. in Cybernetics Engineering in 2001 from CETYS University, Mexico.
Kai Goebel received the degree of Diplom-Ingenieur from the Technische Universität München, Germany, in 1990. He received the M.S. and Ph.D. from the University of California at Berkeley in 1993 and 1996, respectively. Dr. Goebel is a senior scientist at NASA Ames Research Center, where he leads the Diagnostics and Prognostics groups in the Intelligent Systems division. In addition, he directs the Prognostics Center of Excellence and is the technical lead for Prognostics and Decision Making of NASA's System-wide Safety and Assurance Technologies Program. He worked at General Electric's Corporate Research Center in Niskayuna, NY, from 1997 to 2006 as a senior research scientist. He has carried out applied research in the areas of artificial intelligence, soft computing, and information fusion. His research interest lies in advancing these techniques for real-time monitoring, diagnostics, and prognostics. He holds 15 patents and has published more than 200 papers in the area of systems health management.
Gautam Biswas received the Ph.D. degree in computer science from Michigan State University, East Lansing. He is a Professor of Computer Science and Computer Engineering in the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN.
Prediction of Fatigue Crack Growth in Airframe Structures
Jindřich Finda1, Andrew Vechart2, and Radek Hédl3
1,3 Honeywell International s.r.o., Brno, Tuřanka 100, 627 00, Czech Republic
2 Honeywell International Inc., 1985 Douglas Drive North (M/S MN10-112B), Golden Valley, MN 55422, USA
ABSTRACT
The paper describes the general design, function and
performance of a prognostic system for fatigue crack
growth in airframe structures. Prognostic capability is an
important part of an advanced Structural Health Monitoring
(SHM) system. The aim of the prognosis is to estimate the
remaining life of a system, subsystem or structure, i.e. the time
at which damage (crack, corrosion, wear, delamination,
disbonding, etc.) will result in failure of the considered
element.
1. INTRODUCTION
Fatigue damage and its consequences are the most serious
structural design and maintenance issues that have to be
addressed. Several philosophies of how to decrease
consequences of a fatigue hazard have been developed and
applied. Serious aircraft accidents due to fatigue have
contributed to this development and have started research
efforts in this area. Two main philosophies for aircraft
structure design are used nowadays:
Safe Life - it is established, through a combination of testing
and analysis, that there is an extremely low risk that the part
will ever form a detectable crack due to fatigue during the
service life of the structure.
Damage Tolerance - structure has the ability to sustain
defects safely until the defect is detected and repaired.
Structural Health Monitoring (SHM) - represents the next
advanced step in structural damage monitoring and
maintenance planning. An occurrence of structural damage
is monitored by a sophisticated automated system, whose use
does not demand additional inspection time or qualified
personnel. A significant part of an SHM system and
its application in aerospace is prognostics. It shifts a
structural maintenance program to an advanced level and
brings significant benefits for an aircraft operator like
efficient maintenance planning, effective aircraft usage,
service cost decrease, safety increase, etc. Its application
opens a new approach to aircraft design and brings a new
advanced philosophy of structural lifetime estimation.
2. ENTIS PROJECT DESCRIPTION
This paper describes our approach to fatigue damage
prognostics development. It is based on results from the
ENTIS project supported by the Ministry of Industry and
Trade of the Czech Republic. The main goal of this project
is SHM system development, particularly its real application
form, capabilities, conditions and issues definition.
Experimental testing is a basic part of this project. Fatigue
tests of aircraft structure parts have been done and structural
damage has been monitored by an ultrasonic method. All
structural specimens are parts of an L-410 UVP-E airplane
(Figure 1), which is an all-metal high-wing monoplane
powered by two turboprop engines. The airplane is certified
in the commuter category in accordance with FAR Part 23
requirements. All structural specimens are considered
critical parts of the aircraft structure in the sense of fatigue
damage. The fatigue test arrangement is shown in Figure 2.
Figure 1. Aircraft Industries a. s. L-410 UVP-E airplane
_____________________
Jindřich Finda et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 2. Wing Spar Fatigue Test
For the generation and registration of ultrasonic waves,
shear-plate PZT actuators were used (Figure 3).
Characteristics of the particular Noliac PZT actuators used
are shown in Table 1. The actuators are characterized by
their small dimensions, low weight, and low cost. This
allows installation of large, inexpensive sensor arrays
on an airframe with very small impact on structural
performance and aerodynamic properties. Moreover, the
small dimensions of the PZT elements make them suitable for
integration into a sensor array on a flexible strip, which
significantly accelerates the installation of the sensor array
on the monitored structure.
Figure 3. Shear Plate PZT Actuators
Type     Length [mm]   Width [mm]   Height [mm]   Maximum voltage [V]   Free stroke [µm]   Capacitance [pF]
CSAP02   5             5            0.5           +/-320                1.5                830

Table 1. PZT Actuator Parameters
Figure 4. Block scheme of the crack growth monitoring
system
The block scheme of the advanced signal processing for the
automated monitoring of fatigue damage of a particular
aircraft structural part is shown in Figure 4. A sparse sensor
array enclosing the monitored area (hot spot) is used to
collect the data for evaluating the actual state of the
structure. The automated defect detection and sizing is based
on evaluation of changes in direct signal paths, i.e. signals
between individual pairs of PZT actuators, where one PZT
actuator acts as the source of the ultrasonic wave and the
second one as the sensor. First, Signal Difference
Coefficients (SDCs) for individual paths are calculated by
evaluating differences between baseline and actual signals
measured on the monitored structure. These SDCs form
a Damage Index (DI), which gives information on the extent
of the structural damage. The defect occurrence is indicated
and the defect size is estimated using an Artificial Neural
Network, which transforms a feature vector capturing
significant features (DIs) of an identified defect into the
required parameters (defect occurrence, defect size), which
are then used as inputs to the prognostic algorithm.
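As a sketch, the SDC/DI computation for one actuator-sensor path might look as follows. The paper does not give the exact SDC formula, so a common correlation-based damage index is assumed here; all function names and signal values are illustrative.

```python
import numpy as np

def signal_difference_coefficient(baseline, actual):
    """Damage index for one actuator-sensor path: 1 - |normalized
    cross-correlation| between baseline and current signal.
    (Assumed formulation; the paper does not specify the exact SDC.)"""
    b = (baseline - baseline.mean()) / baseline.std()
    a = (actual - actual.mean()) / actual.std()
    rho = np.dot(b, a) / len(b)          # Pearson correlation coefficient
    return 1.0 - abs(rho)                # 0 = no change, -> 1 = strong change

def feature_vector(baselines, actuals):
    """Stack the SDCs of all direct paths into the DI feature vector
    that would feed the neural network."""
    return np.array([signal_difference_coefficient(b, a)
                     for b, a in zip(baselines, actuals)])

# An unchanged path gives DI ~ 0; a distorted path gives DI > 0.
t = np.linspace(0, 1, 500)
s = np.sin(2 * np.pi * 50 * t)
print(feature_vector([s, s], [s, 0.5 * s + 0.3 * np.sin(2 * np.pi * 80 * t)]))
```

In a real system the baseline signals would be recorded on the pristine structure and the DI vector passed to the trained network.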
3. FATIGUE CRACK GROWTH PROGNOSTICS
The fatigue crack growth prognosis is used for RUL
prediction. The RUL is defined by the crack length reaching
its critical size limit. The concept of the crack growth
prognostic algorithm is shown in (Figure 4). Input of the
prognostic algorithm consists of the crack length observed
during a particular inspection, i.e. measured by the SHM
system, and the typical loading sequence. Several
algorithms for the crack growth calculation can be found in
literature (Beden & Abdullah & Ariffin, 2009). Those
algorithms are based on various approaches: fracture
mechanics & empirical models (Beden & Abdullah &
Ariffin, 2009), or data based models (Forman, R. G. &
Shivakumar, V & Cardinal J.W. & Wiliams, L.C. &
McKeigham, P.C. 2005). Suitability of these approaches for
a particular application depends on the actual type of
loading, type of structure and other boundary conditions and
available inputs for tuning of the crack growth model.
Our prognostic system uses the NASGRO
equation as the crack growth model (NASGRO Reference
Manual, version 4.2; Augustin, 2009). This approach was
selected for the following reasons: (1) This equation is
widely used in aerospace, (2) All inputs required for the
crack growth model related to duralumin aircraft structure
are available in literature, (3) Influence of variable
amplitude loading is accounted for in the NASGRO model,
(4) This algorithm solves the crack growth in all three
phases of the crack propagation (crack initiation, stable
crack growth, unstable crack growth).
The SHM system focuses on so-called Principal
Structural Elements (PSEs). PSEs are those elements of a
primary structure which contribute significantly to carrying
flight, ground, and pressurization loads, and whose failure
could result in catastrophic failure of the airplane. Sensors
are installed on hot spots in order to provide information on
the actual status of the structural integrity (i.e. the actual
length of the fatigue crack). Each hot spot is treated separately
(i.e. the prognosis of crack growth for a particular hot spot is
done without accounting for the effect of the presence of other
cracks outside the hot spot). However, each hot spot may
contain multiple cracks.
3.1. Input Crack Length
The crack length observed during a particular inspection is
used as an initial crack length for the prognostic algorithm.
Evaluation of the crack length is done using a feature vector,
which consists of DIs. The vector is applied to the input of
an Artificial Neural Network, as described above. The
prognostic algorithm propagates the initial crack length into
the future using a typical loading sequence. Thus, the
prognosis is done for each inspection, i.e. for each crack
length observed.
3.2. Typical Loading Sequence and Flight Loading
Spectrum
The wing flange specimen loading is of the same type as the
real loading on an aircraft wing flange: a
combination of a bending moment and an axial force.
The loading sequence represents a series of loading cycles
(Figure 5) affecting the structure. The duration of a loading
cycle is constant for the whole sequence, i.e. 1/3 s. Each
cycle is described by its maximal and minimal stress levels
[σmin, σmax]. The maximal and minimal stress levels are
expressed as multiples of a nominal stress σ0. Thus, we have
a pair of numbers nmin, nmax, the so-called load factors, for a
single loading cycle.
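The load-factor description above amounts to a simple scaling of the nominal stress; a minimal sketch, with an assumed σ0 value for illustration:

```python
# Each loading cycle is stored as a (n_min, n_max) load-factor pair and
# converted to stresses by scaling the nominal stress sigma_0.
# The sigma_0 value below is illustrative, not the L-410 flange value.

def cycle_stresses(n_min, n_max, sigma_0):
    """Return (sigma_min, sigma_max) for one loading cycle."""
    return n_min * sigma_0, n_max * sigma_0

def stress_ratio(n_min, n_max):
    """Stress ratio R = sigma_min / sigma_max used by crack growth models."""
    return n_min / n_max

sigma_0 = 80.0                               # assumed nominal stress [MPa]
print(cycle_stresses(-0.5, 1.5, sigma_0))    # (-40.0, 120.0)
print(stress_ratio(-0.5, 1.5))
```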
A typical flight spectrum for a particular aircraft (Figure 6)
is used in order to define a typical flight loading sequence.
The typical flight spectrum was defined according to FAA
AC 23-13A. The total loading spectrum during a flight is
composed of two loading spectra: wind gusts and maneuvers.
The loading sequence can be derived from the typical flight
spectrum in various ways. Most often the block loading
sequence or random loading sequence is used:
Block loading sequence – loading cycles of the same
amplitude are organized in blocks, which consist of a
number of loading cycles.
Random loading sequence – loading cycles, which have
various amplitudes, are randomly organized in the loading
sequence (Figure 7).
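The two derivation options above can be sketched as follows. The spectrum entries are illustrative placeholders (chosen so that one flight contains 23 cycles, the order of magnitude mentioned later in the paper), not the actual L-410 spectrum.

```python
import random

# A flight spectrum as a list of (n_min, n_max, count) entries: "count"
# cycles per flight at the given load-factor pair (illustrative numbers).
SPECTRUM = [(0.2, 1.2, 15), (-0.1, 1.6, 6), (-0.5, 2.1, 2)]

def block_sequence(spectrum):
    """Block loading sequence: cycles of the same amplitude grouped
    into consecutive blocks."""
    return [(lo, hi) for lo, hi, count in spectrum for _ in range(count)]

def random_sequence(spectrum, rng=random):
    """Random loading sequence: the same cycles, randomly ordered
    (each prognosis run can use a different random ordering)."""
    seq = block_sequence(spectrum)
    rng.shuffle(seq)
    return seq

flight = random_sequence(SPECTRUM)
print(len(flight))   # 23 cycles for this illustrative flight
```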
Figure 5. Typical Loading Cycle
Figure 6. Typical Flight Spectrum for L-410 UVP-E
Figure 7. Random Loading Sequence for One Flight
3.3. Stress Intensity Factor
The calculation of the stress intensity factor (SIF) at the
crack tip for a nominal load is based on a finite element
analysis (FEA), which requires knowledge of the structure
and crack geometry. The FEA provides strain energy release
rates for the crack tip, from which the stress intensity factor
Kσ0(N) can be calculated. The FEA is time consuming and
too complex to include in an online system. For our purposes,
online FEA calculation of the stress intensity factor Kσ0(N)
is replaced by a lookup table.
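The lookup-table replacement can be sketched as follows; the tabulated crack lengths and SIF values here are illustrative placeholders, not the project's FEA output, and linear scaling of K with applied stress is assumed (as described in Section 4).

```python
import numpy as np

# Offline FEA results tabulated at a few crack lengths for the nominal
# stress sigma_0; online, K is interpolated and scaled by the load factor.
# Table values are illustrative, not the project's actual FEA output.
CRACK_MM = np.array([1.27, 10.0, 38.0])   # minimum / medium / maximum length
K_SIGMA0 = np.array([4.0, 12.0, 30.0])    # K at nominal stress [MPa*sqrt(m)]

def sif(crack_length_mm, load_factor):
    """Stress intensity factor for a given crack length and load factor,
    assuming K scales linearly with the applied stress."""
    k_nominal = np.interp(crack_length_mm, CRACK_MM, K_SIGMA0)
    return load_factor * k_nominal

print(sif(10.0, 1.0))    # exact table point: 12.0
print(sif(24.0, 1.5))    # interpolated and scaled: 31.5
```

The interpolation step is what introduces the SIF uncertainty discussed in Section 4.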
3.4. Crack Growth Equation
Calculation of the crack increment for a particular load
cycle is done using the NASGRO equation of fracture
mechanics, Eq. (1):

da/dN = C [((1 − f)/(1 − R)) ΔK]^n (1 − ΔKth/ΔK)^p / (1 − Kmax/Kc)^q    (1)

where ΔK is the stress intensity factor range, R the stress
ratio, Kmax the maximum stress intensity factor in the cycle,
and C, n, f, ΔKth, Kc, p and q are model parameters
given by the structure material and geometry.
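A cycle-by-cycle integration of the crack increment can be sketched as below. The NASGRO parameter values and the SIF model used here are placeholders for illustration, not the duralumin data used in the project.

```python
def nasgro_da(dK, K_max, R, C=1e-11, n=3.0, f=0.3, dK_th=2.0,
              Kc=60.0, p=0.5, q=0.5):
    """Crack increment per cycle [m] from the NASGRO equation.
    All parameter values are placeholders, not duralumin data."""
    if dK <= dK_th:
        return 0.0                      # below threshold: no growth
    if K_max >= Kc:
        return float('inf')             # unstable crack growth
    dK_eff = ((1.0 - f) / (1.0 - R)) * dK
    return C * dK_eff**n * (1.0 - dK_th / dK)**p / (1.0 - K_max / Kc)**q

def propagate(a0_mm, cycles, sif):
    """Grow a crack of initial length a0_mm [mm] through a sequence of
    (n_min, n_max) load-factor cycles; 'sif' maps (length [mm],
    load factor) to a stress intensity factor [MPa*sqrt(m)]."""
    a = a0_mm
    for n_min, n_max in cycles:
        K_max = sif(a, n_max)
        K_min = sif(a, n_min)
        a += nasgro_da(K_max - K_min, K_max, K_min / K_max) * 1000.0  # m -> mm
    return a

# Hypothetical linear SIF model and 1000 identical cycles:
a_end = propagate(5.0, [(0.2, 1.2)] * 1000, lambda a, nf: nf * (2.0 + 0.5 * a))
print(a_end)
```

In the actual system the SIF callable would be the lookup-table interpolation and the cycle list would come from the typical loading sequence.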
4. PROGNOSTIC ALGORITHM PERFORMANCE
Two sets of experimental data from laboratory fatigue tests
of wing flanges (Figure 2) were used as inputs for the
prognosis algorithm performance assessment. In both tests,
a two-tip crack 2 x 1.27 mm long was initiated at the rivet
hole. One tip pointed toward the edge of the flange (external
crack), and the other toward the flange's axis of symmetry
(internal crack). The intention of the experiment was to 1) evaluate
performance of the crack measurement system and 2) obtain
fatigue crack growth data. Ideally the only two major cracks
near the measurement site would have been those initiated
intentionally. This was the case for the first set of
experimental data. However, during the second experiment,
several other cracks formed within the test article,
contaminating the experimental data. Nonetheless, the
performance of the fatigue crack prognosis model is
compared with both sets of experimental data.
Figure 8. Fatigue Crack Prognosis with First Set of
Experimental Data
Figure 9. Predicted Times to Internal Crack Length of 38
mm (first experimental data)
Figure 8 and Figure 9 show the outcome of the fatigue crack
growth prognostic algorithm applied to the first set of
experimental data. In Figure 8, the dashed data lines
represent the crack lengths measured by fractography at
specific times during the test for the internal and external
cracks. The solid lines indicate the predicted crack growths
where the crack growth prognosis was initiated at each
fractography measurement of the crack length. Each
prognosis proceeds no farther than 20,000 flights into the
future (where each flight consists of approximately 23
loading cycles).
Four uncertainties are considered in our work: loading
uncertainty, SIF uncertainty, prognostic model uncertainty
and measured crack length uncertainty.
Figure 9 shows the predicted times at which the crack will
reach the critical length. The solid curve represents the time
predicted by the algorithm. The dashed curve includes an
offset of this prediction accounting for the loading and SIF
uncertainties:
Loading uncertainty – One source of uncertainty in
crack growth prognosis is the assumed loading sequence
(a random loading sequence drawn from the typical loading
spectrum for this type of aircraft). For several initial crack
lengths, several different crack prognoses were run, each
using a random ordering of the selected loading sequence.
The upper end of the 95% confidence interval on the
standard deviation of the times from these prognoses was
then calculated.
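The last step above can be sketched as follows, assuming the standard chi-squared confidence interval for a standard deviation (the paper does not state which interval construction was used); the prognosis times are illustrative values.

```python
import numpy as np
from scipy import stats

def upper_std_ci(times, confidence=0.95):
    """Upper end of the two-sided confidence interval on the standard
    deviation of predicted times, via the chi-squared interval."""
    n = len(times)
    s = np.std(times, ddof=1)                      # sample standard deviation
    alpha = 1.0 - confidence
    chi2_lo = stats.chi2.ppf(alpha / 2.0, n - 1)   # lower chi-squared quantile
    return s * np.sqrt((n - 1) / chi2_lo)

# Several prognoses of "time to critical length" (in flights), each run
# with a different random ordering of the loading sequence (illustrative).
times = np.array([18500.0, 19200.0, 17800.0, 18900.0, 19600.0, 18200.0])
print(upper_std_ci(times))
```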
SIF uncertainty – A key parameter in fatigue crack growth
models is the crack Stress Intensity Factor (SIF). For
purposes of this project, SIFs are evaluated using a
Boundary Element Analysis software package. Crack
geometries are entered into the software, and post-
processing yields an estimate of the stress intensity factor.
The time to set up and execute this stress analysis is too long
to perform at each step of the fatigue crack prognosis.
Therefore, a lookup table has been generated for selected
crack lengths (minimum, medium, and maximum). The
lookup table has been generated under the assumption that
a single stress may be used to determine a baseline SIF, and
this baseline can then be scaled by a load factor to calculate
an actual SIF as a function of crack length and loading
condition. The SIF for a particular crack length is then
calculated using the values tabulated in the lookup table
and an interpolation technique, which is the source of the
uncertainty in the SIF estimation.
Prognostics model and measured crack length uncertainties
are not described in this paper.
The solid horizontal line indicates the time to reach this
crack length as indicated by the experimental data. In the
figure it can be seen that, as the time until the internal crack
reaches the critical length decreases, the identified
uncertainties do not account for the error between the
prediction and the experimental results.
Figure 10 shows the prognostic algorithm applied to the
second set of experimental data. As mentioned before, the
second set of experimental data was contaminated by
additional unintended cracks propagating during the test.
Predicted times to reach the critical crack length are
presented in Figure 11.
Figure 10. Fatigue Crack Prognosis with Second Set of
Experimental Data
Figure 11. Predicted Times to Internal Crack Length of 38
mm (Second Experimental Data)
5. CONCLUSIONS
The prognostic system enables prediction of fatigue
damage growth and mitigates the problem of
corrective maintenance planning. Our prognostic system
design is based on the traditional method (the NASGRO
equation) used for crack propagation modeling in damage
tolerance analyses. Prognosis of simultaneous
propagation of multiple cracks is possible; in this case,
multi-dimensional lookup tables for SIF estimation have to
be used in order to account for interaction between
individual cracks. Connecting the SHM and
prognostic systems brings a novel capability for
interactive fleet management and prognostics of
fatigue damage growth. It opens a new dimension of
maintenance planning, in which maintenance
tasks can be scheduled by aircraft
operators according to their requirements and at minimal cost.
The accuracy of the crack growth prediction is influenced
by several parameter uncertainties, as demonstrated
by applying the prognostics to the results of two fatigue tests.
These parameter uncertainties (loading, SIFs, crack size
estimation, etc.) have to be considered. The prognostic
output can also be influenced by additional boundary conditions
(additional cracks, as in flange fatigue test 2); in that case,
the prognostic results do not follow the real crack propagation
curve exactly. A solution is to monitor changes in
boundary conditions and adjust the prognostic input
parameters with regard to these changes.
ACKNOWLEDGEMENT
The presented work has been supported by the Ministry of
Industry and Trade of the Czech Republic through grant project
no. FR-TI1//274 under the framework program TIP.
NOMENCLATURE
Δa Crack size increment
DI Damage Index
f Opening function
FEA Finite Element Analysis
Kc Fracture toughness
Kσ0 Stress intensity factor
σmin Minimal stress level
σmax Maximal stress level
nmin Cycle minimal load factor
nmax Cycle maximal load factor
PSE Principal Structural Element
PZT Lead Zirconate Titanate
RUL Remaining Useful Life
SDC Signal Difference Coefficient
SHM Structural Health Monitoring
SIF Stress Intensity Factor
REFERENCES
Augustin, P. (2009). Simulation of Fatigue Crack Growth in
the High Speed Machined Panel under the Constant
Amplitude and Spectrum Loading. 25th ICAF
Symposium, Rotterdam.
Beden, S. M., Abdullah, S., & Ariffin, A. K. (2009).
Review of Fatigue Crack Propagation Models for
Metallic Components. European Journal of Scientific
Research, ISSN 1450-216X, Vol. 28, No. 3.
Federal Aviation Regulations (FAR). Part 23 -
Airworthiness Standards: Normal, Utility, Acrobatic, and
Commuter Category Airplanes.
Federal Aviation Administration Advisory Circular (AC). AC 23-13A -
Fatigue, Fail-Safe, and Damage Tolerance Evaluation
of Metallic Structure for Normal, Utility, Acrobatic,
and Commuter Category Airplanes.
Forman, R. G., Shivakumar, V., Cardinal, J. W.,
Williams, L. C., & McKeighan, P. C. (2005). Fatigue
Crack Growth Database for Damage Tolerance
Analysis. DOT/FAA/AR-05/15.
NASGRO Reference Manual, version 4.2. NASA Johnson
Space Center and Southwest Research Institute, San
Antonio.
BIOGRAPHIES
Jindrich Finda (March 28th, 1980)
earned his Master of Science in Aircraft
Design from Brno University of
Technology, Faculty of Mechanical
Engineering, Institute of Aerospace
Engineering, in 2003, and his Ph.D. in
Methods for Determination of
Maintenance Cycles and Procedures for Airplanes/Airplane
Assemblies from the Institute of Aerospace Engineering in
2009. Jindrich Finda works as a Scientist II R&D. His work
is aimed at SHM system development: developing
algorithms for advanced ultrasonic signal/image
processing, and algorithms for automated defect detection,
localization, size evaluation and prognosis of defect
growth, as well as SHM integration into the aircraft maintenance plan.
Andrew Vechart earned his Master of
Science in Computation for Design and
Optimization from Massachusetts Institute
of Technology in 2011 and his Bachelor of
Science in Mechanical Engineering and
Physics from the University of Wisconsin
– Milwaukee in 2009. He has been an R&D
scientist with Honeywell focusing on
vehicle health management projects since 2011. At
Honeywell, he performed prognostic algorithm development
for a structural health monitoring (SHM) application. He is
also program manager and project engineer for a Honeywell
program to update existing and design new firmware and
software for an FPGA application. His interests include
structural health management, embedded systems, and
scientific computing.
Radek Hedl (November 19th, 1973) earned
his Master of Science in Cybernetics,
Automation and Measurement from
Department of Biomedical Engineering,
Faculty of Electrical Engineering and
Computer Science, Brno University of
Technology and his PhD. in Cybernetics
and Computer Science from Department of
Biomedical Engineering, Faculty of Electrical Engineering
and Computer Science, Brno University of Technology.
Radek Hedl works as a Sr. R&D Scientist in the Mechanical
& Simulation Technologies group. In particular, he is the
leader of the CBM/SHM sub-group. His responsibilities include
development of the resource and technology capabilities of
the CBM/SHM group in Brno, involvement in definition of
long term strategy technology roadmaps in the CBM/SHM
area, and leading R&D projects. His R&D activities include
developing algorithms for advanced ultrasonic signal /
image processing, and algorithms for automated defect
detection, localization, size evaluation and prognosis of the
defect growth.
Simulation Framework and Certification Guidance for Condition Monitoring and Prognostic Health Management
Dipl.-Ing. Matthias Buderath1 and Partha Pratim Adhikari2
1 Cassidian, Manching, 85104, Germany, [email protected]
2 Cassidian, CAIE, Bangalore, 560 016, India, [email protected]
ABSTRACT
The most prominent challenges to the successful qualification of Integrated System Health Monitoring (ISHM) systems are appropriate technology development processes and Verification & Validation (V&V) methods leading to certification. This paper outlines a survey of recent ISHM programs in diverse industrial sectors across the globe, offers guidelines for ISHM development at each Technology Readiness Level (TRL), and sets forth a V&V process and certification roadmap. The paper provides insight into Cassidian's ISHM simulation framework and emphasizes the relevance of this framework to an effective V&V solution for ISHM.
1. INTRODUCTION
With growing financial uncertainty, air vehicle operators (both commercial and military) are under tremendous pressure to reduce operational and support costs. It is accepted across the aerospace industry that ISHM is a potentially valuable strategy for the manufacture and management of vehicle platforms. At the same time, ISHM has not yet fully matured as a technology in several key functional areas. Research and development to address this shortfall is occurring across both the automobile and aerospace industries. Although technologies related to Built-In-Test (BIT) and diagnostics have advanced greatly and research into enhanced diagnostics is progressing very fast, prognostics technology for all types of aircraft sub-systems is at a very nascent stage.
M. Buderath et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Validation & Verification (V&V) methods leading to the qualification and certification of ISHM are a key area of development. Although there has been considerable effort in this direction, an ISHM system at the aircraft level has yet to be certified. Certification agencies (EASA, FAA, SAE, etc.) have yet to establish comprehensive certification regulations for Integrated System Health Monitoring systems. Kevin R. Wheeler et al. (2010) contributed an extensive survey of recent ISHM programs and mention that vast differences in user objectives with regard to engineering development are the major barrier to successful V&V. Their paper identifies in detail the objectives and associated metrics across operational, regulatory and engineering domains for diagnosis and prognosis algorithms and systems. James E. Dzakowic et al. (2004) introduce a methodology for verifying and validating the capabilities of detection, diagnostic and prognostic algorithms through an on-line, metrics-based evaluation. Martin S. Feather (2005) mentions in his publication that state-of-the-practice V&V and certification techniques will not suffice for emerging forms of ISHM systems; however, a number of maturing software engineering assurance technologies show particular promise for addressing these ISHM V&V challenges. Dimitry Gorinevsky et al. (2010) describe the importance of a NASA-led effort in open-system IVHM architecture. Detailed functional decompositions of IVHM systems with respect to criticality, on/off-board operation and development cost are presented, and certification standards
are mapped accordingly. This paper also addresses the current NASA IVHM test bed, along with development and deployment steps corresponding to increasing TRL. The FAA's advisory circular (AC), AC 29-2C MG-15, provides guidance on achieving airworthiness approval for rotorcraft Health and Usage Monitoring System (HUMS) installations. It also outlines the process of credit validation and the Instructions for Continued Airworthiness (ICA) for the full range of HUMS applications. Brian D. Larder et al. (2007) converted the text of AC 29-2C MG-15 into a flow chart. His intention was to define the generic end-to-end certification process for HUMS CBM credit. Further, he sought to identify the relationships and interactions between different elements of the certification process that are contained in the three separate sections of the AC (installation, credit validation, and Instructions for Continued Airworthiness). This paper also mentions that HUMS have achieved very few credits, and that the material in the AC is largely untested; however, HUMS in-service experience shows that the potential for future credits does exist. ADS-79B-HDBK (2011) describes the US Army's Condition Based Maintenance (CBM) system and defines the overall guidance necessary to achieve CBM goals for Army aircraft systems and Unmanned Aircraft Systems (UAS). Praneet Menon et al. (2011) published a paper that summarizes the work of a Vertical Lift Consortium industry team to provide detailed guidance for the Verification and Validation (V&V) of CBM maintenance credits. SAE ARP 5783 summarizes the key metrics for evaluating diagnostic algorithms, along with expressions for these metrics. As per the SAE newsletter (2010), an Integrated Vehicle Health Management (IVHM) Steering Group has been formed to explore the need for standardization in order to drive IVHM technology towards the following objectives:
• the development of a single definition and taxonomy of IVHM to be used by the aerospace and IVHM communities
• the identification of how and where IVHM could be implemented
• the development of a roadmap for IVHM standards
• the identification of future IVHM technological and regulatory needs
Deployment of ISHM in an aircraft and the resulting qualification process demand a huge investment. Verification and validation of these ISHM technologies is
an important step in building confidence, both qualitatively and quantitatively. Practically, the cost of correcting an error after fielding an ISHM system is dramatically greater than during the testing phase, thus highlighting the need for appropriate verification and validation techniques. Certification considerations must be addressed during the very early stages of technology development in order to successfully meet any significant qualification goals. Appropriate guidelines and strategies should be followed in ISHM technology development to ensure successful certification within the desired time frame. Additionally, trade studies in the selection of V&V platforms reduce the eventual cost of the V&V process. This paper focuses on the development of such guidelines for the V&V process while emphasizing the relevance of ISHM simulation frameworks and a well-devised certification roadmap.
2. CERTIFICATION ASPECTS OF ISHM
2.1. Evolution of ISHM
Maintenance credits are acquired when an ISHM system can replace the existing industry-standard maintenance for a given component or complete aircraft system, thereby enhancing the availability, maintainability and mission capabilities of the aircraft. To reach this level, ISHM development has to pass through an effective process of technology maturation, development, verification, validation, qualification and, finally, certification. Figure 1 illustrates the evolution phases of an ISHM system, which span maturation (concept refinement and technology development), development, production, installation, controlled introduction to service, benefit/credit validation, certification and continued airworthiness. The certification phases involve both the system developer and the regulator; they are initiated through an application made by the system developer to the appropriate regulatory authority, and they are often performed in parallel with the various evolution phases.
Figure 1. Evolution of Aircraft Product including ISHM
First European Conference of the Prognostics and Health Management Society, 2012
Figure 2. Guidance for Technology Maturation & Development
2.2. Technology Maturation
After the determination of the potential functionality and benefits of ISHM, maturation efforts are initiated. Usually, the maturation phase starts before the development and certification phases, and can overlap with them. The maturation efforts are often performed through Research and Development (R&D) programmes guided by technology and product roadmaps: efforts are allocated to develop sensing technologies, algorithms and software for ISHM, and to enhance the performance of ISHM in terms of increased accuracy, reduced weight, improved reliability, advanced communication and efficient data transfer. Technology gaps and risks are identified, and efforts are allocated to fill the gaps and mitigate the risks. During the maturation phase, the potential benefits and credits of ISHM are re-assessed and validation evidence is gathered. Efforts can also be allocated to develop and test ISHM prototypes, and to develop efficient production processes and reliable installation techniques. Figure 2 defines the activities involved in the technology maturation and development of an ISHM system.
2.3. Development
The main development phase of a system can involve iterations through the following activities: determination of detailed system requirements, determination of the criticality levels and associated integrity requirements, system design, system test and evaluation, system integration, and identification of methodologies for credit validation.
2.4. Guideline for V&V and Certification
Brian D. Larder et al. (2007) depicted, in the form of a flow diagram, the three important steps of HUMS certification per the FAA's advisory circular, viz. installation, credit validation and Instructions for Continued Airworthiness (ICA). Along similar lines, Praneet Menon et al. (2011) provided, as a flow diagram, detailed guidance for the verification and validation of CBM maintenance credits. This paper attempts to combine both concepts and to show prominently how the development process, V&V, certification and qualification are linked to each other in terms of interdependency and the phases of verification and validation maturity towards a successful maintenance credit.
2.4.1. Certification for Installation
This consists of the following steps:
• Check criticality versus integrity
• Mitigating actions
• Airborne equipment installation
• Ground-based equipment installation
• Credit plan approval
If any credit is to be gained, the general guideline is that the determined criticality levels (Minor, Major, or Hazardous/Severe-Major) must be in agreement with the resulting effect of the end-to-end criticality assessment. A mitigating action is an autonomous and continuing compensating factor which may modify the level of qualification associated with certification of an ISHM application. These actions are often performed as part of
continued airworthiness considerations and are also an integral part of the certification. The overall installation considerations for airborne equipment should include, as a minimum, supply of electrical power, environmental conditions, system non-interference, and human factors if operations are affected, along with environmental qualification (RTCA/DO-160/ED-14) and the software development standard (RTCA/DO-178/ED-12). Since the ground-based equipment may be an important part of the process for determining intervention actions, its integrity and accuracy requirements must be the same as for any other part of the ISHM process. An independent means of verification is required due to the use of COTS software. If the integrity assessment (IA) spells out mitigation for all possible functional failures of the algorithm, then one can proceed with the next V&V steps, i.e. establishing the V&V criteria and getting the V&V plan approved by the aviation authority. The V&V criteria are driven by the certification basis. The certification basis, summarised for ISHM in Table 1, is the listing of all requirements from regulatory authorities or related advisory circulars which ensure qualification of the system for airworthiness and achievement of maintenance credit in the context of ISHM. Generally, the certification basis is derived from the Certification Specification (CS) and Technical Standard Orders (TSO), along with recent compliance recommendations (AMC, etc.), amendments and interpretations, which are to be negotiated between the certification coordinator (CC) and the authority.
Table 1: Certification Basis for ISHM
2.4.2. V&V for Maintenance Credit
This can be done after the installation certification has been completed; however, it is highly recommended to start well before the installation certification is complete. Since the description of the application and intended credit of the CBM process has already been defined, it is now necessary to prove that the underlying physics of the monitored equipment and its failures has been understood. The verification of the credit methodology is then taken up. Upon completion of the verification steps, it is necessary to determine whether the verification criteria outlined in the plan have been met. If not, the system element, i.e. the algorithm and corresponding configuration, needs to be redesigned and re-verified. If so, the next step in the maintenance credit process is generation of the production unit. Note that at this point the Airworthiness Report (AWR) has not yet been written for the credit methodology. The next step in the process is validation of the credit methodology. It must be determined whether the validation criteria outlined in the V&V plan have been met. If not, the system element needs to be redesigned, re-verified and re-validated. If the validation was successful, an AWR for the methodology can be written and the unit can be officially introduced into production. Once the system has been validated, a controlled introduction to service should be conducted, since there may still be elements that cannot be fully validated in the development phase. In this phase, data is collected from use in the actual aircraft; this data is then used to calibrate sensors and to tune and train the detection and prognosis algorithms. This essentially means treating the maintenance credit as a maintenance benefit, providing only advisory activities for the time being.
As soon as this phase has been completed, a full introduction to service can be performed (FAA’s advisory circular AC 29-2C MG-15).
2.4.3. Instruction for Continued Airworthiness
The final part of the certification process mainly focuses on training, documentation and operations of the CBM system. A plan is needed to ensure continued airworthiness of those parts that could change with time or usage and includes the methods used to ensure continued airworthiness. The applicant for ISHM is required to provide ICA developed in accordance with FAR/JAR Part 29 and Appendix A. This section provides supplemental guidance
for addressing aspects unique to HUMS (FAA Advisory Circular AC 29-2C MG-15). The regulatory requirements for the Instructions for Continued Airworthiness, which must be written in English as a manual, include: system description, installation, operating information, servicing information, and system maintenance instructions including troubleshooting, methods of removal/replacement, access diagrams, etc.
2.5. V&V Roadmap
Figure 3 depicts the V&V roadmap of ISHM with increasing Technology Readiness Level. On the basis of the earlier discussion, the V&V process towards airworthiness certification of ISHM is spread over the following phases:
• Concept Refinement & Technology Development
• Development
• Controlled Introduction to Service
• Instruction for Continued Airworthiness
The V&V platforms or methods mentioned in the second row of the figure, for each phase, are summarized here.
• Concept Refinement & Technology Development
  o RCM Tools
  o Component Simulation
  o Component Rig
  o Formal Methods for Analysis
  o Integrated Simulation Framework
  o Integrated Simulation Framework driven by offline Flight Data
  o Integration Rig extended from the Simulation Framework
  o Hardware-in-Loop Simulation
• Development
  o Ground System Deployment
  o Non-critical Flight System Deployment
• Controlled Introduction to Service
  o Maturation of ISHM
  o Critical Flight System Deployment
• Instruction for Continued Airworthiness
  o In-Service Validation – continued airworthiness
Note: the applicability of formal methods and HILS is decided on the basis of cost and impact analysis.
Figure 3. V&V Roadmap with increasing TRL
3. ISHM SIMULATION FRAMEWORK IN V&V PROCESS
Cassidian is developing a comprehensive, integrated, PC-based simulation framework for research and development in integrated system health monitoring and management. This ISHM framework is used primarily for demonstrating Proof of Enablers (PoE) and for System Integration Laboratory (SIL) testing, which is the goal of concept refinement and technology development. User objectives and metrics related to ISHM can be refined through exhaustive Monte Carlo simulation of off-nominal scenarios. Ground-based ISHM systems can be deployed in this environment. With high-fidelity modelling of sub-systems and sensor data, the framework provides enough confidence for the installation of on-board ISHM non-critical systems before controlled introduction to service for further tuning and refinement of the algorithms. The Integrated Simulation Framework is extensible enough to include offline stored flight data. Where similar types of sub-systems are already flying in other aircraft, recorded sensor data can be used for more realistic validation of the algorithms. Aircraft system models within the Simulation Framework are able to load and store offline flight data and generate sensor data specific to sub-systems; in this mode, computation of the physics-based models is disabled. Integrated HILS will include simulation of aircraft dynamics, aircraft subsystem hardware and adverse environmental effects, along with the capability to inject system faults. This facility can expedite the validation of ISHM and reduce the validation period during controlled introduction to service. However, this capability demands a huge investment of time and capital; these investments can be greatly reduced for V&V of an aircraft's ISHM by utilizing the Simulation Framework. The Integrated Simulation Framework can also be integrated with individual test beds, such as an SHM test rig. The conclusive evidence would be the structural fault detection capabilities observed during the operation of the aircraft.
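The Monte Carlo refinement of user objectives and metrics mentioned above can be illustrated with a minimal sketch: repeatedly simulate nominal and off-nominal scenarios and estimate the false-alarm and detection rates of a health monitor. The threshold detector and the fault distributions below are illustrative assumptions, not Cassidian's actual models:

```python
import random

def detect(signal, limit=3.0):
    # Hypothetical threshold detector on a scalar health feature.
    return signal > limit

def monte_carlo_metrics(runs=10000, fault_fraction=0.3):
    """Estimate false-alarm and detection rates of a simple detector over
    randomly generated nominal and off-nominal scenarios (all
    distributions are illustrative assumptions)."""
    false_alarms = detections = nominal = faulty = 0
    for _ in range(runs):
        if random.random() < fault_fraction:   # off-nominal scenario
            faulty += 1
            signal = random.gauss(5.0, 1.0)    # fault shifts the feature
            detections += detect(signal)
        else:                                  # nominal scenario
            nominal += 1
            signal = random.gauss(0.0, 1.0)
            false_alarms += detect(signal)
    return false_alarms / nominal, detections / faulty

random.seed(0)
fa_rate, det_rate = monte_carlo_metrics()
```

Such a loop makes it cheap to see how a candidate metric responds before any rig or flight testing is committed.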
Occurrences of structural faults such as cracks are infrequent; hence, years of flight tests might be required to collect validation evidence, and a small number of flights would only be sufficient to prove the system's "fitness for flight", not its "fitness for purpose". Therefore, a validation approach is required to extrapolate from laboratory tests to the actual aircraft. HAHN Spring Limited (2011) suggested that a generalisation and calibration approach would be required to extrapolate from laboratory specimens to actual aircraft; such an approach is expected to vary between the different tasks and technologies of SHM systems. From the V&V roadmap, it is evident that different facilities are needed for the V&V, certification and qualification of ISHM technologies. Cassidian's ISHM Simulation Framework fulfils multiple roles as a single platform.
3.1. ISHM Simulation Framework
The goals of an ISHM system are the preparation of an intelligent maintenance plan, an intelligent mission plan and automatic logistic functions for enhancing the availability, maintainability and mission capabilities of the aircraft. These functions are achieved through Condition Based Maintenance (CBM). The Simulation Framework, which is built around the OSA-CBM and OSA-EAI architectures, simulates all ISHM functional layers through different sub-system models.
Prognostic Health Management (PHM) is the core of ISHM technology. Like in any other domain, challenges in the introduction of PHM systems in the aerospace domain are twofold. On the one hand, there are individual challenges in developing sensor technology, state detection and health assessment methodologies and models for determining the future life span of a (possibly deteriorated) component. On the other hand, there are integration challenges when turning heterogeneous data from disparate and distributed sources into consolidated information and dependable decision support on aircraft and fleet level. It has therefore been recognized in the community that standardized and open data management solutions are crucial to the success of PHM. Such a standard should introduce a commonly accepted framework for data representation, data communication and data storage. Key findings through the development of Cassidian’s ISHM Simulation Framework are:
• The ISHM Simulation Framework plays a vital role in the V&V process for ISHM.
• The state of the practice in using open architecture standards such as OSA-CBM and OSA-EAI is not sufficient; customisation or improvement of the standards may be required. Examples include standardizing non-XML-based transport formats for OSA-CBM data packets under real-time operating conditions and optimizing the OSA-EAI database model for analytical tasks.
• It provides a comprehensive RCM-based CBM ground-based framework to realise and validate the full benefit of ISHM.
Figure 4. ISHM Simulation Framework
The ISHM Simulation Framework simulates the following modules:
• Aircraft System Model
• On-board ISHM System
• On-ground ISHM System
• Supply Chain (Enterprise Level)
• Simulation Management
Simulation of the aircraft system model and the supply chain (enterprise level) creates the simulation environment for the ISHM system models, and simulation management controls the operation of the complete ISHM Simulation Framework.
3.1.1. Aircraft System Model
The Aircraft System Model simulates those systems, and their sensors, for which ISHM capabilities are to be developed. It includes high-fidelity models of aircraft aerodynamics, the hydraulics/actuator system, landing gear, fuel, ECS, aircraft structure, etc. Each sub-system implements physics-based modelling of dynamic behaviour and the physics of faults, and computes the states or parameters from which sensor data are derived. Sensor data for each sub-system are generated from the computed states and parameters after being corrupted with the errors that might occur in a real-life scenario, as well as with noise specific to those sensors. All faults are injected from the simulation control GUI. Any system for which ISHM-specific monitoring and prediction capabilities are to be validated and verified needs to be modelled with a high level of detail. This enables the realistic simulation of failures to support the validation of diagnostic and prognostic functions. The respective controller model simulates Built-in Test (BIT) and Reactive Health Assessment (RHA) of the sub-system.
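Sensor corruption of this kind can be sketched in a few lines; the bias, noise level and fault modes below are hypothetical examples, not the framework's actual sensor models:

```python
import random

def simulate_sensor(true_value, bias=0.0, noise_sigma=0.5, fault=None):
    """Return one corrupted sensor sample from a computed physical state.

    `fault` is a hypothetical injected failure mode: None, "stuck"
    (sensor frozen at its offset) or "drift" (constant additive drift).
    """
    if fault == "stuck":
        return bias
    reading = true_value + bias + random.gauss(0.0, noise_sigma)
    if fault == "drift":
        reading += 5.0  # illustrative drift magnitude
    return reading

random.seed(42)
# Healthy sensor samples of a 100.0-unit physical state with a small bias.
samples = [simulate_sensor(100.0, bias=0.2) for _ in range(1000)]
mean = sum(samples) / len(samples)
stuck_reading = simulate_sensor(50.0, bias=1.0, fault="stuck")
```

Diagnostic algorithms under test then see only `samples`, never the true state, which is exactly the situation on the real aircraft.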
3.1.2. On-board ISHM
The on-board ISHM function includes a central ISHM data processor. Sensors push their data to the data processor via an OSA-CBM implementation; the underlying message protocol is optimized for embedded systems. The ISHM data processor calculates ISHM information according to the OSA-CBM layer specifications, up to the health assessment layer. OSA-CBM defines seven functional layers. The central ISHM data processor has the following functions:
Figure 5. Fault simulation concept for Simulation Framework
• The first four functions of OSA-CBM:
  o Data Acquisition
  o Data Manipulation
  o State Detection
  o Health Assessment
• High Level Reasoning
• BIT Function
• Storage of on-board health data
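A minimal sketch of the first four OSA-CBM layers as a processing chain follows; the moving-average feature and the limits are illustrative choices, not part of the OSA-CBM standard:

```python
def data_acquisition(raw_samples):
    # Layer 1: ingest raw sensor samples.
    return list(raw_samples)

def data_manipulation(samples, window=5):
    # Layer 2: derive a feature (moving average) from the raw data.
    return [sum(samples[max(0, i - window + 1):i + 1]) /
            len(samples[max(0, i - window + 1):i + 1])
            for i in range(len(samples))]

def state_detection(features, limit=80.0):
    # Layer 3: compare each feature against an operating limit.
    return ["exceedance" if f > limit else "normal" for f in features]

def health_assessment(states):
    # Layer 4: summarize component health from the detected states.
    return "degraded" if states.count("exceedance") > 2 else "healthy"

readings = [70, 72, 71, 90, 92, 95, 96, 94]
health = health_assessment(
    state_detection(data_manipulation(data_acquisition(readings))))
```

Keeping the layers as separate functions mirrors the OSA-CBM idea that each layer can be verified and replaced independently.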
Several seeded fault tests under fixed conditions are sufficient to enable the model-based development of diagnostic functions. The development of prognostic functions (to be part of the ground-based ISHM) also needs to cover the development of suitable failure-mode-specific degradation models. Once the degradation models have been developed, it is possible to verify the diagnostic and prognostic functions through Monte Carlo simulations. These simulations should include stochastic fault insertion for so-called "hard faults" (stochastically occurring failures with no impact on observable system parameters before the specified failure threshold is exceeded) and the use of degradation models for "soft faults" (stochastically occurring degradations with impacts on observable system parameters before the specified failure threshold is exceeded). This concept is illustrated in Figure 5.
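The hard/soft fault insertion concept can be sketched as a Monte Carlo mission simulation; all rates and thresholds below are illustrative assumptions, not values from the framework:

```python
import random

def run_mission(degradation_rate, hard_fault_prob, threshold=1.0, hours=100):
    """Simulate one mission history and return the hour of failure, or
    None if the component survives the whole mission.

    Hard faults occur stochastically with no prior observable degradation;
    soft faults accumulate damage until the threshold is exceeded.
    """
    damage = 0.0
    for hour in range(hours):
        if random.random() < hard_fault_prob:  # hard fault: sudden
            return hour
        damage += degradation_rate             # soft fault: gradual wear
        if damage >= threshold:
            return hour
    return None

random.seed(1)
# With this wear rate the soft-fault path always crosses the threshold
# by hour 83 unless a hard fault strikes first.
failures = [run_mission(0.012, 0.001) for _ in range(200)]
failed = sum(1 for f in failures if f is not None)
```

Sweeping the two fault parameters over many such runs is what lets the diagnostic and prognostic functions be verified against both failure classes.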
3.1.3. Ground based ISHM
The major ISHM functionalities for enhancing availability, maintainability and mission capabilities are realized by the ground-based sub-systems. The on-board ISHM function includes only data acquisition and diagnosis of equipment health, along with intermediate processing of data. The ground-based ISHM system performs a significant amount of processing related to the following prime functions:
• On-Ground Health Management function
• Operational Risk Assessment / Fleet High Level Reasoning
• Maintenance Management
• Maintenance Planner
• Resource / Logistic Management
• Mission Planner
• Learning Agent
• Simulation of Enterprise System
• Presentation Layer
The ground-based ISHM functionalities are enhanced from the core concept provided by Fatih Camci et al. (2006). On-Ground Health Management function: The on-ground health management function consists of advanced diagnostics, advanced prognostics and predictive analysis. Advanced diagnostics further validates the on-board diagnostic result against historical data from the same aircraft and a fleet-wide fault database, and refines the diagnostic decision. Advanced prognostics computes the RUL and confidence for each CBM candidate. Predictive analysis (trend analysis) identifies impending failure using trends in historically collected data, but does not predict when failure will occur.
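A minimal trend-based sketch of the RUL computation follows, assuming a linear degradation trend fitted by least squares; the paper does not prescribe a specific prognostic algorithm, so this is only one plausible realisation:

```python
def estimate_rul(history, failure_threshold):
    """Fit a least-squares line to a degradation history (one sample per
    flight hour) and extrapolate to the failure threshold.  Returns the
    remaining hours, or None if no degrading trend is present."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    if slope <= 0:
        return None  # no degradation trend: nothing to extrapolate
    intercept = y_mean - slope * x_mean
    hours_to_threshold = (failure_threshold - intercept) / slope
    return max(0.0, hours_to_threshold - (n - 1))

# Hypothetical crack-growth indicator sampled over five flight hours.
rul = estimate_rul([0.10, 0.12, 0.14, 0.16, 0.18], failure_threshold=0.5)
```

An advanced prognostic would additionally attach a confidence interval to this point estimate, for example from the residuals of the fit or from Monte Carlo resampling of the history.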
Maintenance Management: The Maintenance Management function selects one of the following maintenance solutions for a sub-system, depending on the RCM process:
• Run-to-Fail
• Reactive
• Preventive (calendar based)
• Predictive
• CBM
Maintenance Management executes the following functions:
• Identification of the maintenance task corresponding to a sub-system / functional failure
• Ranking of the optimal maintenance task, computed as a function of maintenance effectiveness for the failure mode, maintenance downtime and cost
• Execution of maintenance (work order generation, tracking of maintenance actions, receiving feedback and closing the work order) as per the approved maintenance plan
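The ranking step above can be sketched as a weighted score over effectiveness, downtime and cost; the weights, the normalisation scheme and the example tasks are illustrative assumptions, not prescribed by the paper:

```python
def rank_tasks(tasks, w_eff=0.5, w_down=0.3, w_cost=0.2):
    """Rank candidate maintenance tasks for a failure mode.

    Each task is (name, effectiveness in [0, 1], downtime_hours, cost).
    Higher effectiveness raises the score; downtime and cost, normalised
    to the worst candidate, lower it.
    """
    max_down = max(t[2] for t in tasks)
    max_cost = max(t[3] for t in tasks)
    def score(task):
        _, eff, down, cost = task
        return w_eff * eff - w_down * down / max_down - w_cost * cost / max_cost
    return sorted(tasks, key=score, reverse=True)

# Hypothetical candidate tasks for a single failure mode.
tasks = [("replace pump", 0.95, 8.0, 12000.0),
         ("overhaul seal", 0.80, 3.0, 2500.0),
         ("lubricate only", 0.40, 0.5, 200.0)]
best = rank_tasks(tasks)[0][0]
```

In practice the weights themselves are candidates for the Learning Agent described below, which tunes such coefficients from maintainer feedback.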
Maintenance Planner: The opportunistic maintenance agent finds opportunistic maintenance times and tasks using the ranking of maintenance tasks, the mission capability of the sub-system/function for future missions, and the RUL for future missions. The maintenance planner schedules the intelligent maintenance plan, validates it against resource management feedback, and publishes the maintenance plan after approval from the decision support system. Resource / Logistic Management: This function tracks the availability, along with the configuration parameters, of LRUs, tools, parts, consumables, personnel, etc. (configurable items). On receipt of a maintenance plan, the resource/logistic management function sends feedback on the validity of the maintenance plan to the maintenance planner on the basis of resource availability. It finally generates a plan for resources and inventory, and generates orders for parts or LRUs from OEMs or suppliers as per the present and projected status of the inventory. Mission Planner: Mission plans and flying programmes are entered using a digital map and editing GUI. The mission planner instructs the user to reschedule the mission plan if the entered plan exceeds the performance of the aircraft. Flying programmes must be rescheduled if the approved maintenance plan overlaps with the mission plan. The applicability of the mission segments for a particular aircraft is further checked against the operational capabilities of the aircraft for each segment, computed by Operational Risk Assessment (ORA). If the capability for a flight segment, or for the complete mission, is below a critical threshold, the mission planner instructs the user to reschedule or cancel the mission for that particular aircraft. Learning Agent: As experience is accumulated, some of the parameters within the model can be learned automatically by analyzing feedback from the maintainer, the OEM, the mission commander and the resource manager. The parameters to be learned include the opportunistic maintenance threshold, the required maintenance threshold, resource lead times, maintenance effectiveness and various coefficients related to diagnostics and prognostics.
Simulation of Enterprise System: This module simulates the supply of specific LRUs or parts from OEMs, service/industry support organizations and wholesale stock points, accounting for the accumulated delays attributable to the order process of the resource management function, manufacturing (if applicable), shipping, and other aspects of supply chain management.
Presentation Layer: Decision support personnel interact through the presentation layer, which consists of the following GUIs distributed across different terminals:
• Health Management & Monitoring
• Interactive GUI for Maintenance Management
• Resource Management & Monitoring
• Maintenance Planner
• Mission Planner
High Level Reasoning / Operational Risk Assessment: High Level Reasoning (HLR) is the capability to estimate an aircraft's (or vehicle's) functional availability. The HLR concept estimates the functional availability of a vehicle based on the health assessment results from lower-level systems and subsystems. Both concepts are part of the HLR development and its integration into the simulation framework. The RUL and confidence are recomputed for each component failure for all future missions and used by HLR. ORA finally determines and quantifies the remaining functional/operational availability at the subsystem, vehicle and mission levels.
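The HLR aggregation can be sketched as a worst-case roll-up from subsystem health to function and mission availability; the dependency map, health values and thresholds below are hypothetical:

```python
def functional_availability(subsystem_health, mission_requirements):
    """HLR sketch: the availability of each vehicle function is the worst
    health of the subsystems it depends on, and a mission segment is
    available only if every function it requires meets its threshold.

    `mission_requirements` maps a function name to
    (list of subsystem dependencies, minimum required health level).
    """
    availability = {}
    for function, (deps, threshold) in mission_requirements.items():
        level = min(subsystem_health[d] for d in deps)
        availability[function] = (level, level >= threshold)
    return availability

# Hypothetical health-assessment outputs from the lower OSA-CBM layers.
health = {"hydraulics": 0.9, "fuel": 0.7, "landing_gear": 0.95}
requirements = {"takeoff": (["hydraulics", "fuel", "landing_gear"], 0.8),
                "cruise": (["hydraulics", "fuel"], 0.6)}
avail = functional_availability(health, requirements)
```

A real ORA would replace the min() roll-up with mission-segment risk models, but the structure of the computation is the same: subsystem health in, per-function and per-segment availability out.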
4. CONCLUSIONS
From the above discussion, it is evident that the nature of the challenges in V&V and certification of ISHM differs from that of a standard stand-alone system. One of the major challenges in the certification of ISHM systems is the non-availability of comprehensive regulatory standards for ISHM. V&V also poses challenges, mainly because ISHM has to handle a large number of off-nominal scenarios, has to ensure performance, safety and reliability across the entire performance envelope, and has to reliably avoid false alarms. Moreover, V&V has to deal with the multidisciplinary aspects of ISHM. The most prominent aspect is direct evidence gathering for fault effects, which is demanding for the V&V of diagnostics and much more difficult for prognostics. To handle these issues, the key aspects of ISHM V&V mentioned above are summarized here:
• V&V maturity starts from the concept refinement and technology development phase.
• If a specific sub-system/function of ISHM is classified as Hazardous/Severe-Major, then direct evidence must be gathered (FAA Advisory Circular AC 29-2C MG-15).
• If a specific sub-system/function of ISHM is classified as Major or lower, then indirect evidence is sufficient (FAA Advisory Circular AC 29-2C MG-15).
• During controlled introduction to service, the CBM maintenance credit is treated as a maintenance benefit, i.e. the CBM output is compared with the maintenance instructions suggested by the conventional RCM process.
• After maturation of the algorithms and certification, CBM obtains the maintenance credit.
• An appropriate sequence of the V&V process across the ISHM functional layers is to be considered.
• It must be noted that V&V of ISHM functionalities in the Simulation Framework does not completely address defects created by the designer. As is evident from Figure 3 (V&V roadmap with increasing TRL), the subsequent V&V phases (i.e. V&V in the integration rig, integrated HILS, V&V during controlled introduction to service and ICA) are required in order to achieve the maintenance credit.
• Since the ISHM simulation framework plays a vital role in the V&V process, the simulation framework itself has to be qualified (Robert G. Sargent, 1998).
The survey of work towards ISHM certification, the suggested customizations, and the experience of using a simulation framework for V&V give the impression that certification of ISHM, while not an easy job, is not impossible. This study should give the ISHM community confidence in achieving maintenance credit through the implementation of this technology.
NOMENCLATURE
AC Advisory Circular
AMC Acceptable Means of Compliance
ARP Aerospace Recommended Practice
AWR Airworthiness Report
BIT Built-In Test
CBM Condition Based Maintenance
CC Certification Coordinator
CS Certification Specification
EAI Enterprise Application Integration
FHA Functional Hazard Analysis
FMECA Failure Modes, Effects, and Criticality Analysis
GUI Graphical User Interface
HILS Hardware-in-Loop Simulation
HLR High Level Reasoning
HUMS Health and Usage Monitoring System
IA Integrity Assessment
ICA Instructions for Continued Airworthiness
ISHM Integrated System Health Monitoring
IVHM Integrated Vehicle Health Management
LRU Line Replaceable Unit
OEM Original Equipment Manufacturer
ORA Operational Risk Assessment
OSA Open System Architecture
PHM Prognostic Health Management
RCM Reliability Centered Maintenance
RUL Remaining Useful Life
SHM Structural Health Monitoring
TRL Technology Readiness Level
TSO Technical Standard Order
REFERENCES
Hess, A., Calvello, G., & Dabney, T. (2004). PHM: a key enabler for the JSF autonomic logistics support concept. IEEE Aerospace Conference.
Larder, B. D., & Davis, M. W. (2007). HUMS condition based maintenance credit validation. American Helicopter Society 63rd Annual Forum, Virginia Beach, VA.
Gorinevsky, D., Smotrich, A., Mah, R., Srivastava, A., Keller, K., & Felke, T. (2010). Open architecture for integrated vehicle health management. AIAA Infotech@Aerospace, Atlanta, GA.
FAA Advisory Circular 29-2C MG 15, Airworthiness Standards, Transport Category Rotorcraft.
Camci, F., Valentine, G. S., & Navarra, K. (2006). Methodologies for integration of PHM systems with maintenance data. IEEEAC paper #1191, Version 1.
HAHN Spring Limited. (2011). Development, Validation, Qualification and Certification of Structural Health Monitoring Systems. HAHN Spring Report 1/B002.
Dzakowic, J. E., & Valentine, G. S. (2004). Advanced techniques for the verification and validation of prognostics & health management capabilities. Impact Technologies, LLC.
Wheeler, K. R., Kurtoglu, T., & Poll, S. D. (2010). A survey of health management user objectives related to diagnostic and prognostic metrics.
Feather, M. S., & Markosian, L. Z. (2005). Emerging technologies for V&V of ISHM software for space exploration. IEEE Aerospace Conference paper #1441, V-2.
Benedettini, O., Baines, T. S., Lightfoot, H. W., & Greenough, R. M. (2008). State-of-the-art in integrated vehicle health management.
Menon, P., Robinson, B., August, M., Larchuk, T., & Zhao, J. (2011). Verification and validation process for CBM maintenance credits. American Helicopter Society 67th Annual Forum.
Sargent, R. G. (1998). Verification and validation of simulation models. Simulation Research Group, Syracuse University.
SAE International. (2010). Aerospace Standards Newsletter, Volume II, Issue 1.
US Army. (2011). ADS-79B-HDBK, Aeronautical Design Standard Handbook for Condition Based Maintenance for US Army Aircraft.
BIOGRAPHIES
Matthias Buderath - Aeronautical engineer with more than 25 years of experience in structural design, system engineering, and product and service support. His main expertise and competence relate to system integrity management, service solution architecture and integrated system health monitoring and management. Today he is head of technology development at CASSIDIAN. He is a member of international working groups covering Through Life Cycle Management, Integrated System Health Management and Structural Health Management. He has published more than 50 papers in the fields of Structural Health Management, Integrated Health Monitoring and Management, Structural Integrity Programme Management, and Maintenance and Fleet Information Management Systems. Partha Pratim Adhikari - has more than thirteen years of experience in the field of avionics and aerospace systems. Partha worked with RCI, DRDO; the Aeronautical Development Agency (Ministry of Defence); and CAE Simulation Technology before joining Cassidian, CAIE, Bangalore, where he currently leads the Integrated System Health Monitoring (ISHM) programme from the Bangalore centre. Partha holds a Bachelor's degree in Physics (Honours) and a B.Tech in Opto-electronics from Calcutta University, and a Master's degree in Computer Science from Bengal Engineering and Science University. In his tenure across various aerospace organizations, Partha made significant contributions in the fields of navigation systems, avionics and simulation technologies. He has published several papers in the fields of estimation, signal processing and simulation of flight systems in national as well as international conferences and journals. In his current role as Tech Lead, Avionics at Cassidian, CAIE, Bangalore, Partha is working on devising ISHM technologies for aviation systems with a focus on complete vehicle health, robust implementation and certification of the developed technologies.
First European Conference of the Prognostics and Health Management Society, 2012
Theoretical and Experimental Evaluation of a Real-Time Corrosion Monitoring System for Measuring Pitting in Aircraft Structures
Douglas Brown1, Duane Darr2, Jefferey Morse3, and Bernard Laskowski4
1,2,3,4 Analatom, Inc., 562 E. Weddell Dr. Suite 4, Sunnyvale, CA 94089-2108, [email protected]@[email protected]
ABSTRACT
This paper presents the theory and experimental validation of Analatom's Structural Health Management (SHM) system for monitoring corrosion. Corrosion measurements are acquired using a micro-sized Linear Polarization Resistance (µLPR) sensor. The µLPR sensor is based on conventional macro-sized Linear Polarization Resistance (LPR) sensors, with the additional benefit of a reduced form factor making it a viable and economical candidate for remote corrosion monitoring of high value structures, such as buildings, bridges, or aircraft.

A series of experiments were conducted to evaluate the µLPR sensor for AA 7075-T6. Test coupons were placed alongside Analatom's µLPR sensors in a series of accelerated tests. LPR measurements were sampled at a rate of once per minute and converted to a corrosion rate using the algorithms presented in this paper. At the end of the experiment, pit-depth due to corrosion was computed for each sensor from the recorded LPR measurements and compared to the average pit-depth measured on the control coupons. The results demonstrate the effectiveness of the sensor as an efficient and practical approach to measuring pit-depth for AA 7075-T6.
1. INTRODUCTION
Recent studies have exposed the generally poor state of our nation's critical infrastructure systems that has resulted from wear and tear under excessive operational loads and environmental conditions. SHM (Structural Health Monitoring) systems aim at reducing the cost of maintaining high value structures by moving from SBM (Scheduled Based Maintenance) to CBM (Condition Based Maintenance) schemes (Huston, 2010). These systems must be low-cost and simple to install, with a user interface designed to be easy to operate. To reduce the cost and complexity of such a system, a generic interface node that uses low-powered wireless communications has been developed by Analatom. This node can communicate with a myriad of common sensors used in SHM. In this manner a structure such as a bridge, aircraft or ship can be fitted with sensors in any desired or designated location and format without the need for communications and power lines that are inherently expensive and complex to route. Data from these nodes is transmitted to a central communications Personal Computer (PC) for data analysis. An example of this is provided in Figure 1, showing Analatom's AN101 SHM system installed in the rear fuel-bay bulkhead of a commercial aircraft.

Douglas Brown et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Figure 1. Analatom AN101 SHM system installed in the rear fuel-bay bulkhead of a commercial aircraft.
A variety of methods such as electrical resistance, gravimetric-based mass loss, quartz crystal micro-balance-based mass loss, electrochemical, and solution analysis methods enable the determination of corrosion rates of metals. The focus of this paper is on Linear Polarization Resistance (LPR), a method based on electrochemical concepts to determine instantaneous interfacial reaction rates, such as corrosion rates and exchange current densities, from a single experiment. There are a variety of methods capable of experimentally determining instantaneous polarization resistances, such as potential step or sweep, current step or sweep, impedance spectroscopy, as well as statistical and spectral noise methods (Scully, 2000). The remainder of this paper will focus on the former, as the AN101 SHM system uses the potential step (or sweep) approach to measure LPR.
The remainder of the paper is organized as follows. Section 2 describes the general theory governing LPR. Section 3 presents Analatom's µLPR, discussing the benefits of miniaturizing the sensor from a macro-scaled LPR. Section 4 outlines the experimental setup and procedure used to validate the µLPR sensor. Section 5 presents the experimental measurements with the accompanying analysis, which demonstrates the effectiveness of the µLPR sensor. Finally, the paper is concluded in Section 6 with a summary of the findings and future work.
2. LPR THEORY
The corrosion of metals takes place when the metal dissolves due to oxidation and reduction (electrochemical) reactions at the interface of the metal and the (aqueous) electrolyte solution. Atmospheric water vapor is an example of an electrolyte that corrodes exposed metal surfaces, and wet concrete is another example of an electrolyte that can cause corrosion of reinforcing rods in bridges. Corrosion usually proceeds through a combination of electrochemical reactions: (1) anodic (oxidation) reactions involving dissolution of metals in the electrolyte and release of electrons, and (2) cathodic (reduction) reactions involving gain of electrons by the electrolyte species like atmospheric oxygen O2, moisture H2O, or H+ ions in an acid (Bockris, Reddy, & Gambola-Aldeco, 2000). The flow of electrons from the anodic reaction sites to the cathodic reaction sites constitutes the corrosion current and is used to estimate the corrosion rate. When the two reactions are in equilibrium at the equilibrium corrosion potential, Ecorr, the net current on the metal surface is zero without an external source of current. The anodic reactions proceed more rapidly at more positive potentials, and the cathodic reactions proceed more rapidly at more negative potentials. Since the corrosion current from the unstable anodic and cathodic sites is too small to measure, an external activation potential is applied across the metal surface and the current is measured for electrochemical calculations. The resulting Ea vs. Ia curve is called the polarization curve. Under an external activation potential, the anodic and cathodic currents increase exponentially, so when log10 Ia is plotted against Ea (a Tafel plot), the linear regions on the anodic and cathodic curves correspond to regions where either the anodic or cathodic reactions dominate and represent the rate of the electrochemical process. The extrapolation of the Tafel linear regions to the corrosion potential gives the corrosion current, Icorr, which is then used to calculate the rate of corrosion (Burstein, 2005).
2.1. Anodic and Cathodic Reactions
The electrochemical technique of Linear Polarization Resistance (LPR) is used to study corrosion processes, since the corrosion reactions are electrochemical reactions occurring on the metal surface. Modern corrosion studies are based on the concept of mixed potential theory postulated by Wagner and Traud, which states that the net corrosion reaction is the result of two or more partial electrochemical reactions that proceed independently of each other (Wagner & Traud, 1938). For the case of metallic corrosion in the presence of an aqueous medium, the corrosion process can be written as,

$$\mathrm{M} + z\,\mathrm{H_2O} \rightleftharpoons \mathrm{M}^{z+} + \tfrac{z}{2}\mathrm{H_2} + z\,\mathrm{OH^-}, \qquad (1)$$
where z is the number of electrons lost per atom of the metal. This reaction is the result of an anodic (oxidation) reaction,

$$\mathrm{M} \rightleftharpoons \mathrm{M}^{z+} + z e^-, \qquad (2)$$
and a cathodic (reduction) reaction,
$$z\,\mathrm{H_2O} + z e^- \rightleftharpoons \tfrac{z}{2}\mathrm{H_2} + z\,\mathrm{OH^-}. \qquad (3)$$
It is assumed that the anodic and cathodic reactions occur at a number of sites on a metal surface and that these sites change in a dynamic statistical distribution with respect to location and time. Thus, during corrosion of a metal surface, metal ions are formed at anodic sites with the loss of electrons, and these electrons are then consumed by water molecules to form hydrogen molecules. The interaction between the anodic and cathodic sites as described on the basis of mixed potential theory is represented by well-known relationships using current (reaction rate) and potential (driving force). For the above pair of electrochemical reactions (anodic (2) and cathodic (3)), the relationship between the applied current Ia and potential Ea follows the Butler-Volmer equation,

$$I_a = I_{corr}\left[\exp\!\left(\frac{2.303\,(E_a - E_{corr})}{\beta_a}\right) - \exp\!\left(\frac{-2.303\,(E_a - E_{corr})}{\beta_c}\right)\right], \qquad (4)$$
where βa and βc are the anodic and cathodic Tafel parameters given by the slopes of the polarization curves ∂Ea/∂log10 Ia in the anodic and cathodic Tafel regimes, respectively, and Ecorr is the corrosion potential (Bockris et al., 2000).
2.2. Electrode Configuration
An electrode is a (semi-)conductive solid that interfaces with an electrolytic solution. The most common electrode configuration is the three-electrode configuration. The common designations are: working, reference and counter electrodes. The working electrode is the designation for the electrode being studied. In corrosion experiments, this is the material that is corroding. The counter electrode is the electrode that completes the current path. All electrochemistry experiments contain a working–counter pair. In most experiments the counter electrode is simply the current source/sink comprised of inert materials like graphite or platinum. Finally, the reference electrode serves as an experimental reference point, specifically for potential (sense) measurements. The reference electrode is positioned so that it measures a point very close to the working electrode.
The three-electrode setup has a distinct experimental advantage over a two-electrode setup: only one half of the cell is measured. That is, potential changes of the working electrode are measured independently of changes that may occur at the counter electrode. This configuration also reduces the effect of measuring potential drops across the solution resistance when measuring between the working and counter electrodes.
2.3. Polarization Resistance
The corrosion current, Icorr, cannot be measured directly. However, a-priori knowledge of βa and βc, along with a small signal analysis technique known as polarization resistance, can be used to indirectly compute Icorr. The polarization resistance technique, also referred to as "linear polarization", is an experimental electrochemical technique that estimates the small signal changes in Ia when Ea is perturbed by Ecorr ± 10 mV (G102, 1994). The slope of the resulting curve over this range is the polarization resistance,

$$R_p \triangleq \left.\frac{\partial E_a}{\partial I_a}\right|_{|E_a - E_{corr}| \le 10\,\mathrm{mV}}. \qquad (5)$$
Note, the applied current, Ia, is the total applied current and is not multiplied by the electrode area, so Rp as defined in (5) has units of Ω. Provided that |Ea − Ecorr|/βa ≤ 0.1 and |Ea − Ecorr|/βc ≤ 0.1, the first order Taylor series expansion exp(x) ≈ 1 + x can be applied to (4) and (5) to arrive at,

$$R_p = \frac{1}{2.303\,I_{corr}}\left(\frac{\beta_a \beta_c}{\beta_a + \beta_c}\right). \qquad (6)$$
Finally, this expression can be re-written for Icorr to arrive at the Stern-Geary equation,

$$I_{corr} = \frac{B}{R_p}, \qquad (7)$$

where $B = \frac{1}{2.303}\left[\beta_a\beta_c/(\beta_a + \beta_c)\right]$ is a constant of proportionality.
2.4. Pit Depth
The pit depth due to corrosion is calculated by computing the pitting current density, ipit,

$$i_{pit}(t) = \frac{i_{corr} - i_{pv}}{N_{pit}}, \qquad (8)$$
where icorr = Icorr/Asen is the corrosion current density, ipv is the passive current density, Npit is the pit density for the alloy (derived empirically), and Asen is the effective surface area of the LPR sensor. One critical assumption is that the pH is in the range of 6-8. If this cannot be assumed, then a measurement of pH is required and ipv is needed over the range of pH values. Next, Faraday's law is used to relate the total pitting charge to the molar mass loss. Let the equivalent weight (EW) represent the weight of the metal that reacts with 1 C of charge, thus contributing to the corrosion and overall loss of material in the anodic (oxidation) reaction given in (2). The total pitting charge, Qpit, and molar mass loss, M, can be related by the following,
$$Q_{pit}(t) = zF \cdot M(t), \qquad (9)$$

where F = 9.650 × 10⁴ C/mol is Faraday's constant, and z is the number of electrons lost per atom in the metal in the reduction-oxidation reaction. The EW is calculated from the known Atomic Weight (AW) of the metal,

$$EW = \frac{AW}{z}. \qquad (10)$$
Next, the number of moles of the metal reacting can be converted to an equivalent mass loss, mloss,

$$m_{loss}(t) = M(t) \cdot AW. \qquad (11)$$

Combining (9) through (11), the mass loss mloss is related to Qpit by,

$$m_{loss}(t) = \frac{EW \cdot Q_{pit}(t)}{F}. \qquad (12)$$
With the mass loss calculated and knowing the density ρ, the pit-depth, modeled using a semi-spherical volume with a depth (or radius) d, is expressed as,

$$d(t) = \left(\frac{3\,m_{loss}(t)}{2\pi\rho}\right)^{1/3}. \qquad (13)$$
Now, note that Qpit can be found by integrating ipit over the total time,

$$Q_{pit}(t) = \int_0^t i_{pit}(\tau)\,d\tau. \qquad (14)$$
Substituting (12) and (14) into (13) gives,

$$d(t) = \sqrt[3]{\frac{3\,EW}{2\pi\rho F}\int_0^t i_{pit}(\tau)\,d\tau}. \qquad (15)$$
Next, by substituting (7) and (8) into (15), the expression for d can be rewritten as,

$$d(t) = \sqrt[3]{\frac{3\,EW}{2\pi\rho N_{pit} F}\int_0^t \left(\frac{B}{A_{sen} R_p(\tau)} - i_{pv}\right) d\tau}. \qquad (16)$$
In practice, Rp is not measured continuously; rather, periodic measurements are taken every Ts seconds. If it is assumed that over this interval the Rp value changes linearly, then the mean value theorem for integrals can be applied to arrive at an alternative expression for d,
$$d(t) = \sqrt[3]{\frac{3\,T_s\,EW}{2\pi\rho N_{pit} F}\sum_{k=0}^{N-1}\left(\frac{B}{A_{sen} R_p(kT_s)} - i_{pv}\right)}. \qquad (17)$$
2.5. Standard Measurements
Capacitive current can result in hysteresis in small amplitude cyclic voltammogram Ea vs. Ia plots. High capacitance, multiplied by a rapid voltage scan rate, causes a high capacitive current that results in hysteresis in cyclic Ea vs. Ia data (Scully, 2000). This effect can be reduced by making measurements at a slow scan rate. The maximum scan rate allowed to obtain accurate measurements has been addressed by Mansfeld and Kendig (Mansfeld & Kendig, 1981). The maximum applied frequency allowed to obtain the solution resistance, Rs, and the polarization resistance, Rp, from a Bode plot can be approximated by,
$$f_{max} < f_{bp} \approx \frac{1}{2\pi C\,(R_p + R_s)}, \qquad (18)$$

where fbp is an approximation of the lower break-point frequency, fmax is the maximum test frequency, and C is the capacitance that arises whenever an electrochemical interface exists between the electronic and ionic phases.
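For intuition, eq. (18) can be evaluated directly. The interface values below (C, Rp, Rs) are assumed round numbers chosen for illustration, not measurements from the paper:

```python
import math

C = 1e-6   # interfacial capacitance [F] (assumed)
Rp = 1e4   # polarization resistance [ohm] (assumed)
Rs = 1e2   # solution resistance [ohm] (assumed)

# Lower break-point frequency, eq. (18); any scan must stay well below this
f_bp = 1.0 / (2.0 * math.pi * C * (Rp + Rs))   # ~15.8 Hz for these values
```

Note that fbp falls as either the capacitance or the total resistance grows, which is why high-capacitance interfaces force slower scan rates.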
2.5.1. Linear Polarization Resistance (LPR)
ASTM standards D2776 and G59 describe standard procedures for conducting polarization resistance measurements. Potentiodynamic, potential step, and current-step methods can be used to compute Rp (D2776, 1994; G59, 1994). The potentiodynamic sweep method is the most common method for acquiring Rp. For conventional macro-LPR measurements, a potentiodynamic sweep is conducted by applying Ea between Ecorr ± 10 mV at a slow scan rate, typically 0.125 mV/s. A linear fit of the resulting Ea vs. Ia curve is used to compute Rp. Performing this operation takes 160 seconds to complete.
3. µLPR CORROSION SENSOR
In this section, a micro-LPR (µLPR) sensor is presented which uses the potential step-sweep method to compute polarization resistance. The µLPR works on the same principle as macro-sized LPR sensors and is designed to corrode at the same rate as the structure on which it is placed. Although LPR theory is well established and accepted as a viable corrosion monitoring technique, conventional macro-sized LPR sensor systems are expensive and highly intrusive. The µLPR is a micro-scaled LPR sensor inspired by the macro-sized version discussed in the previous section. Scaling the LPR sensor into a micro-sized package provides several advantages, which include:
• Miniature form factor
• Two-pair electrode configuration
• Faster LPR measurements
Figure 2. Thin film µLPR sensor (a) exposed and (b) quasi-exposed with the lower-half underneath a coating.
3.1. Form Factor
Expertise in semiconductor manufacturing is used to micro-machine the µLPR. Using photolithography it is possible to manufacture the µLPR sensor from a variety of standard engineering construction materials, varying from steels for buildings and bridges through to novel alloys for airframes. The micro sensor is made up of two micro-machined electrodes that are interdigitated at 150 µm spacing. The µLPR sensor is made from shim stock of the source/sample material that is pressure and thermally bonded to Kapton tape. The shim is prepared using photolithographic techniques and Electro Chemical Etching (ECM). It is further machined on the Kapton to produce a highly ductile and mechanically robust micro sensor that is very sensitive to corrosion. Images of the µLPR shown bare and as a fitted sensor underneath a coating are shown in Figure 2.
3.2. Electrode Configuration
The µLPR differs from conventional macro-sized LPR sensors in two major ways. First, the µLPR consists of only two electrodes. The need for the reference electrode is eliminated, as the separation distance between the working and counter electrodes, typically 150 µm, minimizes any voltage drop due to the solution resistance, Rs. Second, both electrodes are composed of the same working metal. This is uncommon in most electrochemical cells, where the counter electrode is made of an inert material. The benefit is that the electrodes provide a more direct measurement of corrosion than techniques which use electrodes made of different metals (e.g., gold). The sensor consists of multiple plates made from the material of interest which form the two electrodes. The electrodes are used in conjunction with a potentiostat for conducting LPR measurements. The use of a relatively large counter electrode minimizes polarization effects at the counter electrode to ensure that a stable reference potential is maintained throughout the experiments.
3.3. LPR Measurements
Potential step-sweeps are performed by applying a series of 30 steps over a range of ±10 mV spanning a period of 2.6 s. This allows eight µLPR sensors to be measured in less than 30 s. However, the effective scan-rate of 7.7 mV/s generates an additional current, Idl, due to rapid charging and discharging of the capacitance, referred to as the double-layer capacitance Cdl, at the electrode-electrolyte interface,

$$I_{dl} = C_{dl}\,\frac{dE_a}{dt}. \qquad (19)$$
Let the resulting polarization resistance that is computed when Idl is non-zero be represented by R̃p. It can be shown that R̃p is related to Rp by the following,

$$\tilde{R}_p^{-1} = R_p^{-1} + Y_{dl}, \qquad (20)$$
such that Ydl is defined by the admittance,

$$Y_{dl} = \left(\frac{C_{dl}}{20\,\mathrm{mV}}\right)\frac{dE_a}{dt}, \qquad (21)$$
where dEa/dt is the scan rate. An example of this relationship is provided in Figure 3. In this example, Cdl/20 mV and Rp⁻¹ correspond to the slope and y-intercept; these values were computed as 5.466 × 10⁻⁸ Ω⁻¹·s/mV and 3.624 × 10⁻⁶ Ω⁻¹, respectively. For a scan rate of dEa/dt = 7.7 mV/s, Ydl is computed as 4.209 × 10⁻⁷ Ω⁻¹. Finally, for a given solution, Rp can be compensated by,

$$R_p = \frac{\tilde{R}_p}{1 - Y_{dl}\tilde{R}_p} \quad \text{for } Y_{dl}\tilde{R}_p < 1. \qquad (22)$$
A plot of the actual LPR, Rp, vs. the measured LPR, R̃p, for a µLPR sensor made from AA 7075-T6 at a scan-rate of 7.7 mV/s is provided in Figure 4(a). Note, as R̃p decreases, the error between Rp and R̃p also decreases, as shown in Figure 4(b). This is significant for the following reasons:

• Better accuracy is necessary for smaller values of Rp, as the corrosion rate increases with Rp⁻¹.

• When Rp is large, the corrosion rate approaches zero. Therefore, even as the error in R̃p increases substantially, the error in the corrosion rate becomes negligible.

• The corrosion rate computed using R̃p will over-estimate the actual corrosion rate computed from Rp.

Due to these reasons, and the fact that Analatom's AN101 has an upper limit of 5 MΩ for measuring Rp, no compensation is performed when computing corrosion rates.
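The compensation of (20)-(22) can be checked numerically using the slope and scan rate quoted above; the measured resistance R_meas is an assumed example value, not a figure from the paper:

```python
slope = 5.466e-8   # C_dl / 20 mV, from the linear fit in Figure 3 [ohm^-1 * s/mV]
scan_rate = 7.7    # dE_a/dt [mV/s]

Y_dl = slope * scan_rate   # eq. (21): ~4.209e-7 ohm^-1, matching the text

R_meas = 1e5                           # example measured LPR [ohm] (assumed)
R_p = R_meas / (1.0 - Y_dl * R_meas)   # eq. (22): compensated LPR

# The compensated R_p exceeds R_meas, consistent with the note that
# uncompensated values over-estimate the corrosion rate.
```

For this example the correction is only about 4%, which helps explain why skipping compensation is acceptable in practice.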
Figure 3. Plot of inverse polarization resistance vs. scan-rate for a µLPR sensor made from AA 7075-T6 submersed in tap water.
Figure 4. Plot of the (a) actual LPR, Rp, vs. the measured LPR, R̃p, and (b) corresponding measurement error for a µLPR sensor made from AA 7075-T6 at a scan-rate of 7.7 mV/s.
3.4. Maximum Scan Rate
The maximum measurement speed for conventional macro-sized LPR systems is restricted by the combination of resistance (solution/polarization) and capacitance at the electrochemical interface. From (18), fmax can be determined graphically by estimating fbp from a Bode plot. A Bode plot of the magnitude and phase response of a µLPR sensor constructed from AA 7075-T6 submersed in distilled water is shown in Figure 5. The data was generated using a potentiostat over the frequency range 0.1 Hz − 1 MHz. The magnitude response can be used to measure fbp > 100 Hz. In practice, the µLPR sensor applies a scan rate of 7.7 mV/s with a step-size of 0.67 mV between samples. This is equivalent to a sampling rate of 11.5 Hz, which is a factor of ten less than fbp.

Figure 5. Bode plot showing the (a) magnitude and (b) phase for a µLPR sensor constructed from AA 7075-T6 with distilled water as the electrolyte.
4. EXPERIMENT
4.1. Setup
The experiment consisted of twenty-four (24) µLPR sensors and twelve (12) control coupons. The coupons and µLPR sensors were made from AA 7075-T6. Each coupon was placed next to a pair of µLPR sensors. Each sensor was held in place using a non-reactive polycarbonate clamp with a nylon fitting. All the sensors and coupons were mounted on an acrylic plexiglass base with the embedded hardware placed on the opposite side of the frame, shown in Figure 6. An electronic precision balance (Tree HRB-203) with a calibrated range of 0 − 200 g (±0.001 g) was used to weigh the coupons before and after the experiment. Finally, a weathering chamber (Q-Lab QUV/spray) promoted corrosion on the coupons and µLPR sensors by applying a controlled stream of tap water for 10 seconds every five minutes.
Figure 6. Experimental setup showing (a) all 24 µLPR sensors, 12 coupons and three AN101 instrumentation boards and (b) a close-up view of one of the panels used in the experiment. Note: only the first six coupons were used in the analysis performed in this paper.
4.2. Procedure
First, the surface of each coupon was cleaned using sand-blasting. Then, each coupon was weighed using the analytical balance. The entire panel of coupons and µLPR sensors was placed in the weathering chamber for accelerated testing. The experiment ran for approximately 35 days. During the experiment, a set of coupons was periodically removed from the weathering chamber. Throughout the experiment, Analatom's embedded hardware was logging Rp from each µLPR sensor. The sample rate was set at one sample per minute. Once accelerated testing was finished, the coupons were removed and the LPR data was downloaded and archived for analysis. The corrosion byproducts were removed from each coupon by applying micro-bead blasting to the coupon surface. Finally, the cleaned coupons were weighed using the analytical scale to compute the relative corrosion depth during the experiment.
5. RESULTS
5.1. Coupon Corrosion
Figure 7. Image of the three AA 7075-T6 coupons (ID 2.01, 2.03 and 2.04) after approximately 17 days of corrosion testing, showing (a) the condition of the coupons before cleaning and (b) after cleaning using micro-bead blasting.

The corrosion byproducts were carefully removed using micro-bead blasting. The pitting depth, d, of each coupon was calculated using the formula,

$$d = \sqrt[3]{\frac{3\,m_{loss}}{2\pi\rho N_{pit} A_{exp}}}, \qquad (23)$$
where values for the mass loss mloss, exposed surface area Aexp, resulting pit depth d, and total time of exposure of each coupon are provided in Table 1. Values for the pitting density and ρ were set at Npit = 10 cm⁻² and ρ = 2.810 g/cm³, respectively. The pitting density was computed by counting the average number of pits over the surface for coupons 2.06 and 2.08. The measurement uncertainty in the pit-depth due to uncertainty in the mass loss, ∆mloss, and pit density, ∆Npit, is approximately,
$$\Delta d \approx \frac{d}{3}\left(\frac{\Delta m_{loss}}{m_{loss}} + \frac{\Delta N_{pit}}{N_{pit}}\right), \qquad (24)$$
where ∆mloss = ±0.001 g is the minimum resolution of the scale and ∆Npit = ±3 cm⁻² was the standard deviation of the measured pit density over 1 cm² sample areas for coupons 2.06 and 2.08.
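Equations (23) and (24) can be reproduced with the values reported for coupon 2.01 in Table 1 (mloss = 0.038 g, Aexp = 58.05 cm²). The result, roughly 0.22 mm, is close to the 0.216 mm listed in the table; the small difference presumably reflects rounding in the reported values:

```python
import math

m_loss = 0.038   # mass loss [g], coupon 2.01 (Table 1)
A_exp = 58.05    # exposed area [cm^2]
rho = 2.810      # density [g/cm^3]
N_pit = 10.0     # pit density [1/cm^2]

# Pit depth, eq. (23), converted from cm to mm
d_mm = 10.0 * (3.0 * m_loss / (2.0 * math.pi * rho * N_pit * A_exp)) ** (1.0 / 3.0)

# Uncertainty, eq. (24), from scale resolution and pit-density spread
dm, dN = 0.001, 3.0
delta_d_mm = (d_mm / 3.0) * (dm / m_loss + dN / N_pit)
```

The pit-density term dominates the uncertainty here, since ∆Npit/Npit = 0.3 dwarfs the scale's relative resolution of about 0.026.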
5.2. µLPR Corrosion
The linear polarization resistance measurements were used to compute the corrosion pit depth for each µLPR sensor. The computed pit-depth for each of the 24 µLPR sensors is provided in Figure 8.

Figure 8. Comparison of the measured and computed pit depth over a period of approximately 35 days for (a) each µLPR sensor and (b) the average of all µLPR sensors.

6. SUMMARY

A micro-sized LPR (µLPR) sensor was presented for corrosion monitoring in Structural Health Management (SHM) applications. An experimental test was performed to compare corrosion measurements from twenty-four µLPR sensors with twelve coupons. Both the coupons and sensors were constructed from the same material, AA 7075-T6. According to the results, the pit-depth measured on the coupons fell within the 95% confidence interval computed from the pit-depth measured on the µLPR sensors. The results indicate multiple µLPR sensors can be used to provide an accurate measurement of corrosion. Future work includes testing additional alloys and performing in-flight testing on a C-130 legacy aircraft.
ACKNOWLEDGMENT
All funding and development of the sensors and systems in the project has been part of the US government's SBIR programs. In particular: 1) in preparing the initial system design and development, funding was provided by the US Air Force under SBIR Phase II contract # F33615-01-C-5612, monitored by Dr. James Mazza; 2) funding for the development and experimental set-up was provided by the US Navy under SBIR Phase II contract # N68335-06-C-0317, monitored by Dr. Paul Kulowitch; and 3) funding for further improvements and scheduled field installations was provided by the US Air Force under SBIR Phase II contract # FA8501-11-C-0012, monitored by Mr. Feraidoon Zahiri.
Table 1. Experimental measurements of coupon corrosion.
Coupon ID | Time Exposed [min] | Area [cm²] | Initial Mass [g] | Final Mass [g] | Mass Loss [g] | Pit Depth [mm]
Control   |      0 | 58.01 | 76.870 | 76.869 | 0.001 | N/A
2.01      | 21,198 | 58.05 | 77.253 | 77.215 | 0.038 | 0.2162
2.02      | 11,160 | 57.98 | 76.842 | 76.818 | 0.024 | 0.1855
2.03      | 21,198 | 57.99 | 76.927 | 76.896 | 0.031 | 0.2020
2.04      | 21,198 | 58.05 | 76.897 | 76.869 | 0.028 | 0.1953
2.06      | 38,510 | 57.98 | 76.884 | 76.828 | 0.056 | 0.2461
2.08      | 38,510 | 58.03 | 76.921 | 76.810 | 0.054 | 0.2431
NOMENCLATURE
βa      Anodic Tafel slope                 V/dec
βc      Cathodic Tafel slope               V/dec
ρ       Density                            g/mm³
d       Corrosion depth                    cm
k       LPR sample index                   –
fbp     Break-point frequency              Hz
fmax    Maximum test frequency             Hz
icorr   Corrosion current density          A/cm²
ipit    Pitting current density            A/cm²
ipv     Passive current density            A/cm²
mloss   Mass loss due to corrosion         g
z       Number of electrons lost per atom  –
∆d      Corrosion depth uncertainty        cm
∆mloss  Mass loss uncertainty              g
∆Npit   Pit density uncertainty            cm⁻²
Asen    Effective sensor area              cm²
Aexp    Exposed coupon area                cm²
AW      Atomic weight                      g/mol
B       Proportionality constant           V/dec
Cdl     Double-layer capacitance           F
Ea      Applied potential                  V
Ecorr   Corrosion potential                V
EW      Equivalent weight                  g/mol
F       Faraday's constant                 C/mol
Ia      Applied current                    A
Idl     Scanning current from Cdl          A
Icorr   Corrosion current                  A
M       Number of moles reacting           mol
N       Total number of µLPR samples       –
Npit    Pit density                        cm⁻²
Qpit    Charge from oxidation reaction     C
Rp      Polarization resistance            Ω
R̃p      Measured polarization resistance   Ω
Rs      Solution resistance                Ω
Ts      Sampling period                    s
Ydl     Scanning admittance from Cdl       S
REFERENCES
Bockris, J. O., Reddy, A. K. N., & Gambola-Aldeco, M. (2000). Modern Electrochemistry 2A: Fundamentals of Electrodics (2nd ed.). New York: Kluwer Academic/Plenum Publishers.

Burstein, G. T. (2005, December). A Century of Tafel's Equation: 1905-2005. Corrosion Science, 47(12), 2858-2870.

D2776, A. S. (1994). Test Methods for Corrosivity of Water in the Absence of Heat Transfer. Annual Book of ASTM Standards, 03.02.

G102, A. S. (1994). Standard Practice for Calculation of Corrosion Rates and Related Information from Electrochemical Measurements. Annual Book of ASTM Standards, 03.02.

G59, A. S. (1994). Standard Practice for Conducting Potentiodynamic Polarization Resistance Measurements. Annual Book of ASTM Standards, 03.02.

Huston, D. (2010). Structural Sensing, Health Monitoring, and Performance Evaluation (B. Jones & W. B. S. J. Jnr., Eds.). Taylor and Francis.

Mansfeld, F., & Kendig, M. (1981). Corrosion, 37, 545.

Scully, J. R. (2000, February). Polarization Resistance Method for Determination of Instantaneous Corrosion Rates. Corrosion, 56(2), 199-218.

Wagner, C., & Traud, W. (1938). Elektrochem, 44, 391.
Douglas W. Brown is the senior systems engineer at Analatom, with eight years of experience developing and maturing PHM and fault-tolerant control systems in avionics applications. He received the B.S. degree in electrical engineering from the Rochester Institute of Technology in 2006 and the M.S./Ph.D. degrees in electrical engineering from the Georgia Institute of Technology in 2008 and 2011, respectively. Dr. Brown is a recipient of the National Defense Science and Engineering Graduate Fellowship and has received several best-paper awards for his work in prognostics and fault-tolerant control.
Duane Darr is the senior embedded hardware engineer at Analatom, with over 30 years of experience in the software and firmware engineering fields. He completed his undergraduate work in physics, and graduate work in electrical engineering and computer science, at Santa Clara University. Mr. Darr's previous work at Epson Imaging Technology Center, San Jose, California, as Senior Software Engineer; Data Technology Corporation, San Jose, California, as Senior Firmware Engineer; and Qume Inc., San Jose, California, as Member of Engineering Staff/Senior Firmware Engineer, focused on generation and refinement of software and firmware solutions for imaging core technologies, as well as digital servo controller research, development, and commercialization.
Jefferey Morse has been the director of advanced technology at Analatom since 2007. Prior to this, he was a senior scientist in the Center for Micro and Nano Technology at Lawrence Livermore National Laboratory. He received the B.S. and M.S. degrees in electrical engineering from the University of Massachusetts Amherst in 1983 and 1985, respectively, and a Ph.D. in electrical engineering from Stanford University in 1992. Dr. Morse has over 40 publications, including 12 journal papers, and 15 patents in the areas of advanced materials, nanofabrication, sensors and energy conversion technologies. He has managed numerous projects in various multidisciplinary technical areas, including electrochemical sensors and power sources, vacuum devices, and microfluidic systems.
Bernard Laskowski has been the president and senior research scientist at Analatom since 1981. He received the Licentiaat and Ph.D. degrees in Physics from the University of Brussels in 1969 and 1974, respectively. Dr. Laskowski has published over 30 papers in international refereed journals in the fields of micro physics and micro chemistry. As president of Analatom, Dr. Laskowski has managed 93 university, government, and private industry contracts, receiving a U.S. Small Business Administration Administrator's Award for Excellence.
First European Conference of the Prognostics and Health Management Society, 2012
Uncertainty of performance requirements for IVHM tools according to business targets
Manuel Esperon-Miguez1, Philip John2, and Ian K. Jennions3
1,3 IVHM Centre, Cranfield, Bedfordshire, MK43 0FQ, United Kingdom
[email protected], [email protected]
2 Cranfield University, Cranfield, Bedfordshire, MK43 0AL, United Kingdom
ABSTRACT
Operators and maintainers are faced with the task of selecting which health monitoring tools are to be acquired or developed in order to increase the availability and reduce the operational costs of a vehicle. Since these decisions will affect the strength of the business case, choices must be based on a cost-benefit analysis. The methodology presented here takes advantage of the historical maintenance data available for legacy platforms to determine the performance requirements that diagnostic and prognostic tools must meet to achieve a certain reduction in maintenance costs and time. The effect of these tools on the maintenance process is studied using Event Tree Analysis, from which the equations are derived. However, many of the parameters included in the formulas are not constant and tend to vary randomly around a mean value (e.g. shipping costs of parts, repair times), introducing uncertainties in the results. As a consequence, the equations are modified to take into account the variance of all variables. Additionally, the reliability of the information generated using diagnostic and prognostic tools can be affected by multiple characteristics of the fault, which are never exactly the same, meaning the performance of these tools might not be constant either. To tackle this issue, formulas to determine the acceptable variance in the performance of a health monitoring tool are derived under the assumption that the variables considered follow Gaussian distributions. An example of the application of this methodology using synthetic data is included.
1. INTRODUCTION
The objective of Integrated Vehicle Health Management (IVHM) is to increase platform availability and reduce maintenance costs through the use of health monitoring on key systems. The information generated using condition monitoring algorithms can be used to reduce maintenance times, improve the management of the support process and operate the fleet more efficiently. Although IVHM can include the use of tools to improve the management of logistics, maintenance and operations (Khalak & Tierno, 2006), this methodology focuses on diagnostic and prognostic tools.
In order to run the algorithms it is necessary to read a set of parameters with a given accuracy and enough resolution to generate trustworthy information for the maintainer. Additionally, the data generated by sensors have to be transmitted, post-processed, stored and analyzed. Although it is possible to carry out part of this process off-board, legacy vehicles rarely have the sensors, data buses, memory or computing power required on-board. Moreover, legacy platforms are expensive to modify to accommodate new hardware, especially if the modifications have to be certified. Therefore, it is not always possible to use the best hardware available for every tool, and its performance will not reach its full potential. Furthermore, the implementation of the new health monitoring tools must have the lowest possible impact on the normal operation of the fleet, a problem not found in vehicles which are still being designed or manufactured. Thus, health monitoring tools for legacy platforms have a lower performance, a higher cost and a shorter payback period than if they were used on new vehicles.
On the other hand, the historical maintenance data generated by fleets provide information that can be used to select the components on which to retrofit health monitoring tools, validate diagnostic and prognostic algorithms, and carry out Cost-Benefit Analyses (CBA). This is an important advantage, since the expectations regarding the performance of the tools and their impact on operational costs and availability are much more accurate for legacy platforms. Additionally, FMECAs, which are widely used for the design of health monitoring tools and to perform CBAs (Banks, Reichard, Crow and Nickell, 2009; Kacprzynski, Roemer and Hess, 2002; Ashby & Byer, 2002), become easier to populate and more precise. Even the experience of maintenance personnel and operators on qualitative aspects has huge value for the development of IVHM tools.
_____________________
Esperon-Miguez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This information can be used to define the performance requirements of any diagnostic or prognostic tool. Since the main objective of retrofitting IVHM is the reduction of maintenance cost and time, these are the constraints used in the methodology presented here. Teams in charge of developing health monitoring algorithms need to know not only the performance expected from their tools, but also the budget constraints that make them profitable. These data can be used to calculate the performance expected from a diagnostic or prognostic tool if it is to achieve a certain reduction of the cost and downtime associated with the maintenance of the component it monitors. It is important to note that the criticalities of different costs and maintenance operations vary for each stakeholder (Wheeler, Kurtoglu and Poll, 2009) and depend on whether the vehicle is operated in a civilian or a military environment (Williams, 2006).
In some cases it is possible to generate mathematical expressions to relate the return on investment to certain design parameters (Kacprzynski et al., 2002; Hoyle, Mehr, Turner and Chen, 2007; Banks & Merenich, 2007), but this approach restricts major changes in the design and the equations are not applicable to other monitoring systems.
Working with historical maintenance data involves using average values of many recorded parameters which are really random variables. Therefore, there is a certain degree of uncertainty in any calculation of the performance requirements which must be taken into account to avoid arriving at overconfident results. Furthermore, the reliability of an IVHM tool varies depending on the characteristics of the fault, which are different on every occasion, and this translates into uncertainty about its performance (Lopez & Sarigul-Klijn, 2010). As a result, the acceptable standard deviations of the performance parameters of each tool have to be calculated to ensure the targets are met.
2. PERFORMANCE OF IVHM TOOLS
IVHM is enabled by the use of sensors to gather data on a component and on those systems that interact with it, in order to detect malfunctions (diagnostic tools) or to predict the failure of the part (prognostic tools). Diagnostic tools help to identify the component responsible for the malfunction of a system, reducing the diagnosis and localization times. Additionally, they can prevent the vehicle from continuing to run with an unnoticed fault.
If a diagnostic tool is too sensitive it can trigger false alarms which could result in unnecessary checks, waste of resources and, in some cases, aborting the mission. On the other hand, if the sensitivity is too low and faults are not detected, the investment in the tool will not produce any benefits. Therefore, the main performance parameters of a diagnostic tool in an analysis of its effect on maintenance cost and time are the probability of triggering a false alarm, PFA, and the probability of producing a false negative, PFN.
Prognostic tools calculate the RUL of a component at a given moment, providing maintainers with a lead time to accommodate the replacement or repair of that part in the future. If the lead time is long and accurate enough, the maintenance of the component can be carried out along with other scheduled tasks (long-term prognosis). Otherwise, the part will have to be replaced between missions (short-term prognosis), but this approach is still safer, cheaper and less time-consuming than running the component until failure. While long-term prognostic tools enable the deferral of the maintenance action until the next scheduled service, short-term prognostic tools can affect the availability of the vehicle if the time available for maintenance between missions is shorter than the time necessary to repair the fault.
The performance of a prognostic tool is determined by the reliability of the information it provides and how it is used, in other words, by the probability of the component failing before it was planned to be replaced (PLP for long-term tools and PSP for short-term tools). As shown in Figure 1, it is necessary to define a maximum admissible probability of failure, Pmax, to determine how long the component can remain in service, tmax. This requires choosing a degradation curve from those generated by the prognostic tool, from which tmax is estimated. The probability of the component failing is a function of the average life of the components removed, tm, which depends on the period between scheduled services (long-term tools) or the mean time between missions (short-term tools).
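The construction of tmax described above can be sketched numerically: given a failure-probability curve P(t) produced by a prognostic tool, find the longest time the component may remain in service before P(t) reaches Pmax. The exponential curve and all numerical values below are illustrative assumptions, not the paper's model.

```python
import math

def t_max(failure_prob, p_max, horizon, step=1e-3):
    """Largest t in [0, horizon] such that failure_prob(t) <= p_max,
    assuming failure_prob is non-decreasing (a degradation curve)."""
    t = 0.0
    while t + step <= horizon and failure_prob(t + step) <= p_max:
        t += step
    return t

# Illustrative degradation curve for a component with a 250 h characteristic life.
curve = lambda t: 1.0 - math.exp(-t / 250.0)
remaining_service_time = t_max(curve, p_max=0.05, horizon=1000.0)
```

With these assumptions the component would be withdrawn well before its characteristic life, reflecting the low admissible probability of failure Pmax.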
Figure 1. Degradation curves generated by a prognostic tool used to estimate the probability of failure of a component before it has been replaced.
3. EVENT TREE ANALYSIS
The failure of a component has a different cost and repair time depending on whether an IVHM tool has performed its function correctly or not. This can be studied using Event Tree Analysis (ETA), where the probability of the failure of the component, PF, is the triggering event and each tool introduces a fork in the diagram, as shown in Figure 2. A correct prognosis prevents the need for a diagnosis and, if it is incorrect, a diagnostic tool can still be used. For the same reason, long-term prognostic tools are further to the left on the diagram than short-term tools. It is important to remark that this is not a representation of the way the algorithms work, but of how the performance of each tool leads to different outcomes.
In case a component presents different failure modes that need to be monitored by different tools, costs and downtimes need to be estimated independently for each mode. This is not a problem since most algorithms for diagnostic and prognostic tools track specific failure modes.
The tree shows six possible outcomes or maintenance scenarios, including the lack of need to replace a healthy component. Maintenance cost and time are calculated for each scenario according to how the use (or malfunction) of a health monitoring tool affects the maintenance process. In case a prognostic tool is used, it is necessary to take into account factors such as the reduction of delays, the value of the RUL of the component, the lower operational costs of scheduled operations, and the avoidance of secondary failures. The use of diagnostic tools can help to reduce the maintenance time as well as the use of resources and personnel, since searching for the cause of the malfunction is no longer necessary. However, false alarms, or false positives, can lead to unnecessary checks or even the removal of healthy components which could be disposed of (Trichy, Sandborn, Raghavan and Sahasrabudhe, 2001). Techniques necessary to calculate some of these parameters were described by Leao, Fitzgibbon, Puttini and de Melo (2008) as well as Prabhakar and Sandborn (2010).
Since the event tree can be used to calculate the probability of each outcome, the resulting total maintenance cost, C, and time, T, can be calculated using the following expressions:
C = PF [(1 - PLP) CLP + PLP (1 - PSP) CSP + PLP PSP (1 - PFN) CD + PLP PSP PFN CFN] + (1 - PF) PFA CFA (1)

T = PF [(1 - PLP) tLP + PLP (1 - PSP) tSP + PLP PSP (1 - PFN) tD + PLP PSP PFN tFN] + (1 - PF) PFA tFA (2)
These polynomial functions can be used to calculate the sensitivities of the maintenance cost and time to the performance of health monitoring tools. Additionally, it must be noted that the data used to calculate the cost and downtime of each scenario are not constant and vary around average values (e.g. time to repair or shipping costs), and these equations can be used as the basis to calculate the standard deviation of the resulting maintenance costs and times.
(Figure 2 shows the event tree: the triggering event is the probability of failure PF; its branches pass through long-term prognosis (success: CLP, tLP), short-term prognosis (success: CSP, tSP) and diagnosis (success: CD, tD; false negative with probability PFN: CFN, tFN); for a healthy component (1 - PF), no alarm has zero cost and time, while a false alarm with probability PFA incurs CFA, tFA.)

Figure 2. ETA for the use of health monitoring tools on a single component.
4. PERFORMANCE REQUIREMENTS WITH EXACT DATA
The performance of an IVHM tool must guarantee that the maintenance cost and time associated with the component it monitors are below C* and T* respectively.
Prognostic tools can be used to monitor a system which already has some diagnostic capability, in order to combine the benefits of estimating its RUL and of being able to identify the source of a malfunction if the component fails before it was expected. However, it is difficult to imagine developing a diagnostic algorithm for a part which is no longer run until failure thanks to the use of prognostics. Therefore, the equations for the probability of false negative and false alarm only take into consideration the parameters of scenarios in which diagnostic tools are used.
PF [(1 - PFN) CD + PFN CFN] + (1 - PF) PFA CFA <= C* (3)

PF [(1 - PFN) tD + PFN tFN] + (1 - PF) PFA tFA <= T* (4)

0 <= PFA <= 1; 0 <= PFN <= 1 (5; 6)

PFA <= [C* - PF ((1 - PFN) CD + PFN CFN)] / [(1 - PF) CFA] (7)

PFA <= [T* - PF ((1 - PFN) tD + PFN tFN)] / [(1 - PF) tFA] (8)
First European Conference of the Prognostics and Health Management Society, 2012
194
European Conference of Prognostics and Health Management Society 2012
4
Equations (5-8) define a space which encloses all the possible solutions that comply with the requirements. This space can be represented as shown in Figure 3.
The following expressions can be used to determine the acceptable probability of failure for a long-term prognostic tool given the time and cost constraints. The equations for short-term tools are obtained in the same way.
PF [(1 - PLP) CLP + PLP ((1 - PSP) CSP + PSP (1 - PFN) CD + PSP PFN CFN)] + (1 - PF) PFA CFA <= C* (9)

PF [(1 - PLP) tLP + PLP ((1 - PSP) tSP + PSP (1 - PFN) tD + PSP PFN tFN)] + (1 - PF) PFA tFA <= T* (10)

0 <= PLP <= 1 (11)

PLP <= [C* - PF CLP - (1 - PF) PFA CFA] / {PF [(1 - PSP) CSP + PSP (1 - PFN) CD + PSP PFN CFN - CLP]} (12)

PLP <= [T* - PF tLP - (1 - PF) PFA tFA] / {PF [(1 - PSP) tSP + PSP (1 - PFN) tD + PSP PFN tFN - tLP]} (13)
Since the system is overdetermined, the most stringent solution must be selected.
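Selecting the most stringent solution can be sketched as below for the probability of false alarm: evaluate the cost bound (7) and the time bound (8), keep the smaller, and clip to the feasible interval of (5)-(6). The function and every numerical input are illustrative assumptions, not the paper's data.

```python
def p_fa_limit(C_star, T_star, P_F, P_FN,
               C_D, C_FN, C_FA, t_D, t_FN, t_FA):
    """Most stringent admissible probability of false alarm, eqs. (7)-(8),
    clipped to the feasible interval [0, 1] of eqs. (5)-(6)."""
    cost_bound = (C_star - P_F * ((1 - P_FN) * C_D + P_FN * C_FN)) / ((1 - P_F) * C_FA)
    time_bound = (T_star - P_F * ((1 - P_FN) * t_D + P_FN * t_FN)) / ((1 - P_F) * t_FA)
    return max(0.0, min(cost_bound, time_bound, 1.0))

# Illustrative targets and scenario data (not the paper's case study).
limit = p_fa_limit(C_star=10.0, T_star=1.0, P_F=0.1, P_FN=0.0,
                   C_D=20.0, C_FN=50.0, C_FA=5.0, t_D=2.0, t_FN=5.0, t_FA=1.0)
```

In this example the time target is the binding constraint, so it sets the admissible PFA.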
5. UNCERTAINTY
Most parameters used to perform a CBA are not constant, since the conditions under which each job is carried out are different. Costs of personnel and parts can change depending on the location or the shift. Active maintenance times, delays and the time dedicated to the diagnosis and localization of a fault are never exactly the same. Consequently, the variables used to define a maintenance activity are approximated to average values. This also affects the frequency of failure of the component, which is approximated to the Mean Time Between Failures (MTBF) for most quantitative analyses, despite being extremely variable for those components that can benefit the most from IVHM. Additionally, the performance of health monitoring tools over a fixed period can also vary, increasing the uncertainty of the cost and downtime calculated in the previous sections.
Although the total maintenance time dedicated to a single component can be broken down into several steps, including delays, repair time and checkout time (British Standard, 1991), these tend to be poorly recorded. Since the whole process involves different teams, it is difficult to keep track of the exact amount of time dedicated to each component (especially for delays and diagnosis). In addition, technicians tend to focus on the task at hand and register approximate values once the job is finished.
Therefore, there are uncertainties associated with the results of a CBA, and this affects the definition of the performance requirements for IVHM tools. To avoid overstating the benefits of using diagnostic and prognostic tools it is necessary to include the standard deviation of every parameter that does not remain constant. It is also necessary to determine the acceptable standard deviation of the performance of the algorithms to ensure the maintenance costs and times will remain below acceptable levels.
Taking into account the effects of uncertainties means that for every performance parameter mentioned above an additional variable has to be calculated. At the same time, it is necessary to define the probability of the maintenance cost and downtime being below the limits imposed; in other words, how confident we are that the costs and times will remain below the limits. As a consequence, two additional constraints are introduced: the confidence to comply with cost requirements, RC, and the confidence to comply with time requirements, RT.
The maintenance costs and times of different scenarios can be considered independent, since numerous factors included in their calculation are random and uncorrelated. These assumptions allow analytical expressions to be formulated using the standard deviation of such random factors. In order to simplify the mathematical operations, variance is used instead of standard deviation. Therefore, the following properties apply:
Var(X + Y) = Var(X) + Var(Y) (14)

Var(aX) = a^2 Var(X) (15)
Since the variations in costs and maintenance times are due to numerous random factors, it has been assumed that both the total maintenance time and the total maintenance cost per component follow Gaussian distributions.
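The two variance properties can be checked numerically on independent Gaussian samples; the sketch below verifies equations (14) and (15) and is purely illustrative (the distributions and sample size are assumptions).

```python
import random

random.seed(0)
n = 100_000
x = [random.gauss(3.0, 2.0) for _ in range(n)]   # Var(X) = 4
y = [random.gauss(-1.0, 0.5) for _ in range(n)]  # Var(Y) = 0.25

def var(samples):
    """Population variance of a list of samples."""
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

# Eq. (14): for independent variables, Var(X + Y) = Var(X) + Var(Y).
sum_gap = abs(var([a + b for a, b in zip(x, y)]) - (var(x) + var(y)))

# Eq. (15): Var(aX) = a^2 Var(X).
a = 2.5
scale_gap = abs(var([a * s for s in x]) - a ** 2 * var(x))
```

The residual `sum_gap` is the sampling covariance between x and y, which vanishes as n grows.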
Figure 3. Region of acceptable performance of a diagnostic tool.
(The figure plots PFA against PFN; the cost and time constraints bound the region of possible solutions.)
First European Conference of the Prognostics and Health Management Society, 2012
195
European Conference of Prognostics and Health Management Society 2012
5
Diagnostic tools are now defined by four parameters: the probability of false alarm, PFA; the probability of false negative, PFN; and their variances, Var(PFA) and Var(PFN) respectively. The limits of these variables are defined by the following functions:
P(C <= C*) >= RC (16)

P(T <= T*) >= RT (17)

0 <= PFA <= 1; 0 <= PFN <= 1 (18)

where

E[C] = PF [(1 - PFN) CD + PFN CFN] + (1 - PF) PFA CFA (19)

E[T] = PF [(1 - PFN) tD + PFN tFN] + (1 - PF) PFA tFA (20)

Var(C) = PF^2 (CFN - CD)^2 Var(PFN) + (1 - PF)^2 CFA^2 Var(PFA) + PF^2 [(1 - PFN)^2 Var(CD) + PFN^2 Var(CFN)] + (1 - PF)^2 PFA^2 Var(CFA) (21)

Var(T) = PF^2 (tFN - tD)^2 Var(PFN) + (1 - PF)^2 tFA^2 Var(PFA) + PF^2 [(1 - PFN)^2 Var(tD) + PFN^2 Var(tFN)] + (1 - PF)^2 PFA^2 Var(tFA) (22)
From equation (16), and given that C is Gaussian,

Var(C) <= (C* - E[C])^2 / [Φ^-1(RC)]^2 (23)

where Φ^-1 is the inverse of the standard normal cumulative distribution function.
Additionally,

Var(C) = k1 Var(PFN) + k2 Var(PFA) + k3 (24)

where

k1 = PF^2 (CFN - CD)^2 (25)

k2 = (1 - PF)^2 CFA^2 (26)

k3 = PF^2 [(1 - PFN)^2 Var(CD) + PFN^2 Var(CFN)] + (1 - PF)^2 PFA^2 Var(CFA) (27)
As a result,

k1 Var(PFN) + k2 Var(PFA) <= {C* - PF [(1 - PFN) CD + PFN CFN] - (1 - PF) PFA CFA}^2 / [Φ^-1(RC)]^2 - k3 (28)
Following the same steps for the maintenance time requirements from equation (17), the second condition is

k4 Var(PFN) + k5 Var(PFA) <= {T* - PF [(1 - PFN) tD + PFN tFN] - (1 - PF) PFA tFA}^2 / [Φ^-1(RT)]^2 - k6 (29)

where

k4 = PF^2 (tFN - tD)^2 (30)

k5 = (1 - PF)^2 tFA^2 (31)

k6 = PF^2 [(1 - PFN)^2 Var(tD) + PFN^2 Var(tFN)] + (1 - PF)^2 PFA^2 Var(tFA) (32)
Therefore, any diagnostic tool that satisfies the requirements and can generate the projected savings with the expected accuracy must comply with equations (18), (28), and (29).
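As an illustration only, a compliance check against the cost condition can be sketched as below, following the forms of equations (19) and (24)-(28) as reconstructed in this text; the function name and all inputs are assumptions, and Φ^-1 is taken from the Python standard library.

```python
from statistics import NormalDist

def meets_cost_target(C_star, R_C, P_F, P_FN, P_FA, var_P_FN, var_P_FA,
                      C_D, C_FN, C_FA, var_C_D, var_C_FN, var_C_FA):
    """True if a diagnostic tool keeps the cost below C* with confidence R_C."""
    # Eq. (19): expected maintenance cost.
    mean_C = P_F * ((1 - P_FN) * C_D + P_FN * C_FN) + (1 - P_F) * P_FA * C_FA
    # Eqs. (25)-(27): coefficients of the cost variance.
    k1 = P_F ** 2 * (C_FN - C_D) ** 2
    k2 = (1 - P_F) ** 2 * C_FA ** 2
    k3 = (P_F ** 2 * ((1 - P_FN) ** 2 * var_C_D + P_FN ** 2 * var_C_FN)
          + (1 - P_F) ** 2 * P_FA ** 2 * var_C_FA)
    # Eq. (24): total variance of the maintenance cost.
    var_C = k1 * var_P_FN + k2 * var_P_FA + k3
    # Eq. (28) rearranged: C* must exceed the mean by Φ^-1(R_C) standard deviations.
    return mean_C + NormalDist().inv_cdf(R_C) * var_C ** 0.5 <= C_star
```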
Prognostic tools are now defined by the probability of the component failing before it is replaced and its variance. The following formulas define the constraints for a prognostic tool to comply with the cost and support requirements. To keep the equations manageable, the parameters of diagnostic tools are not included. In case they were necessary, the full equations can be obtained in a similar manner. As for diagnostic tools:
P(C <= C*) >= RC (33)

P(T <= T*) >= RT (34)

The difference being that only the prognostic branches of the event tree remain:

0 <= PLP <= 1 (35)

E[C] = PF [(1 - PLP) CLP + PLP CFN] (36)

E[T] = PF [(1 - PLP) tLP + PLP tFN] (37)

Var(C) = PF^2 (CFN - CLP)^2 Var(PLP) + PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)] (38)

Var(T) = PF^2 (tFN - tLP)^2 Var(PLP) + PF^2 [(1 - PLP)^2 Var(tLP) + PLP^2 Var(tFN)] (39)
From equation (33),

Var(C) <= (C* - E[C])^2 / [Φ^-1(RC)]^2 (40)
Combining equations (36), (38) and (40),

PF^2 (CFN - CLP)^2 Var(PLP) <= {C* - PF [(1 - PLP) CLP + PLP CFN]}^2 / [Φ^-1(RC)]^2 - PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)] (41)
Using the properties described in equations (14) and (15), and following the same steps with the equations for the maintenance time constraints, the results are:
Var(PLP) <= ({C* - PF [(1 - PLP) CLP + PLP CFN]}^2 / [Φ^-1(RC)]^2 - PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)]) / [PF^2 (CFN - CLP)^2] (42)

Var(PLP) <= ({T* - PF [(1 - PLP) tLP + PLP tFN]}^2 / [Φ^-1(RT)]^2 - PF^2 [(1 - PLP)^2 Var(tLP) + PLP^2 Var(tFN)]) / [PF^2 (tFN - tLP)^2] (43)
These parabolas define the limits of the performance requirements of any prognostic tool, as shown in Figure 4. These expressions are for long-term prognostic tools. To obtain the formulas for short-term tools, replace CLP and tLP by CSP and tSP respectively.
These formulas can be applied to any component of a vehicle to quantify the performance requirements for continuous monitoring tools. These requirements will then be communicated to the internal teams in charge of developing IVHM tools, the supplier of the component, or independent developers of health monitoring technology, or can even be used to call an open tender. Since the performance parameters are determined based on economic objectives, it is possible to calculate the maximum acceptable cost for each tool based on the remaining useful life of the fleet.
Additionally, this set of equations provides a framework to include risk analysis in a CBA and strengthen the business case for installing IVHM on the aircraft.
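The cost-side limit of equation (42), as reconstructed above, can be sketched as a function that returns the largest admissible Var(PLP) for a given mean PLP; sweeping PLP traces the cost branch of the parabola in Figure 4. Every numerical input is an illustrative assumption.

```python
from statistics import NormalDist

def var_plp_cost_limit(C_star, R_C, P_F, P_LP, C_LP, C_FN, var_C_LP, var_C_FN):
    """Upper bound on Var(P_LP) from the cost target, eq. (42); 0 if infeasible."""
    mean_C = P_F * ((1 - P_LP) * C_LP + P_LP * C_FN)
    if mean_C >= C_star:
        return 0.0  # even a perfectly repeatable tool misses the target
    z = NormalDist().inv_cdf(R_C)
    residual = ((C_star - mean_C) ** 2 / z ** 2
                - P_F ** 2 * ((1 - P_LP) ** 2 * var_C_LP + P_LP ** 2 * var_C_FN))
    return max(0.0, residual / (P_F ** 2 * (C_FN - C_LP) ** 2))

# Sweep P_LP from 0 to 1 with illustrative data to trace the constraint curve.
limits = [var_plp_cost_limit(5.0, 0.95, 0.1, p / 10.0, 10.0, 40.0, 0.0, 0.0)
          for p in range(11)]
```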
6. CASE STUDY
The following example is based on synthetic data for a generic component that fails every 250 flying hours. Although the values chosen for the parameters used in this case do not belong to a specific real component, they are representative of the costs and maintenance times of many parts currently run until failure. All the factors taken into account to calculate the maintenance cost and time of each scenario, as well as their values, are listed in Table 1. Standard deviations were chosen to ensure the uncertainties would vary between ±5% and ±20% (assuming all parameters follow Gaussian distributions, so 99.7% of the outcomes are within ±3σ of the mean). The results for each scenario are shown in Figure 5.
The objective is to reduce the maintenance costs per flying hour for this component by 15% and the maintenance time by 40%. These goals must be met with, at least, 95% confidence. The resulting performance requirements for long and short term prognostic tools are shown in Figure 6.
Since the performance of diagnostic tools is described by four variables, it is not possible to represent the limits of the requirements graphically. To provide some guidance, the graphs for diagnostic tools shown in Figure 6c represent the relation between the probability of false alarm and the probability of false negative, assuming there is no uncertainty about the performance of the tool (i.e. zero variance). To check whether the performance of a given tool complies with the requirements it is necessary to use the equations previously shown.
Scenario (probability)           Cost (£) [Var]          Time (h) [Var]
Long-term prognosis (1-PLP)      773.5 [2.95E+02]        1.35 [9.00E-04]
Short-term prognosis (1-PSP)     906.1 [1.88E+02]        1.35 [9.00E-04]
Diagnosis (1-PFN)                1021.7 [1.86E+02]       1.35 [3.16E-03]
False negative (PFN)             1319.825 [3.10E+02]     3.375 [6.46E-03]
No alarm (1-PFA)                 0                       0
False alarm (PFA)                330 [3.03E+01]          2 [2.27E-03]
Total                            5.279 [6.82E-02]        0.0135 [5.17E-07]

Figure 5. Costs, times and their variances (in brackets) for each maintenance scenario.
(The figure plots Var(PLP) against PLP; the cost and time constraints bound the region of possible solutions.)
Figure 4. Region of acceptable performance and variance of performance of a long-term prognostic tool.
Figure 6. Graphs of possible solutions for a) long-term and b) short-term prognostic tools and c) diagnostic tools.
(Panel a plots Var(PLP) against PLP, "Constraints for LT prognostic tools"; panel b plots Var(PSP) against PSP, "Constraints for ST prognostic tools"; panel c plots PFA against PFN, "Constraints for diagnostic tools (no variance of PFA or PFN)"; each panel shows the cost and time constraint curves.)
First European Conference of the Prognostics and Health Management Society, 2012
198
Figure 7. PDF of maintenance a) cost and b) time for the different IVHM tools proposed.
The probability density functions (PDFs) of the new maintenance cost and time are calculated and compared to the targets to verify whether a diagnostic tool with a given performance is capable of achieving the necessary improvements. Figure 7 shows the PDFs for three possible IVHM tools (one of each kind) that reach the targets, compared to the original distributions. It also illustrates how changing the probabilities of different maintenance scenarios, with different variances, affects the standard deviation of the final maintenance cost and time, which can be reduced (diagnostic tool) or increased (long-term prognostic tool).
Only the shaded area on the left side of the graphs comprises those tools that achieve the expected reduction in cost and downtime. The area on the right is for those which match the requirements with a confidence complementary to what is expected (i.e. 5%), as illustrated in Figure 8.
The requirements for diagnostic and short-term prognostic tools illustrate an interesting phenomenon: in some cases meeting one of the targets can result in any possible solution overperforming in other areas. In this example, a diagnostic tool that barely reaches the expected cost reduction will improve maintenance times by much more than is required. The opposite happens for short-term prognostic tools.
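The kind of comparison shown in Figure 7 can be reproduced by Monte Carlo sampling from the event tree: draw per-event maintenance costs and estimate the probability of staying under the target. Everything below (probabilities, costs, the 1000 £ target) is synthetic and illustrative, not the paper's case-study data.

```python
import random

random.seed(1)

def sample_cost(P_F, P_LP, C_LP, sd_C_LP, C_FN, sd_C_FN):
    """One realisation of the per-event cost with a long-term prognostic tool."""
    if random.random() >= P_F:
        return 0.0                       # no failure event
    if random.random() < P_LP:           # prognosis ineffective: run to failure
        return random.gauss(C_FN, sd_C_FN)
    return random.gauss(C_LP, sd_C_LP)   # effective long-term prognosis

costs = [sample_cost(0.4, 0.1, 700.0, 60.0, 1300.0, 100.0) for _ in range(50_000)]
# Empirical confidence of staying below an illustrative 1000 GBP target.
confidence = sum(c <= 1000.0 for c in costs) / len(costs)
```

The empirical confidence can then be compared against the required RC, exactly as the analytical PDFs are compared against the limits in Figure 7.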
PF                       0.004
Cost of component (£):   Scheduled M. 525; Unscheduled M. 628.9; False Alarm 65
Cost of labor (£):       Scheduled M. 90; Unscheduled M. 132.5
Value of RUL (£):        Long Term Prog. 68.5; Short Term Prog. 12.2
Other costs (£):         Compensation 0; Secondary damage 127.8; Flight Test 0; Loss Income 0
Warranty (%):            Parts 0; Labor 0
Time (h):                MTTR 2; Check-out 0.25; MTTD 2; Localization 0.25; Technical delay 0.33; Administrative delay 1; Logistic delay 0

Table 1. List of parameters used in the case study and their values.
(The Figure 7 panels plot the PDF of the maintenance cost per flying hour (£/h) and the PDF of the maintenance time per flying hour (h/h), comparing the current distributions and the target limits with the distributions obtained with long-term prognostic, short-term prognostic and diagnostic tools.)
7. CONCLUSIONS
This methodology represents a reliable way to define the requirements of individual tools based on the expectations of improving the maintenance of specific components and the uncertainty of the available data. Since the equations allow a quantitative risk analysis to be carried out, business cases that use this methodology are more robust and less likely to overstate the benefits of installing the selected combination of IVHM tools.
It is not always possible to obtain reliable data to determine the standard deviation or variance of some of the variables used to calculate the costs or maintenance times. In some cases these variables are poorly recorded or not recorded at all. To tackle this problem, personnel with experience maintaining the aircraft should be interviewed to obtain approximate values. This will always be a better option than ignoring the effect of these uncertainties.
Quantifying the uncertainty of the expected revenue is critical to estimating the present value of an investment in IVHM technology, given its long return period. For that purpose, techniques like real options can be combined with the methodology presented here.
IVHM tools can significantly affect the uncertainty, or standard deviation, of the resulting maintenance costs and times, either reducing it or increasing it. Since the predictability of these factors is sometimes as important as decreasing their value, this effect must be analyzed carefully in a CBA.
Further work is necessary to study how the diagnoses and prognoses from several algorithms interact. If this new information enables grouping maintenance activities, the total downtime can be reduced, increasing the availability of the vehicle and generating additional savings.
ACKNOWLEDGEMENT
This work has been supported by the IVHM Centre at Cranfield University. The authors also want to thank the partners of the IVHM Centre for their support in this project.
NOMENCLATURE
C Maintenance cost of component per flying hour
C* Target cost per flying hour
CD Maintenance cost of an effective automated diagnosis
CFA Maintenance cost of a false alarm
CFN Maintenance cost of a false negative
CLP Maintenance cost of an effective long term prognosis
CSP Maintenance cost of an effective short term prognosis
PF Probability of failure of the component per flying hour
PFA Probability of false alarm
PFN Probability of false negative
PLP Probability of long term prognosis being ineffective
PSP Probability of short term prognosis being ineffective
RC Expected confidence to comply with cost requirements
RT Expected confidence to comply with time requirements
T Maintenance time of component per flying hour
T* Target maintenance time per flying hour
tD Maintenance time of an effective automated diagnosis
tFA Maintenance time of a false alarm
tFN Maintenance time of a false negative
tLP Maintenance time of an effective long term prognosis
tm Average life of components replaced following the indication of a prognostic tool
tmax Maximum time a component is run before its probability of failure reaches a predetermined limit
tSP Maintenance time of an effective short term prognosis
REFERENCES
Ashby, M. J. and Byer, R. J. (2002), "An approach for conducting a cost benefit analysis of aircraft engine prognostics and health management functions", Aerospace Conference Proceedings, 2002. IEEE, Vol. 6, pp. 6-2847.
Figure 8. Region of acceptable performance and variance of performance of a long-term prognostic tool.
(The figure plots Var(P) against P; the area A under the limit curve meets the targets with the expected confidence, while the area 1-A corresponds to the complementary confidence.)
Banks, J. and Merenich, J. (2007), "Cost Benefit Analysis for Asset Health Management Technology", Reliability and Maintainability Symposium, 2007. RAMS '07. Annual, pp. 95.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009), "How engineers can conduct cost-benefit analysis for PHM systems", Aerospace and Electronic Systems Magazine, IEEE, vol. 24, no. 3, pp. 22.
British Standard (1991), Quality vocabulary - Part 3: Availability, reliability and maintainability terms.
Hoyle, C., Mehr, A., Turner, I. and Chen, W. (2007), "On Quantifying Cost-Benefit of ISHM in Aerospace Systems", Aerospace Conference, 2007 IEEE.
Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002), "Health management system design: Development, simulation and cost/benefit optimization", Aerospace Conference Proceedings, 2002. IEEE, pp. 3065.
Khalak, A. and Tierno, J. (2006), "Influence of prognostic health management on logistic supply chain", American Control Conference, 2006.
Leao, B. P., Fitzgibbon, K. T., Puttini, L. C. and de Melo, G. P. B. (2008), "Cost-benefit analysis methodology for PHM applied to legacy commercial aircraft", Aerospace Conference, 2008 IEEE.
Lopez, I. and Sarigul-Klijn, N. (2010), "A review of uncertainty in flight vehicle structural damage monitoring, diagnosis and control: Challenges and opportunities", Progress in Aerospace Sciences, vol. 46, no. 7, pp. 247-273.
Prabhakar, V. J. and Sandborn, P. (2010), "A part total cost of ownership model for long life cycle electronic systems", International Journal of Computer Integrated Manufacturing, pp. 1.
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe, S. (2001), "A new test/diagnosis/rework model for use in technical cost modeling of electronic systems assembly", Test Conference, 2001. Proceedings. International.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey of Health Management User Objectives Related to Diagnostic and Prognostic Metrics".
Williams, Z. (2006), "Benefits of IVHM: An analytical approach", Aerospace Conference, 2006 IEEE.

BIOGRAPHIES
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronicassembly", Test Conference, 2001. Proceedings.International, pp. 1108.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
stic and Prognostic Metrics"Williams, Z. (2006), "Benefits of I
approach", Aerospace Conference, 2006 IEEE,
IOGRAPHIES
Manuel Esperonresearchinglegacy platforms at Cranfield IVHMCentre since 2010.worked on R&D fordevices and their implementation on landvehicles. He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit Analysisfor Asset Health Management Technology",and Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),"How engineers can conduct cost-benefit analysis for
Aerospace and Electronic Systemsvol. 24, no. 3, pp. 221991), BS 4778-
Part 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
Aerospace Conference, 2007 IEEE,Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,nd cost/benefit optimization",
Conference Proceedings, 2002. IEEE,
Khalak, A. and Tierno, J. (2006), "Influence of prognostichealth management on logistic supply chain",Control Conference, 2006, pp. 6 pp.
P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,G. P. B. (2008), "Cost-Benefit Analysis Methodologyfor PHM Applied to Legacy Commercial Aircraft",Aerospace Conference, 2008 IEEE, pp. 1.
Klijn, N. (2010), "A review ofy in flight vehicle structural damage
monitoring, diagnosis and control: Challenges andProgress in Aerospace Sciences,
273.Prabhakar, V. J. and Sandborn, P. (2010), "A part total cost
of ownership model for long life cycle electronicInternational Journal of Computer Integrated
-14.Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,
S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronic
Test Conference, 2001. Proceedings.pp. 1108.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
stic and Prognostic Metrics"Williams, Z. (2006), "Benefits of IVHM: an analytical
Aerospace Conference, 2006 IEEE,
Manuel Esperon-Miguezresearching on retrofitting IVHM tools onlegacy platforms at Cranfield IVHMCentre since 2010.worked on R&D for highdevices and their implementation on land
He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit Analysisfor Asset Health Management Technology", Reliabilityand Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),benefit analysis for
Aerospace and Electronic Systemsvol. 24, no. 3, pp. 22-30.
-3.1:1991 QualityPart 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
Aerospace Conference, 2007 IEEE, pp. 1.Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,nd cost/benefit optimization", Aerospace
Conference Proceedings, 2002. IEEE, Vol. 6, pp. 6
Khalak, A. and Tierno, J. (2006), "Influence of prognostichealth management on logistic supply chain", American
pp. 6 pp.P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,
Benefit Analysis Methodologyfor PHM Applied to Legacy Commercial Aircraft",
pp. 1.Klijn, N. (2010), "A review of
y in flight vehicle structural damagemonitoring, diagnosis and control: Challenges and
Progress in Aerospace Sciences,
Prabhakar, V. J. and Sandborn, P. (2010), "A part total costife cycle electronic
International Journal of Computer Integrated
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronic systems
Test Conference, 2001. Proceedings.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
VHM: an analyticalAerospace Conference, 2006 IEEE,
Miguez has beenon retrofitting IVHM tools on
legacy platforms at Cranfield IVHMCentre since 2010. Manuel has also
high energy storagedevices and their implementation on land
He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit AnalysisReliability
and Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),benefit analysis for
Aerospace and Electronic Systems
3.1:1991 QualityPart 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
pp. 1.Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,Aerospace
Vol. 6, pp. 6-
Khalak, A. and Tierno, J. (2006), "Influence of prognosticAmerican
P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,Benefit Analysis Methodology
for PHM Applied to Legacy Commercial Aircraft",
Klijn, N. (2010), "A review ofy in flight vehicle structural damage
monitoring, diagnosis and control: Challenges andProgress in Aerospace Sciences, vol.
Prabhakar, V. J. and Sandborn, P. (2010), "A part total costife cycle electronic
International Journal of Computer Integrated
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for use
systemsTest Conference, 2001. Proceedings.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
VHM: an analyticalAerospace Conference, 2006 IEEE, pp. 9
has beenon retrofitting IVHM tools on
legacy platforms at Cranfield IVHMManuel has also
ergy storagedevices and their implementation on land
He holds a Master in Mechanical
Engineeringand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the InterEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and CSystems
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 asProfessor and DThe Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andeducation, over the last three years. The Centre offers ashort course in IVHM and the world’s first IVHM MSc,begun in 2011.
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,contributingHMASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
European Conference of Prognostics and Health Management
Engineering from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the InterEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and CSystems
Ianyears, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He hasworkeElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 asProfessor and Director of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,begun in 2011.
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,contributing member of the SAE IVHM Steering Group andHM-1 IVHM committee, a Fellow of IMechE, RAeS andASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
European Conference of Prognostics and Health Management Society 2012
from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University.
Philip JohnEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engijoining in 1999.Imperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the International Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and Coping with Uncertainty and Change in
Ian K. Jennionsyears, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He hasworked for RollsElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
y 2012
from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
Philip John is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engijoining in 1999. Following his PhD atImperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
Jennions. Ian’s career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
d for Rolls-Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, Rolls-Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
from Madrid Polytechnic University, Spain,and an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engineering since
Following his PhD atImperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineering, includingRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,
-Royce, Thales,Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM
10
, Spain,and an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University's
neering sinceFollowing his PhD at
Imperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassed
, includingRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,
Royce, Thales,Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM –
First European Conference of the Prognostics and Health Management Society, 2012
201
Unscented Kalman Filter with Gaussian Process Degradation Modelfor Bearing Fault Prognosis
Christoph Anger B.Sc.1, Dipl.-Ing. Robert Schrader1, and Prof. Dr.-Ing. Uwe Klingauf1
1 Institute of Flight Systems and Automatic Control, Darmstadt, 64287
[email protected]
[email protected]@fsr.tu-darmstadt.de
ABSTRACT
The degradation of rolling-element bearings is mainly stochastic due to unforeseeable influences like short-term overstraining, which hampers the prediction of the remaining useful lifetime. This stochastic behaviour is hard to describe with the parametric degradation models used in the past. Therefore, the two prognostic concepts presented and examined in this paper introduce a nonparametric approach through the application of a dynamic Gaussian Process (GP). The GP offers the opportunity to reproduce a damage course according to a set of training data and thereby also estimates the uncertainty of this approach by means of the GP's covariance. The training data is generated by a stochastic degradation model that simulates the aforementioned highly stochastic degradation of a bearing fault. For prediction and state estimation of the feature, the trained dynamic GP is combined with the Unscented Kalman Filter (UKF) and evaluated in the context of a case study. Since this prognostic approach has shown drawbacks during the evaluation, a multiple model approach based on the GP-UKF is introduced and evaluated. It is shown that this combination offers an increased prognostic performance for bearing fault prediction.
1. INTRODUCTION
Forecasting the remaining useful lifetime (RUL) of line-replaceable units (LRUs) with high accuracy is one of the main issues in aviation to avoid unnecessary maintenance cycles and, therefore, to reduce aircraft life cycle costs. One component of those LRUs can be rolling-element bearings, whose RUL is of great interest and which are therefore the focus of this enquiry. Rolling-element bearings ensure the functionality of rotating assembly parts under varying loading and frequency. During their life cycle, bearings degrade in two different ways
Dipl.-Ing. Robert Schrader et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
according to the amount, duration and nature of loading and other influences, e.g. contaminants or constructive incertitudes that can affect the tribological system of a bearing. Overstraining and solid friction, caused by high cyclic stress and a lack of lubricant respectively, lead to a rapid degradation of the bearing, whereas in the case of calculated strains like wear and tear or fatigue the course of damage increases continuously. A degradation process of a real bearing results from both kinds of strains and, therefore, has a strongly stochastic character (Sturm, 1986).
To simulate this behaviour, several degradation models (DMs) have been developed in the past. Most of them are based on the Paris-Erdogan law, which describes a relation between the crack growth rate and the effective stresses in the examined material. By adjusting this law to existing test results of real bearings, several enhancements were formulated and evaluated, as Choi et al. did in (Choi & Liu, 2006b) and (Choi & Liu, 2006a), respectively. Other DMs are based on the Lundberg-Palmgren model, which describes the correlation between the probability of survival and, among others, the maximum shearing stress. Yu et al. refined this approach by adding a more precise geometrical description of the contact surface in (Yu & Harris, 2001).
All aforementioned models describe the degradation as depending on the applied external load difference, as a non-loaded bearing would not degrade at all. In reality, the degradation is also a function of the current degradation, since detached particles can lead to solid friction. The DM applied in this paper to generate reliable degradation courses considers both the degradation rate due to loading and that due to the state of degradation itself.
Most of these DMs are used as prognostic models (PMs) in combination with state estimation.
Usually particle filters based on the aforementioned Paris-Erdogan model are implemented, as done in (Orchard & Vachtsevanos, 2009). Other prognostic concepts are based on the Archard wear equation. Daigle et al. presented a model-based prognostic approach by estimating the RUL of a pneumatic valve with a fixed-lag particle filter (Daigle & Goebel, 2010). The applied model relates the current degradation to the wear of material based on the Archard equation.
Besides a DM that accounts for the current degradation state, Orsagh et al. presented a prognosis approach for a rolling-element bearing (Orsagh, Sheldon, & Klenke, 2003). By measuring several features, e.g. the oil debris of the bearing or the vibration signal, they predicted the RUL depending on the measured fused features by correlation with the current state of degradation. The RUL was then forecast according to the applied PM.
The prognostic concept at hand attempts another approach, as it is not based on a physical model. Instead, a dynamic Gaussian Process (GP) model is trained on a degradation process and combined with the Unscented Kalman Filter (UKF) for state estimation. Ko et al. analysed this dynamic prognostic model in (Ko, Klein, Fox, & Haehnel, 2007) by tracking an autonomous micro-blimp. Additionally, the expected benefits of the GP-UKF concept in combination with a multiple model approach are examined.
This paper is divided into four parts. In Section 2 the applied DM of a rolling-element bearing is presented. The prognostic approach, with a short introduction to the two components UKF and GP, and the multiple model approach are described in Section 3, and in Section 4 the two concepts are tested and evaluated in the context of a case study.
2. BEARING FAULT DEGRADATION
As the objective of this paper is to forecast the degradation of a bearing, a feature has to be identified that directly corresponds to the current state of degradation. One such variable is the surface of pitting A, i.e. the excavation of macroscopic particles caused by material fatigue, either in the rolling elements or in the inner or outer race. As one pitting does not immediately lead to the failure of a bearing, its functionality remains. However, the geometrical irregularity of this fault produces single impacts on the assembly group directly contacting the bearing. Depending on the location of the fault, these impacts appear with certain frequencies as a function of the rotation speed Ω of the shaft, the number of rolling elements n and geometric magnitudes, summarised by Antoni et al. in (Antoni, 2007) and depicted in Table 1.
Inner-race fault:        (n/2) Ω (1 + (d/D) cos θ)
Outer-race fault:        (n/2) Ω (1 − (d/D) cos θ)
Rolling-element fault:   (D/d) Ω (1 − ((d/D) cos θ)²)
Cage fault:              (Ω/2) (1 − (d/D) cos θ)
Inner-race modulation:   Ω
Cage modulation:         (Ω/2) (1 − (d/D) cos θ)

Table 1. Ω = speed of shaft; d = bearing roller diameter; D = pitch circle diameter; n = number of rolling elements; θ = contact angle
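The expressions in Table 1 translate directly into code. The following sketch is an illustration only; the function name and the example geometry are invented here, not taken from the paper:

```python
import math

def fault_frequencies(omega, d, D, n, theta):
    """Characteristic bearing fault frequencies from Table 1.

    omega: shaft speed, d: roller diameter, D: pitch circle diameter,
    n: number of rolling elements, theta: contact angle (radians).
    """
    r = (d / D) * math.cos(theta)  # recurring geometric ratio
    return {
        "inner_race": n / 2 * omega * (1 + r),
        "outer_race": n / 2 * omega * (1 - r),
        "rolling_element": D * omega / d * (1 - r ** 2),
        "cage": omega / 2 * (1 - r),
        "inner_race_modulation": omega,
        "cage_modulation": omega / 2 * (1 - r),
    }

# Example (assumed geometry): 8 rollers, d = 8 mm, D = 40 mm,
# zero contact angle, 25 Hz shaft speed
f = fault_frequencies(omega=25.0, d=8.0, D=40.0, n=8, theta=0.0)
```

For this example geometry the inner-race fault frequency is (8/2)(1 + 0.2) = 4.8 times the shaft speed, i.e. the spectral line one would look for in the structure-borne noise signal described below.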
These impacts produce a structure-borne noise, and by using an acoustic emission sensor the frequency and the amplitude of the impacts generated by the rotating fault can be detected. From these, the location of the fault (according to the frequency and Table 1) and the degree of degradation can be determined, as the acceleration amplitude is assumed to correlate with the pitting surface and, therefore, the current condition of the bearing.
The degradation of a real rolling-element bearing results from two different courses of damage, as described in the introduction: a continuously rising damage caused by material fatigue that is interrupted by abrupt steps as a result of overstraining. Both effects can be described mathematically in the applied DM by the following difference equation for the degradation, i.e. the pitting surface A:
∆Ai = kA ·Ai−1 + ku ·∆ui−1, (1)
where ∆ui is the external loading difference of the bearing during one cycle and kA ∼ W(λ′(Ai−1), k′(Ai−1)) is a factor drawn from a Weibull distribution, whose scale parameter λ′ and shape parameter k′ are expected to be functions of the previous degradation Ai−1. The product kA · Ai−1 represents the influence of the degradation on the transition rate. Analogously, ku · ∆ui−1 stands for the increased degradation rate caused by loading, as ku ∼ E(µ(Ai−1)) is drawn from an exponential distribution whose mean µ is also a function of Ai−1. Both coefficients kA and ku realise the stochastic character of the degradation. Therefore, the current degradation in cycle i can be calculated as
Ai = Ai−1 + ∆Ai. (2)
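Equations (1) and (2) can be sketched as a simple Monte-Carlo recursion. The paper does not specify how λ′, k′ and µ depend on A_{i−1}, so the parameter functions below are illustrative assumptions, as is taking the absolute value of the loading difference:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_degradation(load, a0=1.0, cycles=250):
    """One pitting-surface course per Eqs. (1)-(2).

    The dependence of the Weibull/exponential parameters on the
    previous degradation A_{i-1} is not given in the paper; the
    choices below are illustrative placeholders.
    """
    A = [a0]
    for i in range(1, cycles):
        a_prev = A[-1]
        du = load[i] - load[i - 1]            # external loading difference
        shape = 2.0                           # k'(A): assumed constant shape
        scale = 0.002 * (1 + 0.05 * a_prev)   # lambda'(A): assumed to grow with damage
        k_A = rng.weibull(shape) * scale      # state-dependent Weibull factor
        k_u = rng.exponential(0.5)            # mu(A): assumed constant mean here
        dA = k_A * a_prev + k_u * abs(du)     # Eq. (1)
        A.append(a_prev + dA)                 # Eq. (2)
    return np.array(A)

# Assumed normalised load profile (cf. Figure 1b)
load = np.clip(np.sin(np.linspace(0, 6 * np.pi, 250)), 0, None)
course = simulate_degradation(load)
```

Because both random coefficients are non-negative, each simulated course is monotonically non-decreasing: continuous growth from the kA · A term, abrupt steps where the load difference is large, exactly the two damage mechanisms described above.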
Figure 1. (a) Three different degradation courses as the result of Equation (1); (b) applied normalised load spectrum.
In Figure 1, three different damage courses generated by the DM and the applied normalised loading are depicted. Figure 1a clearly shows the stochastic character of a real degradation, as the RULs of the courses differ strongly and the mainly continuous course is interrupted by steps in case of high strain. The correlation between the applied loading in Figure 1b and the degradation rate is obvious, as the load difference between cycles 100 and 125 is zero and the degradation in this range is quite flat. Thus, the applied DM is assumed to reproduce the damage course of a faulty bearing for the purposes of this paper, instead of real test rig measurements.
3. PROGNOSTIC APPROACH
The applied prognostic concepts are introduced in this section. The UKF is used for state estimation and prediction of the degradation. Instead of a parametric model, the UKF is founded on a trained dynamic GP. The basics of both prognostic tools are presented in the following subsections. Subsection 3.1 is based on (Ko et al., 2007), (Ko & Fox, 2011) and (Rasmussen & Williams, 2006). In Subsection 3.3, a multiple model approach that promises an increased prognostic performance is explained.
3.1. Dynamic Gaussian Process for Fault Degradation
The GP offers the possibility of learning a regression function from sample data without any parametric model. Rasmussen et al. describe the GP in (Rasmussen & Williams, 2006) as defining a Gaussian distribution over functions. In other words, the GP establishes a function f out of a given training data set D = {(x1, y1), (x2, y2), ..., (xn, yn)} according to a given noisy process
y = f(X) + ε, (3)
where X = [x1, x2, ..., xn] is an n × m input matrix, with n the number of training inputs and m the length of a single input vector xi, y is an n-dimensional vector of scalar outputs, and ε represents a noise term drawn from a Gaussian distribution N(0, σ²).
A Gaussian distribution is fully defined by its mean µ and covariance Σ. The GP defines a zero-mean joint Gaussian distribution over the given outputs y of the training data D, as follows:
p(y) = N(0, K(X,X) + σn²I). (4)
The covariance of this joint distribution consists of the kernel matrix K(X,X), which represents the deviation of the inputs among each other, and the term σn²I for the Gaussian noise caused by ε. The entries of K are the kernel functions k(xi, xj), where the squared exponential

k(xi, xj) = σf² exp(−(1/2)(xi − xj) W (xi − xj)ᵀ) (5)

is a standard kernel function. Here, σf² is the signal variance and W is a diagonal matrix that contains the distance measure of every input.
To calculate the mean GPµ and the covariance GPΣ for a given test input x∗ and test output y∗ w.r.t. the training data D, the following expressions can be applied:
GPµ(x∗, D) = k∗ᵀ [K + σn²I]⁻¹ y (6)
for the mean and
GP_Σ(x∗, D) = k(x∗, x∗) − k∗^T [K + σ_n^2 I]^{−1} k∗ (7)
for the covariance. Here, the compact form K = K(X, X) is used, and k∗ denotes the covariance function between the test input x∗ and the training inputs X. Obviously, the mean prediction in Equation (6) is a linear combination of the training outputs y, weighted by the correlation between test and training inputs. The covariance is the prior covariance k(x∗, x∗) of the test input, reduced by the information gained from the observations. The GP possesses three so-called hyperparameters θ = [W, σ_f, σ_n] from the kernel function and the process noise. Optimal hyperparameters θ can be found by maximising the log likelihood
θ_max = arg max_θ log p(y | X, θ). (8)
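The prediction equations (6) and (7) translate almost directly into code. The following is a minimal numerical sketch (not the authors' implementation), assuming numpy; the toy 1-D data, the diagonal weight matrix W and the noise level are illustrative:

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, W=None):
    """Squared-exponential kernel of Eq. (5): sigma_f^2 exp(-0.5 (xi-xj) W (xi-xj)^T)."""
    if W is None:
        W = np.eye(A.shape[1])
    d = A[:, None, :] - B[None, :, :]                       # pairwise input differences
    return sigma_f**2 * np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', d, W, d))

def gp_predict(X, y, x_star, sigma_f=1.0, W=None, sigma_n=0.1):
    """GP posterior mean (Eq. 6) and covariance (Eq. 7) at test inputs x_star."""
    K = se_kernel(X, X, sigma_f, W)
    k_star = se_kernel(X, x_star, sigma_f, W)               # train/test covariances
    Kinv = np.linalg.inv(K + sigma_n**2 * np.eye(len(X)))   # [K + sigma_n^2 I]^{-1}
    mu = k_star.T @ Kinv @ y
    cov = se_kernel(x_star, x_star, sigma_f, W) - k_star.T @ Kinv @ k_star
    return mu, cov

# toy 1-D example: with small noise, the posterior mean interpolates the data
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
W = np.array([[50.0]])      # distance weight, i.e. length scale 1/sqrt(50) ~ 0.14
mu, cov = gp_predict(X, y, X, W=W, sigma_n=0.01)
```

In practice the hyperparameters θ of Equation (8) would be fitted by maximising the log marginal likelihood rather than fixed by hand as here.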
Considering a stochastic dynamic degradation process, Equation (3) can be written as
r_{k+1} = r_k + ∆r_k + ε_k. (9)
Therefore, the state transition ∆r_k is trained with the GP. The generated training data set D_r = (X, X′) consists of the inputs X = [(r_1, ∆u_1), (r_2, ∆u_2), ..., (r_n, ∆u_n)] and the state transitions X′ = [∆r_1, ∆r_2, ..., ∆r_n], which are calculated as
∆r_k = r_k − r_{k−1} (10)
or, w.r.t. the mean of the dynamic GP of Equation (6),

r_k = r_{k−1} + GP_µ(u_{k−1}, r_{k−1}, D_r). (11)

Together with the covariance GP_Σ(u_{k−1}, r_{k−1}, D_r), this fully describes the Gaussian distribution of the GP. The additional benefit of this approach is the time invariance resulting from the transition from a static to a dynamic system, and the ability to capture different kinds of degradation processes without physical knowledge of the actual process.
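As a sketch of how the training set D_r of Equations (9)-(11) could be assembled and used for a one-step propagation, the snippet below uses a toy degradation history and a minimal GP mean; all numerical values (the r and ∆u sequences, length scale and noise level) are assumed for illustration:

```python
import numpy as np

# toy monitored degradation history r_1..r_n and loading differences du (assumed values)
r = np.array([0.05, 0.08, 0.12, 0.18, 0.26, 0.36, 0.50])
du = np.full(len(r) - 1, 0.1)

# training set D_r of Eq. (10): inputs (r_k, du_k), targets dr_k = r_{k+1} - r_k
X = np.column_stack([r[:-1], du])
dX = np.diff(r)

def gp_mean(Xtr, ytr, x, sigma_f=1.0, ell=0.05, sigma_n=0.01):
    """GP posterior mean GP_mu (Eq. 6) with an isotropic squared-exponential kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)
    K = k(Xtr, Xtr) + sigma_n**2 * np.eye(len(Xtr))
    return k(np.atleast_2d(np.asarray(x, float)), Xtr) @ np.linalg.solve(K, ytr)

# one-step propagation of Eq. (11): r_k = r_{k-1} + GP_mu(u_{k-1}, r_{k-1}, D_r)
r_next = r[-1] + gp_mean(X, dX, [0.50, 0.1])[0]
```

Rolling this one-step prediction forward until the feature crosses the failure threshold yields the RUL estimate.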
3.2. Combining GP and Unscented Kalman Filter
In the case of a nonlinear dynamic system, the application of the UKF is the appropriate choice, because it estimates the state of nonlinear systems by means of observations z and system inputs u. As the presented prognostic approach intends to omit a physical degradation model, the Extended Kalman Filter is also inapplicable, since an analytic model is required for its linearisation step. In general, a nonlinear dynamic system at the kth time step can be described as
x_k = G(x_{k−1}, u_{k−1}) + ε_k (12)
First European Conference of the Prognostics and Health Management Society, 2012
with the state transition function G, the n-dimensional state vector x, the input vector u and an additive Gaussian noise term ε drawn from a zero-mean Gaussian distribution ε ∼ N(0, Q_k), with the process noise Q_k as covariance. An analogous description of the observation z_k can be formulated as
z_k = H(x_k) + δ_k. (13)
Here, H relates the state to the observation and δ is also an additive noise term, δ ∼ N(0, R_k), where R_k is the measurement noise. Through the scaled unscented transformation by Julier (Julier, 2002), sigma points χ^[i] are defined according to the covariance Σ and the mean µ of the previous time step
χ^[0] = µ
χ^[i] = µ + (√((n + λ)Σ))_i for i = 1, ..., n
χ^[i] = µ − (√((n + λ)Σ))_{i−n} for i = n + 1, ..., 2n, (14)

where λ is a scaling parameter that, in the case of the scaled unscented transformation, is defined as

λ = α′^2 (n + κ) − n. (15)
Here, α′ and κ are further scaling parameters that determine the spread of the sigma points. In the standard UKF, these sigma points are transformed by the function G to generate a new distribution with mean and covariance. As the applied UKF contains the dynamic GP, this state transition function G is replaced by the Gaussian predictive distribution of Equation (6), which thereby defines a new set of sigma points
χ_k^[i] = GP_µ(χ^[i], D). (16)
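The sigma-point construction of Equations (14)-(15) can be sketched as follows (numpy; the values of α′, κ and the test mean/covariance are illustrative, and the weights shown are the standard mean-recovery weights):

```python
import numpy as np

def sigma_points(mu, Sigma, alpha=1e-1, kappa=0.0):
    """Scaled unscented transform: sigma points of Eq. (14) with lambda from Eq. (15)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n                  # Eq. (15)
    S = np.linalg.cholesky((n + lam) * Sigma)         # matrix square root, used column-wise
    chi = np.empty((2 * n + 1, n))
    chi[0] = mu                                       # chi^[0] = mu
    for i in range(n):
        chi[i + 1] = mu + S[:, i]                     # mu + (sqrt((n+lam) Sigma))_i
        chi[n + i + 1] = mu - S[:, i]                 # mu - (sqrt((n+lam) Sigma))_{i-n}
    # standard weights so that the weighted sigma points recover the mean
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wm[0] = lam / (n + lam)
    return chi, wm

mu = np.array([1.0, 2.0])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
chi, wm = sigma_points(mu, Sigma)
```

The symmetric ± pairs cancel in the weighted sum, so the sigma points reproduce µ exactly, which is the defining property exploited in lines 4 and 5 of Table 2.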
Similarly, the process noise Q_k is defined by Equation (7). With this information, the a priori mean and covariance can be generated by

µ̄ = \sum_{i=0}^{2n} w_m^[i] χ_k^[i]
Σ̄ = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄)(χ_k^[i] − µ̄)^T + GP_Σ(x∗, D) (17)
with weights w_m and w_c set up in (Julier, 2002). The whole applied GP-UKF algorithm is depicted in Table 2. In comparison to Equation (16), the new sigma points in line 3 are generated by χ_k^[i] = χ_{k−1}^[i] + GP_µ(u_{k−1}, χ_{k−1}^[i], D_G), as the applied GP is trained according to Equation (9). The prediction of the mean and covariance at time step k described in Equations (14) to (17) takes place in lines 1 to 5. The a priori estimation is corrected according to the measured observation z_k in lines 7 to 13. This correction step proceeds similarly to the prediction. In line 6 the transformed sigma points of line 3, χ_k, are used as observation Z_k^[i]. Line 8 is comparable to line 5 and in line
1: Inputs: µ_{k−1}, Σ_{k−1}, u_{k−1}, z_k, R_k
2: χ_{k−1} = (µ_{k−1}, µ_{k−1} + γ√Σ_{k−1}, µ_{k−1} − γ√Σ_{k−1})
3: for i = 0, ..., 2n:
   χ_k^[i] = χ_{k−1}^[i] + GP_µ(u_{k−1}, χ_{k−1}^[i], D_g)
   Q_k = GP_Σ(u_{k−1}, µ_{k−1}, D_g)
4: µ̄_k = \sum_{i=0}^{2n} w_m^[i] χ_k^[i]
5: Σ̄_k = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄_k)(χ_k^[i] − µ̄_k)^T + Q_k
6: Z_k^[i] = χ_k^[i]
7: ẑ_k = \sum_{i=0}^{2n} w_m^[i] Z_k^[i]
8: S_k = \sum_{i=0}^{2n} w_c^[i] (Z_k^[i] − ẑ_k)(Z_k^[i] − ẑ_k)^T + R_k
9: Σ_k^{x,z} = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄_k)(Z_k^[i] − ẑ_k)^T
10: K_k = Σ_k^{x,z} S_k^{−1}
11: µ_k = µ̄_k + K_k (z_k − ẑ_k)
12: Σ_k = Σ̄_k − K_k S_k K_k^T
13: Outputs: µ_k, Σ_k

Table 2. Applied GP-UKF Algorithm
9 the cross-covariance of prediction and observation is determined. Depending on both, the Kalman gain K_k is generated in line 10 and, based on this, the new mean and covariance at time step k are defined in lines 11 and 12, respectively.
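One predict-correct cycle of Table 2 can be sketched as below. This is not the authors' code: the state is scalar, the observation model is the identity, and the trained dynamic GP is stood in for by two hypothetical callables gp_mu and gp_sigma returning the values of Equations (6) and (7):

```python
import numpy as np

def gp_ukf_step(mu, Sigma, u, z, R, gp_mu, gp_sigma, alpha=1.0, kappa=2.0):
    """One GP-UKF cycle following Table 2 (wc = wm for brevity, H = identity)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    gamma = np.sqrt(n + lam)
    S = np.linalg.cholesky(Sigma)
    chi = np.vstack([mu, mu + gamma * S.T, mu - gamma * S.T])       # line 2
    chi_pred = np.array([c + gp_mu(u, c) for c in chi])             # line 3: GP mean
    Q = gp_sigma(u, mu)                                             # line 3: GP variance
    wm = np.full(2 * n + 1, 1 / (2 * (n + lam))); wm[0] = lam / (n + lam)
    wc = wm.copy()
    mu_bar = wm @ chi_pred                                          # line 4
    d = chi_pred - mu_bar
    Sig_bar = (wc[:, None] * d).T @ d + Q                           # line 5
    Z = chi_pred                                                    # line 6
    z_hat = wm @ Z                                                  # line 7
    dz = Z - z_hat
    Sk = (wc[:, None] * dz).T @ dz + R                              # line 8
    Sxz = (wc[:, None] * d).T @ dz                                  # line 9
    K = Sxz @ np.linalg.inv(Sk)                                     # line 10
    mu_new = mu_bar + K @ (z - z_hat)                               # line 11
    Sig_new = Sig_bar - K @ Sk @ K.T                                # line 12
    return mu_new, Sig_new

# toy run: degradation grows by ~0.1 per cycle with a small GP variance (assumed numbers)
gp_mu = lambda u, r: np.array([0.1])
gp_sigma = lambda u, r: np.array([[1e-4]])
mu, Sigma = gp_ukf_step(np.array([0.5]), np.array([[0.01]]),
                        0.0, np.array([0.62]), np.array([[1e-3]]),
                        gp_mu, gp_sigma)
```

For prediction beyond the last observation, the correction (lines 6-13) is skipped and the a priori mean and covariance are iterated forward until the failure threshold is reached.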
3.3. Multiple Model Approach
Selecting one model to predict the RUL of bearing faults ignores the uncertainty due to the stochastic nature of the degradation process. To take these uncertainties into account, several prognostic models (PMs) are needed to improve the prediction. A Bayesian formalism is used to combine the knowledge of a set M of PMs by weighting the probability of each model being the correct one, as proposed by Li and Jilkov in (Li & Jilkov, 2003). Therefore, the Interacting Multiple Model (IMM) estimator, which is based on the Autonomous Multiple Model (AMM), is applied. In contrast to the latter, the IMM belongs to the group of cooperating multiple model approaches, since every model m_i ∈ M interacts with the others. Thus, the multiple model filters are reinitialised at every time step k according to information from the previous time step. Consider Equations (12) and (13) with one PM. Then the extension to the multiple model approach follows as
x_k = G(x_{k−1}, u_{k−1}, m_i) + ε_k
z_k = H(x_k, m_i) + δ_k (18)
according to (Schaab, 2011). The first steps of the IMM algorithm consist of a reinitialising step with a calculation of the mode probability of every ith model
µ_{k|k−1}^{(i)} = P(m_k^{(i)} | y_{1:k−1}) = \sum_{j=1}^{n_z} h_{ij} µ_{k−1}^{(j)} for i = 1, ..., n_z (19)
with the entries h_{ij} = P(m_k = m_j | m_{k−1} = m_i) of the transition matrix H according to Markov. The application of
the transition matrix H prevents the prognostic approach from insisting on one model, as it offers the possibility of a change from model i to j at every time step. The transition matrix H therefore describes a Markov chain, where H is assumed to be time invariant. By using the information of the previous time step and µ_{k|k−1}^{(i)}, a weighting factor according to
µ_{k−1}^{j|i} = P(m_{k−1}^{(j)} | y_{1:k−1}, m_k^{(i)}) = h_{ji} µ_{k−1}^{(j)} / µ_{k|k−1}^{(i)} (20)
is calculated. With this, an individual reinitialising value for every filter
x_{k−1|k−1}^{(i)} = E[x_{k−1} | y_{1:k−1}, m_k^{(i)}] = \sum_{j=1}^{n_z} x_{k−1|k−1}^{(j)} µ_{k−1}^{j|i} (21)
and similarly a covariance P_{k−1|k−1}^{(i)} (see Table 3) are computed. After the reinitialising of the models, these initial values are provided to the applied filters, which, in the case of the proposed prognostic approach, are GP-UKFs with different PMs. According to the likelihood L_k^{(i)}, which depends on the residuum e_k^{(i)} = z_k^{(i)} − ẑ_k^{(i)} and indicates the probability that i is the correct model, the state probability of model i is calculated as
µ_k^{(i)} = µ_{k|k−1}^{(i)} L_k^{(i)} / \sum_{j=1}^{n_z} µ_{k|k−1}^{(j)} L_k^{(j)}. (22)
Finally, the results of the single filters i are fused into the state estimate x_{k|k} and covariance estimate P_{k|k} by means of the minimum mean squared error (MMSE), weighted with the state probability of Equation (22)
x_{k|k} = \sum_{i=1}^{n_z} x_{k|k}^{(i)} µ_k^{(i)}
P_{k|k} = \sum_{i=1}^{n_z} [P_{k|k}^{(i)} + (x_{k|k} − x_{k|k}^{(i)})(x_{k|k} − x_{k|k}^{(i)})^T] µ_k^{(i)}. (23)
The entire algorithm is depicted in Table 3.
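The reinitialisation, filtering and fusion steps of Equations (19)-(23) can be sketched for scalar states as follows. This is an illustrative stand-in, not the authors' code: the row-stochastic convention H_trans[i, j] = P(m_k = m_j | m_{k−1} = m_i) is used, and the per-model GP-UKFs are replaced by a hypothetical callback filter_outputs:

```python
import numpy as np

def imm_step(mu_mode, x, P, H_trans, filter_outputs):
    """One IMM cycle (Table 3) for scalar states.
    mu_mode: previous mode probabilities; x, P: per-model estimates;
    filter_outputs(x0, P0, i) -> (x_post, P_post, likelihood) runs model i's filter."""
    n = len(mu_mode)
    mu_pred = H_trans.T @ mu_mode                      # Eq. (19), predicted mode prob.
    x0 = np.empty(n); P0 = np.empty(n)
    for i in range(n):
        w = H_trans[:, i] * mu_mode / mu_pred[i]       # Eq. (20), mixing weights
        x0[i] = w @ x                                  # Eq. (21), reinitialised state
        P0[i] = w @ (P + (x0[i] - x) ** 2)             # reinitialised covariance
    xf = np.empty(n); Pf = np.empty(n); L = np.empty(n)
    for i in range(n):                                 # run each model's filter
        xf[i], Pf[i], L[i] = filter_outputs(x0[i], P0[i], i)
    mu_new = mu_pred * L / (mu_pred @ L)               # Eq. (22), state probabilities
    x_fused = mu_new @ xf                              # Eq. (23), MMSE fusion
    P_fused = mu_new @ (Pf + (x_fused - xf) ** 2)
    return mu_new, x_fused, P_fused

# toy example: model 1 explains the data much better (higher likelihood)
H_trans = np.array([[0.95, 0.05], [0.05, 0.95]])
def filt(x0, P0, i):
    return x0 + (0.1 if i == 0 else 0.3), P0 * 0.9, (5.0 if i == 0 else 0.5)
mu_new, x_fused, P_fused = imm_step(np.array([0.5, 0.5]),
                                    np.array([1.0, 1.0]),
                                    np.array([0.1, 0.1]),
                                    H_trans, filt)
```

The mode probabilities shift towards the model with the higher likelihood, while the off-diagonal entries of the transition matrix keep the discarded model available for later time steps.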
4. CASE STUDY
The previously defined prognostic concepts are tested and evaluated for the case of a degrading rolling-element bearing according to the DM of Section 2. Since the training data of the GP are computer-generated, the corresponding vibration model (VM) is defined and described first. Afterwards, the results and problems of the GP-UKF approach and the IMM prognostic approach are presented.
1: Inputs: µ_{k−1}^{(i)}, x_{k−1|k−1}, P_{k−1|k−1}
2: µ_{k|k−1}^{(i)} = \sum_{j=1}^{n_z} h_{ij} µ_{k−1}^{(j)}, for i = 1, ..., n_z
3: µ_{k−1}^{j|i} = h_{ji} µ_{k−1}^{(j)} / µ_{k|k−1}^{(i)}
4: x_{k−1|k−1}^{(i)} = \sum_{j=1}^{n_z} x_{k−1|k−1}^{(j)} µ_{k−1}^{j|i}
5: P_{k−1|k−1}^{(i)} = \sum_{j=1}^{n_z} [P_{k−1|k−1}^{(j)} + (x_{k−1|k−1}^{(i)} − x_{k−1|k−1}^{(j)})(x_{k−1|k−1}^{(i)} − x_{k−1|k−1}^{(j)})^T] µ_{k−1}^{j|i}
6: see Table 2, Inputs: x_{k−1|k−1}^{(i)}, P_{k−1|k−1}^{(i)}, y_k, R^{(i)}; Outputs: e_k^{(i)} = z_k^{(i)} − ẑ_k^{(i)}, x_{k|k}^{(i)}, P_{k|k}^{(i)}
7: L_k^{(i)} = P(e_k^{(i)} | m_k^{(i)}, y_{1:k}) = N(e_k^{(i)}; 0, S_k^{(i)})
8: µ_k^{(i)} = µ_{k|k−1}^{(i)} L_k^{(i)} / \sum_{j=1}^{n_z} µ_{k|k−1}^{(j)} L_k^{(j)}
9: x_{k|k} = \sum_{i=1}^{n_z} x_{k|k}^{(i)} µ_k^{(i)}
10: P_{k|k} = \sum_{i=1}^{n_z} [P_{k|k}^{(i)} + (x_{k|k} − x_{k|k}^{(i)})(x_{k|k} − x_{k|k}^{(i)})^T] µ_k^{(i)}
11: Outputs: µ_k^{(i)}, x_{k|k}, P_{k|k}

Table 3. IMM Algorithm
4.1. Simulation of Structure-borne Noise
The aim of this subsection is to generate a vibration signal of a faulty bearing as it could be measured in reality. Therefore, a combination of the VM set up below and a DM is required: the latter creates a monotonically rising acceleration amplitude of impulses, as described in Section 2, which are modulated by the VM. By this means, a vibration signal is generated that can be evaluated in the frequency range to detect the state of degradation. The impulses in the area of the bearing can be measured by an acoustic emission sensor. Antoni determined in (Antoni, 2007) that this measured vibration signal consists of several modulations of the initial impulses and can be summarised in a VM in the time domain as
x(t) = \sum_{i=−∞}^{+∞} h(t − iT − τ_i) q(iT) A_i + n(t). (24)
Here, τ_i and A_i represent the uncertainties of the measured signal in the arrival time and the amplitude of the ith impact, respectively, as e.g. the penetration of the rolling element into the pitting of an inner race is a stochastic process. The transmission behaviour of the surrounding machine parts up to the acoustic emission sensor is considered by the impulse response function h(t), with the inter-arrival time iT of two consecutive impulses. The amplitude modulation q(t) is caused by the cage frequency, and n(t) represents the additive background noise. The applied VM is based on Equation (24). For a more realistic signal, it is assumed that there is another additive impulse sequence caused by other mechanisms in the LRU. For example, a rotating system with its rotation frequency is also measured by the acoustic emission sensor, and its amplitudes outclass those of the fault. The resulting vibration signal x(t) in the time domain and its power spectral density (PSD) are depicted
Figure 2. (a) vibration signal x(t) in the time domain, (b) PSD of x(t) with the marked fault frequency f_f = 127 Hz, (c) PSD of the envelope of x(t)
in Figures 2a and 2b, respectively. In Figure 2b, there is a mark at the fault frequency f_f = 127 Hz, as a fault was assumed to be localised at the inner race. The vibration signal in the time domain is dominated by the background noise and the impact sequence of other mechanisms with a frequency of f_o = 20 Hz. In Figure 2b, the assumed system behaviour of a second-order lag element with an eigenfrequency of f_SB ≈ 1600 Hz, representing the path between the bearing and the sensor, is clearly visible, in contrast to the impulses caused by the fault, which are almost overshadowed by the background noise. By applying an envelope x_env to the original vibration signal, the influence of the system behaviour is reduced, as depicted in Figure 2c. Besides the impulse sequence and its modes, the amplitude at the fault frequency can be scanned. The amplitude of the PSD at the fault frequency is related to the amplitude given by the DM and is therefore an appropriate feature for the prognostic process, as it determines the current state of degradation. In addition, the sidebands caused by the cage modulation q(t) at the frequencies f = f_f ± f_c, with an expected cage frequency f_c = 20 Hz, become visible.
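A minimal simulation of Equation (24) and the envelope extraction might look as follows; f_f = 127 Hz, f_c = 20 Hz and f_SB ≈ 1600 Hz follow the text, while the sample rate, damping, jitter and noise levels are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 20_000                         # sample rate in Hz (assumed)
ff, fc, f_sb = 127.0, 20.0, 1600.0  # fault, cage and structural frequencies from the text
t = np.arange(0, 0.5, 1 / fs)

# impulse response h(t): lightly damped second-order lag element (path bearing -> sensor)
zeta = 0.05
wn = 2 * np.pi * f_sb
h = (np.exp(-zeta * wn * t) * np.sin(wn * np.sqrt(1 - zeta**2) * t))[: int(0.01 * fs)]

# Eq. (24): impulse train at period T = 1/ff with arrival jitter tau_i,
# amplitude uncertainty A_i and cage modulation q(iT)
x = np.zeros_like(t)
T = 1 / ff
for i in range(int(t[-1] / T)):
    k = max(0, int((i * T + rng.normal(0.0, 1e-4)) * fs))   # arrival time iT + tau_i
    A_i = 1.0 + 0.1 * rng.normal()                          # amplitude A_i
    q = 1.0 + 0.5 * np.cos(2 * np.pi * fc * i * T)          # cage modulation q(iT)
    x[k] += A_i * q
x = np.convolve(x, h)[: len(t)] + 0.05 * rng.normal(size=len(t))  # + n(t)

# envelope x_env via the magnitude of the analytic signal (FFT-based Hilbert transform)
spec = np.fft.fft(x)
spec[np.fft.fftfreq(len(x)) < 0] = 0.0
x_env = np.abs(2 * np.fft.ifft(spec))
```

Sampling the PSD of x_env at f_f then yields the degradation feature used in the following subsections.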
Figure 3. simulated degradation of a rolling-element bearing: (a) real degradation as the result of Equation (1), (b) measured normed degradation obtained by sampling the PSD generated by the previously set up VM at the fault frequency
The comparison of the real degradation of the applied DM in Equation (1) and the feature r_feat is depicted in Figure 3, where the applied loading is given in Figure 1b. The measured degradation is quite noisy due to the background noise that dominates the PSD of the vibration signal in the lower frequency range. Due to the frequency analysis, it indicates a course different from the real degradation, but as it also exhibits the monotonically rising character, the measured amplitude directly correlates with the pitting surface in Figure 3a, i.e. the current degradation. The prognostic range is set within the normed feature boundary r_feat ∈ [0.001, 1], which is related to a pitting surface range of A ∈ [5 µm², 70 µm²]. This measured degradation course is applied for training and testing the GP-UKF prognostic concepts.
4.2. Applied Performance Metrics
To analyse the prognostic performance, several performance metrics have to be applied. These metrics can be a single analytical characteristic for the entire prediction or a graphical depiction of every prediction step. Suitable performance metrics are summarised by Saxena et al. in (Saxena et al., 2008); a few of these are used for the evaluation of the proposed prognostic concept in the following sections. Some notation of the metrics domain is given in the following glossary:
UUT   Unit under test
EOL   End of life
EOP        End of prediction (predicted failure feature crossed threshold)
i          Time index
l          UUT index
P          Time index of the first prediction
L          Total number of predictions
λ          Normed time range of the entire prediction
r^l(i)     Estimated RUL at time step t_i for the lth UUT
r^{l∗}(i)  Real RUL at time step t_i
In the following subsections the applied performance metricsare defined.
4.2.1 Error
The error ∆^l(i) indicates the difference between the true RUL and the predicted RUL at time step i

∆^l(i) = r^{l∗}(i) − r^l(i). (25)
The error is one of the basic accuracy indicators and is therefore included, directly or indirectly, in most of the selected metrics.
4.2.2 Average Bias
By averaging the error over the entire prediction range, the average bias AB of the lth UUT is defined as

AB_l = (\sum_{i=P}^{EOP} ∆^l(i)) / (EOP_l − P_l + 1). (26)
Thus, the perfect score of ABl is zero.
4.2.3 Mean absolute percentage error
The Mean Absolute Percentage Error (MAPE) can be written as

MAPE = (1/L) \sum_{i=1}^{L} |100 ∆^l(i) / r^{l∗}(i)|. (27)
As it relates the error to the actual RUL, deviations in early stages of the prediction are not weighted as heavily as those near the EOP.
4.2.4 Mean squared error
One of the most commonly used metrics is the Mean Squared Error (MSE), since it averages the squared error over the number of predictions L:

MSE = (1/L) \sum_{i=1}^{L} ∆^l(i)^2. (28)
An advantage in comparison to the average bias is that the MSE considers both negative and positive errors, whereas the average bias decreases when positive and negative deviations occur within one prediction.
4.2.5 Prognostic horizon
The prognostic horizon PH describes the difference between the EOP and the current time step i
PH(i) = EOP − i, (29)
where the PH can be required to fulfill certain specifications, e.g. to remain within a given constant error bound depending on an accuracy value α, i.e.
[1 − α] · r^{l∗} ≤ r^l(t) ≤ [1 + α] · r^{l∗}, (30)
comparable to the metric in the following, last subsection. Throughout all evaluations, the accuracy value is α = 0.05.
4.2.6 α - λ Performance
Similarly to the PH, the α-λ performance describes the time span during which the predicted RUL remains within a given error bound. In comparison to the PH, the bound decreases linearly according to
[1 − α] · r^{l∗}(t) ≤ r^l(t) ≤ [1 + α] · r^{l∗}(t). (31)
Like the MAPE of Equation (27), this metric favours early predictions at λ ≈ 0 and tightens the demands for predictions near the EOP (λ ≈ 1). Throughout all evaluations, the accuracy value α of the α-λ accuracy is α = 0.20.
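The metrics of Equations (25)-(28) and (31) for a single UUT can be computed as sketched below (numpy; the toy RUL sequences are illustrative, and the α-λ score is reported here as the fraction of predictions inside the shrinking band):

```python
import numpy as np

def prognostic_metrics(rul_true, rul_pred, alpha_al=0.20):
    """Error-based metrics of Eqs. (25)-(28) and an alpha-lambda score (Eq. 31)
    for one UUT, evaluated over all prediction steps."""
    rul_true = np.asarray(rul_true, float)
    rul_pred = np.asarray(rul_pred, float)
    err = rul_true - rul_pred                          # Eq. (25), error
    ab = err.mean()                                    # Eq. (26), average bias
    mape = np.mean(np.abs(100 * err / rul_true))       # Eq. (27)
    mse = np.mean(err**2)                              # Eq. (28)
    inside = np.abs(err) <= alpha_al * rul_true        # Eq. (31), shrinking band
    return {"AB": ab, "MAPE": mape, "MSE": mse, "alpha_lambda": inside.mean()}

# toy check: predictions 10% low everywhere (conservative, negative prediction error)
true_rul = np.array([100.0, 80.0, 60.0, 40.0, 20.0])
pred_rul = 0.9 * true_rul
m = prognostic_metrics(true_rul, pred_rul)
```

Because the band of Equation (31) shrinks with the true RUL, a constant relative error stays inside it, whereas a constant absolute error eventually violates it near the EOP.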
4.3. Prognostic Results of GP-UKF Approach
The aim of this section is to evaluate the GP-UKF approach. In Figure 4, four different test trials are depicted; all courses were generated by the DM in Equation (1). The difference, especially between trials 1 and 4, in terms of RUL at the beginning of the observation is obvious. In contrast, trial 2 has a course similar to trial 3, with nearly the same life cycle. In Figure 4b, the corresponding features scanned at the fault frequency f_f = 127 Hz in the PSD are given, which show a slightly different character in comparison to the real degradation. The noise increases at higher degradation and is filtered by a low-pass filter with regard to the GP training. In Figure 5, the result of the GP training using trial 3 is shown, where the estimated training and the real degradation are overlaid by the estimated test degradation. The results of both the state estimation and the prediction of data sets 2 and 3 using the UKF are presented in Figure 6. To compare the state estimation performance, the real feature course is also plotted. The first prediction started at cycle 2, and the following predictions began every 10 cycles thereafter. The state estimation of data set 2 matches the real course recognizably, and the predictions show a high accuracy with an initial error ∆(2)(i = 2) ≈ 2 cycles. In sum, all predictions represent the behaviour of the damage course of trial 2. When the GP-UKF is trained and tested with data set 3, the prediction performance becomes slightly worse, as the error
Figure 4. four applied data sets: (a) real degradation, (b) feature
of the forecast RUL of early predictions is about 20 cycles lower than the real RUL. However, later predictions (≈ 6th) match the real degradation with an accuracy allowing the prediction of the RUL. The same behaviour is reflected in several performance metrics, summarised in Table 4. The two similar metrics PH and α-λ accuracy are given as fractions of the normed prediction range λ to indicate the time range in which the predictions fulfill the specifications until the EOP. Additionally, the RUL is normed to allow comparison of the four trials with different RUL.

Figure 5. training of the GP with data set 3

Figure 6. the state estimation and prediction of (a) data set 2 and (b) data set 3 in comparison to the real degradation course

Performance Metric   AB      MSE     MAPE    α-λ   PH
Data Set 1          −0.21    21.79   19.52   1     0.86
Data Set 2           0       10.87   13.11   1     1
Data Set 3          −3.48    78.71   24.55   1     0.81
Data Set 4          −2.28    56.44   23.86   1     0.84

Table 4. performance metrics of the four trials tested with themselves

In general, the predictions of all four trials show a high accuracy, since every trial remains within the given α-λ error bound during the entire prediction range and also satisfies the tighter specification of the PH after 20% of the normed prediction range λ, as displayed in Figure 7. Every trial converges to the actual RUL with only slight deviations at 0.3 λ. Additionally, all predictions indicate a rather conservative character, since the AB of all trials is mainly negative. In sum, the selected metrics correspond with the graphical results of Figure 6. The proposed GP-UKF prognostic concept offers a high accuracy for long-term prediction of a rolling-element bearing, in case the degradation follows the model the filter was trained with.
4.4. Generalisation of the Prognostic Approach
Now the prognostic results of a GP-UKF tested with a degradation course that differs from the training set are discussed, as this occurs in real applications. Figures 8a and 8b
Figure 7. (a) prognostic horizon of the four given trials at α = 5%, (b) α-λ accuracy at α = 20%
show the results of the state estimation and prediction of trials 1 and 4, respectively, when the prognostic model is trained with the data of trial 2. Additionally, the real degradation is plotted. The state estimation of both sets is satisfactory, since there are only slight deviations over the whole prognostic range. The predictions generally indicate the course of the training data set 2, with a progressive degradation at the beginning and a flat degradation rate at the end of the life cycle; both characteristics differ from the tested sets. Therefore, as expected, the forecast degradations are not as convincing as those in Section 4.3.
Performance Metric   AB      MSE       MAPE     α-λ    PH
Data Tr 2 Test 1     44.64   3522.93   242.65   0      0.07
Data Tr 2 Test 4      8.64    544.8    116.60   0      0.04
Data Tr 3 Test 1     22.79   1514.79   151.34   0      0.071
Data Tr 3 Test 4    −28.12   1172.28   149.59   0      0.28

Table 5. performance metrics of different test data sets, when the GP is trained with data set 2 or 3
The performance is analysed again by means of the metrics in Table 5. Additionally, the prognosis performance in the case of training data set 3 is depicted. Compared to the results of
Figure 8. the state estimation and prediction of (a) data set 1 and (b) data set 4 with the GP trained with data set 2
the previous section, all metrics increased considerably, due both to the aforementioned different degradation behaviour of test and training sets and to the slightly inaccurate state estimation. Here, the predictions from cycle 60 to 90 (corresponding to λ ≈ 0.5) of test data set 1 and at the end of data set 4 are not able to predict the real damage course. These deviations are also depicted in the graphical metrics in Figure 9, where neither the specifications of the α-λ accuracy nor those of the PH are fulfilled satisfactorily. In comparison to training data set 3, the predictions with a dynamic GP trained with data set 2 show beneficial prognostic results in the case of test data set 4 according to Table 5, whereas w.r.t. test data set 1 the third data set is advantageous. Therefore, a combination of both training data sets through a multiple model approach is assumed to exhibit benefits in comparison to the GP-UKF with only one set of training data.
4.5. Improvements by means of Multiple Model Approach
The prediction performance of an IMM approach with two different GP-UKFs is discussed in this section. The two models are the GP-UKFs trained with trials 2 and 3, respectively, as the combination of both is supposed to indicate the benefits of the IMM approach.

Figure 9. (a) prognostic horizon of test data sets 1 and 4 with training data sets 2 and 3 (α = 5%), (b) α-λ accuracy at α = 20%

In Figure 10, the state estimation and prediction results of test sets 1 and 4 are depicted. The real degradation of both test cases is estimated very accurately, with only a slight deviation in Figure 10a between cycles 50 and 60. At first sight, the prediction results show a performance comparable to the GP-UKFs of Section 4.4. Especially the first predictions determine the RUL rather inaccurately, since the error |∆(i = 2)| amounts to about 60 cycles in both cases. As described in the previous section, both test sets differ from the applied training sets, which causes a poor reproduction of the damage course. The mode probability of trial 1 in Figure 10b indicates the same reason: especially at the beginning of the prediction, neither of the two training sets replicates the real degradation satisfactorily, and therefore the model probabilities µ_k^{(1)} and µ_k^{(2)} are about 0.5. In comparison to Figure 8a, the later predictions of the MM approach in Figure 10a indicate a more accurate RUL estimation due to the domination of training set 3, which reproduces test set 1 more precisely. In Table 6, the performance metrics of the IMM approach are shown. Due to the inaccurate forecasts at the beginning of the prognosis, the metric values are comparable to Table 5. To identify the advantages of the IMM approach, the net diagrams in Figure 12 give an overview of the collected results. They show the metrics normalised to the largest value within a test trial. Since most metrics describe an inaccurate RUL
Figure 10. (a) prediction results of training sets 2 and 3 tested with trial 1, (b) mode probability of training sets 2 and 3 during prediction of trial 1
Performance Metric    AB    MSE       MAPE     α-λ    PH
Data Tr 2,3 Test 1    25    1841.14   160.54   0      0.07
Data Tr 2,3 Test 4   −7      895.32   132.99   0.04   0.12

Table 6. performance metrics of the IMM with training sets 2 and 3, testing sets 1 and 4
estimation with large values, the time range in which the predictions fulfill the specifications of the α-λ and the PH error bounds is exchanged for the time range in which those specifications are not met, i.e. PH_net = 1 − PH. The diagrams show the advantage of the IMM approach, as all measured performance metrics lie between the metrics of the GP-UKFs trained with one data set. That means this approach increases the robustness of a bearing's RUL prediction, as it provides the possibility of discarding inaccurate models depending on the mode probability. However, since the MM approach consists of the two PMs, which both differ from the test sets, the prognostic performance is still hardly able to outperform the results of both GP-UKFs, with reference to Tables 5 and 6. Especially the α-λ accuracy and PH of trial 1 (Figure 12a) of both GP-UKFs indicate the same behaviour, and thus an IMM approach based on those models is not able to increase this performance. The great benefit of the increased robustness is assumed to
Figure 11. prediction results of training trials 2 and 3 tested with trial 4
rise by including more models with different degradation courses. Especially a progressive degradation rate at the beginning of the prediction range, in the case of forecasting test set 1, is expected to be beneficial to the prognosis performance.
Figure 12. comparison of the single performance metrics of (a) test data 1 and (b) test data 4
5. CONCLUSION
Two prognostic concepts based on the GP-UKF approach to predict the RUL of a rolling-element bearing were examined in the context of a case study. The results showed that a dynamic GP in combination with a UKF estimates the RUL of a bearing very accurately when the applied training data is equal to the trial data. If the training data differs from the trial data, the GP-UKF is not able to forecast the degradation precisely, but mainly insists on the characteristics of the trained damage course. To solve this problem, an IMM approach based on two different GP-UKF models has been evaluated. It was assumed that the IMM algorithm, comprising several prognostic models, is more likely to forecast a damage course of an unknown trial. The results confirmed these expectations, since the robustness of the predictions was greatly increased by the approach.

By incorporating more prognostic models into the IMM approach, which should mainly differ from the applied GP-UKFs, this approach is expected to even outperform the prognostic results of a single GP-UKF. This will be the focus of further research.
NOMENCLATURE
Symbols
A                  Pitting surface
∆A_i               Increase of surface during cycle i
∆u_i               Loading difference during cycle i
k_A, k_u           Coefficients of applied degradation model
λ′, k′             Shape/scaling parameters of Weibull distribution
µ′                 Expected value of exponential distribution
D                  Data set
x_n                Inputs of GP
X                  Input matrix
y_n                Outputs of GP
y                  Output matrix
ε                  Noise term
µ                  Mean of model
Σ                  Covariance of model
K                  Kernel matrix
σ_n                Noise term
k(x_i, x_j)        Kernel function
σ_f                Signal variance of kernel function
W                  Distance measure weighting matrix
GP_µ               Mean of GP
GP_Σ               Covariance of GP
x∗                 Test input
θ                  Hyperparameters of GP
r_k                Degradation at kth time step
∆r_k               Degradation rate at kth time step
X′                 Degradation rate matrix
D_r                Training data set
G                  Transition function
Q_k                Process noise
H                  Observation function
z_k                Observation at kth time step
R_k                Measurement noise
δ_k                Noise term
χ^[i]              Sigma points
α′, κ              Scaling parameters of UKF
µ̄, Σ̄              A priori mean/covariance
w_m, w_c           Weights of mean/covariance
Z_k^[i]            Observation sigma points
K_k                Kalman gain
M                  Prognostic model set
m_i                Prognostic model i
µ_{k|k−1}^{(i)}    Mode probability
h_{ij}             Entries of transition matrix H
x_{k−1|k−1}^{(i)}  Reinitialising state of IMM
P_{k−1|k−1}^{(i)}  Reinitialising covariance of IMM
L_k^{(i)}          Likelihood of model i
e_k^{(i)}          Residuum
µ_k^{(i)}          State probability of model i
x(t)               Total vibration signal
h(t)               Impulse response
τ_i, A_i           Uncertainties of arriving impulse response
q(t)               Amplitude modulation
n(t)               Background noise
f_f                Fault frequency
f_o                Frequency of other mechanisms
f_SB               Eigenfrequency of system behaviour
x_env              Envelope of x(t)
f_c                Cage frequency
r_feat             Degradation feature
∆^l(i)             Error of RUL prediction
AB_l               Average bias of lth UUT
MAPE               Mean absolute percentage error
MSE                Mean squared error
PH(i)              Prognostic horizon
α                  Accuracy value
Shortcuts
RUL   Remaining Useful Lifetime
LRU   Line-replaceable Unit
DM    Degradation Model
PM    Prognostic Model
GP    Gaussian Process
UKF   Unscented Kalman Filter
IMM   Interacting Multiple Model
VM    Vibration Model
PSD   Power Spectral Density
Tr    Training
(See also Glossary in Section 4.2)
REFERENCES
Antoni, J. (2007). Cyclic spectral analysis of rolling-element bearing signals: Facts and fictions. Journal of Sound and Vibration, 304(3-5), 497–529.
Choi, Y., & Liu, C. R. (2006a). Rolling contact fatigue lifeof finish hard machined surfaces - Part 1. Model devel-opment. Wear, 261(5-6), 485–491.
Choi, Y., & Liu, C. R. (2006b). Rolling contact fatigue lifeof finish hard machined surfaces - Part 2. Experimentalverification. Wear, 261(5-6), 492–499.
Daigle, M., & Goebel, K. (2010). Model-Based Prognosticsunder Limited Sensing.
Julier, S. (2002). The scaled unscented transformation. InAmerican Control Conference, 2002. Proceedings ofthe 2002 (Vol. 6, pp. 4555–4559).
Ko, J., & Fox, D. (2011). Learning GP-BayesFilters viaGaussian process latent variable models. AutonomousRobots, 30(1), 3–23.
Ko, J., Klein, D., Fox, D., & Haehnel, D. (2007). GP-UKF: Unscented Kalman filters with Gaussian pro-cess prediction and observation models. In IntelligentRobots and Systems, 2007. IROS 2007. IEEE/RSJ In-ternational Conference on (pp. 1901–1907).
Li, X., & Jilkov, V. (2003). A survey of maneuvering targettracking—Part V: Multiple-model methods. In Proc.SPIE Conf. on Signal and Data Processing of SmallTargets (pp. 559–581).
Orchard, M. E., & Vachtsevanos, G. J. (2009). A particle-filtering approach for on-line fault diagnosis and failureprognosis. Transactions of the Institute of Measure-ment and Control, 31(3-4), 221–246.
Orsagh, R., Sheldon, J., & Klenke, C. (2003). Prognos-tics/diagnostics for gas turbine engine bearings. In Pro-ceedings of IEEE Aerospace Conference.
Rasmussen, C. E., & Williams, C. K. I. (2006). GaussianProcesses for Machine Learning.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,Saha, S., et al. (2008). Metrics for evaluating perfor-mance of prognostic techniques. In Prognostics andHealth Management, 2008. PHM 2008. InternationalConference on (pp. 1–17).
Schaab, J. (2011). Trusted health assessment of dynamicsystems based on hybrid joint estimation (Als Ms. gedr.ed.). Dusseldorf: VDI-Verl.
Sturm, A. (1986). Walzlagerdiagnose an Maschinen undAnlagen. Koln: TUV Rheinland.
Yu, W. K., & Harris, T. A. (2001). A New Stress-Based Fa-tigue Life Model for Ball Bearings. Tribology Trans-actions, 44(1), 11–18.
Using structural decomposition methods to design gray-box models for fault diagnosis of complex industrial systems: a beet sugar factory case study

Belarmino Pulido1, Jesus Maria Zamarreno2, Alejandro Merino3, and Anibal Bregon4
1,4 Dept. de Informatica, University of Valladolid, Valladolid, [email protected], [email protected]
2 Depto. Ingeniería de Sistemas y Automática, University of Valladolid, Valladolid, [email protected]
3 Depto. Ingeniería Electromecánica, University of Burgos, Burgos, [email protected]
ABSTRACT
Reliable and timely fault detection and isolation are necessary tasks to guarantee continuous performance in complex industrial systems, avoiding failure propagation in the system and helping to minimize downtime. Model-based diagnosis fulfils those requirements and has the additional advantage of using reusable models. However, reusing existing complex non-linear models for diagnosis in large industrial systems is not straightforward. Most of the time the models have been created for purposes other than diagnosis, and the available analytical redundancy is often small. In this work we propose to use Possible Conflicts, a model decomposition technique, to provide the structure (equations, inputs, outputs, and state variables) of minimal models able to perform fault detection and isolation. Such structural information can be used to design a gray-box model by means of state space neural networks. We demonstrate the feasibility of the approach on an evaporator of a beet sugar factory using real data.
1. INTRODUCTION
Prognostics and Health Management are very important tasks for continuous operation and for complying with the safety requirements of large industrial systems. In such systems, monitoring and early fault diagnosis are also fundamental tasks to avoid the propagation of fault effects, to prevent failures, and to minimize downtime. Hence, reliable and fast fault detection and isolation are needed, additionally providing an accurate
Belarmino Pulido et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
input for the prognostic stage.
Model-based reasoning provides different kinds of methods to fulfil those requirements. Model-based diagnosis uses a model of the system to estimate the proper behaviour and compares it with current observations in order to detect anomalies. In the last three decades, model-based diagnosis has been approached by two different communities: DX (Hamscher, Console, & Kleer (Eds.), 1992), using Artificial Intelligence techniques, and FDI (Gertler, 1998; Blanke, Kinnaert, Lunze, & Staroswiecki, 2006; Patton, Frank, & Clark, 2000), based on Systems Theory and Control. Both communities provide different but complementary techniques, as demonstrated by recent works (Cordier, Dague, Levy, Montmain, & Trave-Massuyes, 2004).
Our proposal elaborates on the similarities of both approaches and focuses on consistency-based diagnosis using numerical models (Pulido, Alonso-Gonzalez, & Acebes, 2001). Consistency-based diagnosis proceeds in three stages: first, fault detection is performed by detecting minimal conflicts in the system (the minimal set of equations or components involved in predicting a discrepancy); second, fault isolation is achieved by computing the minimal hitting sets of the conflicts; third, fault identification requires using fault models to predict the faulty behaviour (Reiter, 1987; Dressler & Struss, 1996), rejecting those fault modes that are not consistent with current observations. In this work, we use Possible Conflicts (Pulido & Alonso-Gonzalez, 2004), PCs for short, which are computed off-line and are the complete set of minimal redundant models that can become conflicts. PCs provide the structural model (equations, input, output, and state variables) that can be used for fault detection and isolation, or to simplify the fault prognostics
stage (Daigle, Bregon, & Roychoudhury, 2011).
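The second stage (computing minimal hitting sets of the conflicts) can be illustrated with a small brute-force sketch; the conflict sets below are invented for the example and the naive enumeration is ours, not a production diagnosis engine:

```python
from itertools import combinations

# Each conflict is a set of components, at least one of which must be faulty.
conflicts = [{"valve", "pump"}, {"pump", "sensor"}]

def minimal_hitting_sets(conflicts):
    """Brute-force minimal hitting sets: candidate diagnoses that intersect
    every conflict, such that no smaller candidate also does."""
    universe = sorted(set().union(*conflicts))
    hitting = []
    # Enumerate candidates by increasing size, so earlier hits are minimal.
    for r in range(1, len(universe) + 1):
        for cand in combinations(universe, r):
            s = set(cand)
            if all(s & c for c in conflicts):
                if not any(h <= s for h in hitting):
                    hitting.append(s)
    return hitting
```

For the two toy conflicts this yields the diagnoses {pump} (single fault) and {sensor, valve} (double fault), mirroring the hitting-set step of consistency-based diagnosis.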
While using PCs, we need to build off-line simulation or state-observer models (Pulido, Bregon, & Alonso-Gonzalez, 2010) to track the subsystem behaviour. This step requires analysis of the model, and sometimes rewriting the original equations for diagnosis purposes. The main advantage of model-based diagnosis is reusing existing models, but this is also its main difficulty. Frequently the models were created for purposes other than diagnosis, and the analytical redundancy available in the system is small, due to the price of additional sensors and because their allocation is driven by process control. Both problems exist in large industrial systems, where complexity comes from the highly non-linear models required to mimic system performance. Consequently, reusing existing non-linear models for diagnosis in those systems is not straightforward. We propose to use the structural information in each Possible Conflict to design different kinds of executable models. In this work, where a precise analytical model1 is difficult to handle, we propose to build grey-box models based on a state space neural network architecture derived from that structural information.
Preliminary results in an evaporation unit of a beet sugar factory in Spain using real data show the feasibility of the approach. The system has slow dynamics and, due to the high cost of the start-up mode, it should work uninterrupted for weeks. The main difficulty with the existing models comes from the number of unknown parameters to be identified in the model and from the presence of non-linearities that require expert manipulation in order to derive diagnosis-oriented models. An additional problem when testing any approach is that there is little information about faults that actually happened. Hence, any feedback from the model-based diagnosis system will be very helpful for the system operators.
The organization of this paper is as follows. First, we present the real system to be studied. Second, we introduce the Possible Conflicts technique used to find minimal models. Third, we introduce the state space neural network approach to obtain grey-box models for the Possible Conflicts. Next, we test the first-principles and the neural network models on the case study, drawing some conclusions.
2. DESCRIPTION OF THE CASE STUDY: AN EVAPORATION UNIT IN A BEET SUGAR FACTORY
We will test our proposal in an evaporation station of a beet sugar factory. Such processes have four main stages: diffusion, purification, evaporation, and crystallization. Evaporation is the stage in which the water contained in a juice with low sugar concentration is evaporated in order to obtain a higher sugar concentration. Afterwards, the resulting syrup is used to obtain sugar crystals in a set of vacuum pans. Figure 1 shows the main elements in an evaporation plant: the evaporation units.
1Based on first principles of Physics, usually a collection of ODEs.
Figure 1. Five evaporation units for the evaporation section in a beet sugar factory in Spain.
Each evaporator has two chambers. The heating chamber surrounds a set of vertical tubes that contain boiling juice. A flow of steam enters this chamber and transfers heat to the juice, providing the energy needed for boiling. The steam condenses around the tubes and leaves the evaporator as condensate. The interior of the tubes, plus the evaporator upper and bottom spaces, is named the juice chamber. A sugar solution of low concentration (juice) flows continuously into the base of the evaporator and starts boiling. Consequently, we get a solution of higher concentration at the output. The steam produced from the water evaporation reaches the upper space and leaves the juice chamber through a pipe at the top.
2.1. The simulation models
The simulated plant consists of a set of five effects interconnected through pipelines and valves. Each effect is formed by one or several evaporation units. The steam generated in one effect is used to provide energy to the heating chambers of the evaporators of the next effect, while the juice flows from one effect to the next, increasing its sugar concentration. In this multiple-effect arrangement, only the first effect is fed with boiler steam and purified juice. In the last effect, the evaporated steam escapes from the juice chamber to the condensers and then to the atmosphere.
The use of dynamic modelling and simulation techniques in the process industry is an activity mainly oriented towards the design of installations and the training of the working staff, but it can also be used to test new control or diagnosis strategies. For this factory there is a training simulator developed at the University of Valladolid, Spain (Merino, Alves, & Acebes, 2005). The main console of the training simulator for the evaporation section is shown in Figure 2.
Figure 2. Schematic of the available simulator for training operators.
The simulator is articulated in two big systems: a simulation program and a distributed control system, where one of the control units works as an instructor console. The objective of the simulation program is to reproduce in the most reliable way the global dynamic performance of the sugar production process. The simulation is made using the EcosimPro (EcosimPro, 2012) simulation language, and the model is developed using libraries of elemental units built with an object-oriented modelling approach (Acebes, Merino, Alves, & Prada, 2009). Additionally, the simulation code must work in real time and use an OPC (OLE for Process Control) interface (Alves, Normey-Rico, A., Acebes, & Prada, 2008) to communicate with the distributed control system. OPC is a de facto standard for communications of Windows applications in industrial processes and is included in almost every modern SCADA. The OPC simulation program performs two tasks in parallel: it solves the dynamic mathematical models of the process in real time and attends requests from OPC clients. The SCADA system, which can be configured as operator or instructor console, acts as an OPC client: it receives data from the simulation, changes the boundary conditions, and activates faults in the simulation program.
When building a model, the degree of complexity varies depending on the use of the model. In the case of evaporation units, it is possible to use different approximations to the model (Luyben, 1990; Merino, 2008). In the training simulator, a detailed model is used, including dynamics in the liquid and vapour phases and complex phenomena such as the accumulation of incondensable gases or the absence of juice in the evaporator, which allows steam flow via the juice pipes. These features are necessary in order to provide training capabilities to the simulator. As an example of the type of equations used in the simulator, the energy balance of the juice chamber is shown:
\frac{dT_{jo}}{dt} = \frac{W_{jo}\,(H_{ji}-H_{jo}) - \frac{\partial H_{jo}}{\partial C_{so}}\, W_{ji}\,(C_{si}-C_{so})}{m_t\,\frac{\partial H_{jo}}{\partial T_{jo}}} - \frac{E\left(H_E - H_{jo} + \frac{\partial H_{jo}}{\partial C_{so}}\, C_{so}\right)}{m_t\,\frac{\partial H_{jo}}{\partial T_{jo}}}

Where:

\frac{\partial H_{jo}}{\partial C_{so}} = -0.025104\, T_{jo} + 3.6939\cdot 10^{-5}\, T_{jo}^{2}

\frac{\partial H_{jo}}{\partial T_{jo}} = 4.06 - 0.025104\, C_{so} + \left(5.418936\cdot 10^{-4}\, C_{so}^{2}\right) T_{jo}
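The two enthalpy partial derivatives quoted above are plain polynomials in the juice temperature and concentration, so they can be evaluated directly; the function names are ours, and we read the temperature variable as T_jo throughout:

```python
def dHjo_dCso(T_jo: float) -> float:
    """Partial derivative of output-juice enthalpy w.r.t. output
    concentration (polynomial fit with the coefficients quoted in the text)."""
    return -0.025104 * T_jo + 3.6939e-5 * T_jo ** 2

def dHjo_dTjo(C_so: float, T_jo: float) -> float:
    """Partial derivative of output-juice enthalpy w.r.t. output temperature."""
    return 4.06 - 0.025104 * C_so + (5.418936e-4 * C_so ** 2) * T_jo
```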
Together with these equations, mass balances, heat transmission equations, state equations, etc., must be added for the liquid and vapour phases, resulting in a very complex non-linear model. Furthermore, in the physical model there are several parameters that must be adjusted dynamically. In the relatively simple case studied in this article, only four parameters were necessary. These were the set of equations that we needed to analyse and modify in order to perform model-based
diagnosis, to obtain from the model the value of a variable that is being measured, so that both values can be compared. For this to occur, the number of measured variables in the process must be sufficiently large to allow calculating one measured variable from the others. On the other hand, there is no need for a match between the physical causality of the modelled system and the causality imposed by the availability of the measurements. This involves the symbolic manipulation of the mathematical model, which is usually complex, even when using object-oriented modelling languages that allow non-causal modelling. For example, in the case of evaporation, the juice level is a measured variable. From the point of view of physical modelling, this variable is a state variable calculated by numerical integration. In the case of fault diagnosis, it is a measured variable that cannot be calculated by the model through integration without a high-index problem appearing. This makes it necessary to manipulate the model so that this binding disappears.
3. POSSIBLE CONFLICTS FOR STRUCTURAL MODEL DECOMPOSITION
3.1. Possible Conflicts
The computation of the set of Possible Conflicts (PCs) (Pulido et al., 2001; Pulido & Alonso-Gonzalez, 2004) is a system model decomposition method from the DX community, which searches for the whole set of submodels of a given model with minimal redundancy (the number of equations in the submodel equals the number of unknown variables plus one). PCs provide the minimal analytical redundancy necessary to perform fault diagnosis. PCs are computed off-line, and they can be used on-line to perform consistency-based diagnosis of dynamic systems. PCs also provide the computational structure of the constraints that generate redundancy. This structure can be used to build a simulation model or, as we will show later, to obtain the structure of a state space neural network.
Off-line PC computation requires three steps:
1. To generate an abstract representation of the system as a hypergraph. The nodes of the hypergraph are system variables, and the hyperarcs represent constraints between these variables. These constraints are abstracted from the equations that relate system variables.
2. To derive Minimal Evaluation Chains (MECs), which are minimal connected over-constrained subsystems. The existence of a MEC is a necessary condition for analytical redundancy to exist. MECs have the potential to be solved using local propagation (solving one equation in one unknown) from the measurements.
3. To generate Minimal Evaluation Models (MEMs) by assigning causality2 to the constraints of the MEC. MEMs are directed hypergraphs that specify the order in which equations should be locally solved, starting from measurements, to generate the subsystem output.
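The minimal-redundancy condition behind step 2 (one more equation than unknowns) can be checked by brute force on a toy structural model; the equations below and the naive enumeration are illustrative only, not the actual PC algorithm (connectivity of the subsystem, for instance, is not checked):

```python
from itertools import combinations

# Toy structural model: each equation lists the UNKNOWN variables it relates
# (measured variables are assumed already substituted out).
equations = {
    "e1": {"x1", "x2"},
    "e2": {"x2"},
    "e3": {"x1"},
    "e4": {"x1", "x2"},
}

def minimal_redundant_subsets(eqs):
    """Return equation subsets with exactly one more equation than unknowns
    (candidate over-constrained subsystems), keeping only minimal ones."""
    found = []
    names = sorted(eqs)
    # Enumerate by increasing size so earlier hits are minimal by construction.
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            unknowns = set().union(*(eqs[e] for e in combo))
            if len(combo) == len(unknowns) + 1:
                if not any(set(s) < set(combo) for s in found):
                    found.append(combo)
    return found
```

On this toy model every 3-equation subset is minimally over-constrained (3 equations over the 2 unknowns x1, x2), so four candidates are returned.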
In consistency-based diagnosis (Reiter, 1987; Kleer & Williams, 1987), a conflict arises given a discrepancy between observed and predicted values for a variable. Hence, conflicts are the result of the fault detection stage, but they also contain the necessary structural information for fault isolation. Possible Conflicts were designed to compute off-line those subsystems capable of becoming minimal conflicts on-line. Under fault conditions, conflicts are observed when the model described by a MEM is evaluated with the available observations, because the model constraints and the input/measured values are inconsistent (Reiter, 1987; Kleer & Williams, 1987). This notion leads to the definition of a Possible Conflict:
Definition 1 (Possible Conflict) The set of constraints in a MEC that give rise to at least one MEM.
Recent works have demonstrated the equivalence between MECs, Analytical Redundancy Relations (ARRs), and other structural model decomposition methods (Armengol et al., 2009).
3.2. Inclusion of temporal information in the models
There are two kinds of constraints in the model: differential constraints, those used to model dynamic behaviour, and instantaneous constraints, those used to model static or instantaneous relations between system variables.
Differential constraints represent a relation between a state variable and its first derivative (x, dx/dt). These constraints can be used in the MEMs in two ways, depending on the selected causality assignment. In integral causality, the constraint is solved as x(t) = x(t−1) + ∫_{t−1}^{t} (dx/dt) dt. In derivative causality, dx/dt is used directly, assuming that the derivative can be computed from present and past samples of x. Integral causality usually implies using simulation, and it is the preferred approach in the DX field. Derivative causality is the preferred approach in FDI. Both have been demonstrated to be equivalent for numerical models, assuming adequate sampling rates and precise approximations for the derivative computation are available, and assuming the initial conditions for simulation are known (Chantler, Daus, Vikatos, & Coghill, 1996). PCs can easily handle both types of causality, since they only represent a different causal assignment while building MEMs (Pulido et al., 2010).
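The difference between the two causality assignments can be seen on a toy first-order constraint dx/dt = −a·x + u; the system, its parameter values, and the Euler/backward-difference choices below are our illustration, not from the paper:

```python
import numpy as np

a, dt = 0.5, 0.1
n_steps = 50
t = np.arange(n_steps) * dt
u = np.ones(n_steps)                          # known input
x_meas = (1.0 / a) * (1 - np.exp(-a * t))     # "measured" state (analytic solution)

# Integral causality: start from a known initial condition and simulate
# forward, x(t) = x(t-1) + integral of dx/dt over the step (Euler here).
x_sim = np.zeros(n_steps)
for k in range(1, n_steps):
    x_sim[k] = x_sim[k - 1] + dt * (-a * x_sim[k - 1] + u[k - 1])

# Derivative causality: approximate dx/dt from present and past samples of
# the MEASURED x, then evaluate the constraint residual directly.
dxdt = np.diff(x_meas) / dt
residual = dxdt - (-a * x_meas[:-1] + u[:-1])
```

Both routes agree up to discretization error: the simulated state tracks the measurement (known initial condition), and the derivative-causality residual stays near zero in the fault-free case.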
Special attention must be paid to loops in the MEM (sets of equations that must be solved simultaneously). Loops containing differential constraints in integral causality are allowed, because under integral causality the time indices are different on both sides of the differential constraint (Dressler, 1994, 1996). It is generally accepted that loops containing differential constraints in derivative causality cannot be solved (Blanke et al., 2006).
2In this context, by causality assignment we mean every possible way one variable in one equation can be solved assuming the remaining variables are known.
Summarizing, each MEM for a PC represents how to build an executable model to monitor the behaviour of the subsystem defined by the PC. Such an executable model can be implemented as a simulation model or as a state-observer (Pulido et al., 2010). However, building such a model for complex non-linear systems is not a trivial task. In Section 5 we will show the set of PCs obtained for our case study, and we will derive a simulation model for one of the PCs. In subsection 5.3 we will show how a grey-box model using neural networks can be obtained for the same PC. The next section shows the fundamentals of the type of neural network model used in this work.
4. STATE SPACE NEURAL NETWORKS FOR BEHAVIOUR ESTIMATION
State Space Neural Networks (ssNN) (Zamarreno & Vega, 1998) are a great tool for modelling non-linear processes, as shown in several cases (Gonzalez Lanza & Zamarreno, 2002; Zamarreno, Vega, García, & Francisco, 2000), even in the sugar industry (Zamarreno & Vega, 1997). The main advantages of this modelling approach are its ability to represent any non-linear dynamics and the fact that it is what is called a parallel model. This model represents the cause-effect process dynamics without considering past inputs and/or past outputs. The dynamic relation is modelled by the state layer, which calculates the internal state of the network using just the current inputs of the model and the internal state values from the previous time step.
The architecture of the ssNN (see Figure 3) consists of five blocks, and each block represents a neural network layer. From left to right, the number of neurons at each layer is n, h, s, h2 and m. The third layer represents the state of the system (the dynamics). As can be seen in the figure, there is a feedback from the state layer to the previous layer, which means that the current state depends (in a non-linear way) on the state at the previous time step. The second and fourth layers model the non-linear behaviour: from the inputs to the states and from the states to the outputs, respectively. The first and fifth layers provide linear transformations from the inputs and to the outputs, respectively. The ssNN is implemented by the following mathematical representation:
x(t+1) = W^h · f1(W^r · x(t) + W^i · u(t) + B^h)

y(t) = W^o · f2(W^h2 · x(t) + B^h2)
where the parameters are weight matrices, W, and bias vectors, B:
Figure 3. Generic state space neural network architecture. (LIN: Linear Processing Elements (Neurons); NL: Non-Linear Processing Elements.)

• W^i, W^h, W^r, W^h2, W^o are matrices with dimensions h × n, s × h, h × s, h2 × s and m × h2, respectively.
• B^h and B^h2 are bias vectors with h and h2 elements, respectively.
• f1 and f2 are two functions (non-linear, in general) which are applied elementwise to a vector or matrix. They are usually of sigmoid type.
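The two equations above can be sketched directly in NumPy; the layer sizes and random weights below are placeholders (a real model would obtain them from training):

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, s, h2, m = 3, 6, 4, 6, 1         # layer sizes, as named in the text

# Weight matrices with the dimensions listed above (small random values).
W_i  = rng.normal(size=(h, n)) * 0.1   # h x n
W_h  = rng.normal(size=(s, h)) * 0.1   # s x h
W_r  = rng.normal(size=(h, s)) * 0.1   # h x s
W_h2 = rng.normal(size=(h2, s)) * 0.1  # h2 x s
W_o  = rng.normal(size=(m, h2)) * 0.1  # m x h2
B_h, B_h2 = np.zeros(h), np.zeros(h2)

f1 = f2 = np.tanh                      # sigmoid-type nonlinearities

def ssnn_step(x, u):
    """One time step of the ssNN:
    x(t+1) = W^h f1(W^r x(t) + W^i u(t) + B^h)
    y(t)   = W^o f2(W^h2 x(t) + B^h2)"""
    x_next = W_h @ f1(W_r @ x + W_i @ u + B_h)
    y = W_o @ f2(W_h2 @ x + B_h2)
    return x_next, y
```

Note the parallel-model property: stepping `ssnn_step` only needs the current input and the previous internal state, never past measured outputs.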
For some processes, where some a priori knowledge about the first-principles equations is available, a black-box model could be too generic to obtain good results. This knowledge can instead be used to restrict the architecture of the model, so we end up with a grey-box model that can be better adjusted to mimic the process. The next section illustrates the training process to obtain a specific grey-box model for a PC related to an evaporation unit.
5. RESULTS ON THE CASE STUDY
5.1. PCs for the evaporation unit
As mentioned in Section 2.1, the evaporation section of the sugar factory is made up of five effects working sequentially to increase the sugar concentration in the syrup. All the evaporation units in the same effect share the same steam output conduit and provide the steam for the next effect, thus partially coupling the behaviour of all the units. For our tests we have focused on the first evaporation unit in the first effect.
Several assumptions must be made in order to simplify the original model used in the training simulator and to use those first-principles equations for diagnosis. In our case, we simplified the dynamic processes actually happening inside the evaporation chamber, and we assumed the system was in only one operation mode. The dynamic processes considered in the evaporation unit were: the conservation law for the amount of sugar and non-sugar products, the global balance of matter in the evaporation chamber, the sugar balance, the level in the chamber, energy balances, the steam volume balance, the interchanged heat, and the pressures in the chamber. As a result of this simplification process, our model was made up of 40 equations based on first principles of physics, 44 unknown variables, and 12 measured variables. Only 5 of these equations were used to model the evolution of 5 state variables: C, T, M, juice_out.T, Mvh.
The algorithms used to compute the set of PCs provided 1058 MECs and 775 MEMs. The total number of PCs in this system was 237, but most of them shared the same fault isolation capabilities, since only 8 of the original 40 equations model relevant faulty behavior.
In the original model there are several equations containing partial derivatives and several non-linear functions. As a consequence, most of the generated MEMs can hardly be implemented, although it is analytically possible. The problem we faced at that point was implementing the relevant MEMs, because each simulation model would need to be written by hand. The process needs to be supervised by the modelling expert, thus producing a bottleneck in the development of the diagnosis system. Consequently, to test the approach, we have modelled only one of the PCs, PC195, whose MEM is graphically described in Figure 4. The MEM is a directed hypergraph that represents how the equations must be used to compute the output, steam_out.P, using just measurements as inputs, and how the inputs are used to compute the intermediate unknown variables. Each solid arc represents an instantaneous constraint. Each dashed arc represents a differential constraint. In this system we use integral causality; hence each dashed arc means that we must perform integration to obtain the value of the state variable.
We selected this subsystem because it contains 16 equations, several input measurements (juice_in.W, juice_in.Brix, juice_out.T, and level_juice.signal), and several state variables (M, C, and Mvh). The observed output variable is steam_out.P. Hence, it has enough complexity to be a good test for the state space neural model.
5.2. The experimental data-set
PC195 was implemented in EcosimPro (EcosimPro, 2012). We ran a set of five experiments using real data from the factory for an intermediate month in the five-month campaign. Experiment 1 consisted of 900 data points taken from 9 measurements in the system every 30 seconds. Experiments 2, 3, 4 and 5 consisted of 2800 data points taken from the same 9 measurements every 30 seconds3. Data sets 1, 2, and 4 represent nominal behaviour. Data set 3 represents a fault in the output sensor.
In order to monitor the nominal behavior and perform fault detection, we empirically determined a threshold. Figure 7 shows the performance of the model on the four scenarios. It can be seen that the simulation model is able to monitor the nominal behaviour and also to detect the fault, but, as shown in experiments 2 and 3, the estimations obtained are not very accurate. This is mainly due to the assumptions made regarding unknown parameters and boundary conditions.
3Since the first experiment is shorter than the other four, we do not show its results.
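The detection logic reduces to a simple residual test; the helper names and the safety margin below are ours, with the threshold set empirically from the worst residual seen on fault-free data:

```python
import numpy as np

def residuals(measured, estimated):
    """Absolute difference between measured and model-estimated signals."""
    return np.abs(np.asarray(measured, float) - np.asarray(estimated, float))

def empirical_threshold(nominal_residuals, margin=1.2):
    """Threshold from fault-free runs, inflated by a safety margin
    (the margin value is illustrative, not from the paper)."""
    return margin * float(np.max(nominal_residuals))

def detect_fault(measured, estimated, threshold):
    """Flag a conflict at each sample where the residual exceeds the threshold."""
    return residuals(measured, estimated) > threshold
```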
5.3. State space neural network models for PCs
Figure 4 graphically specifies for PC195 the relations between the inputs (level_juice.signal, juice_in.W, juice_out.T, and juice_in.Brix) and the states (M, C, Mvh and steam_out.P). Moreover, the last input (juice_in.Brix) can be considered constant along time, so it can be removed from the model. The output of the model is steam_out.P, so there is a direct relation between the output and one of the states. Taking this into account, the ssNN architecture can be customized to represent the process characteristics in a better way, as described in Figure 5. The non-linear (hidden) layer is split into four parts, and each part (represented by NL inside a square) has a number of neurons (h1, h2, h3, h4) that must be adjusted by trial and error to represent the non-linear dynamics of each state.
This simplified ssNN architecture can be viewed as removing some of the weights between layers, or as setting zeros in specific elements of the weight matrices (the matrices can be seen in Figure 6). The dimensions of matrices W^i, W^r, W^h, and W^o are (h1+h2+h3+h4) × 3, (h1+h2+h3+h4) × 4, 4 × (h1+h2+h3+h4), and 1 × 4, respectively.
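One way to realize this "zeroed weights" customization is with binary masks applied to the weight matrices; the sketch below mimics the sparsity pattern of Figure 6, but the exact input-to-part wiring is our illustration:

```python
import numpy as np

h1 = h2 = h3 = h4 = 5                  # neurons per part
H = h1 + h2 + h3 + h4                  # 20 hidden neurons in total

# Mask for W^i (H x 3): which of the 3 inputs feeds each part of the
# hidden layer (illustrative assignment following Figure 6's block pattern).
mask_Wi = np.zeros((H, 3))
mask_Wi[0:h1, :] = 1                          # part 1 sees all three inputs
mask_Wi[h1:h1 + h2, 1] = 1                    # part 2 sees input 2 only
mask_Wi[h1 + h2:h1 + h2 + h3, 2] = 1          # part 3 sees input 3 only
mask_Wi[h1 + h2 + h3:, [0, 2]] = 1            # part 4 sees inputs 1 and 3

# Multiplying by the mask keeps the forbidden entries at exactly zero;
# a training loop would reapply the mask after every weight update.
rng = np.random.default_rng(0)
W_i = rng.normal(size=(H, 3)) * mask_Wi
```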
5.3.1. Training
Training is the process of modifying the parameters (weights and biases) of the neural network to adjust its output to the process output. The error between the neural network output and the process output has to be minimized, so the training procedure is an optimization task where some index, the Sum of Squared Errors (SSE) in our case, has to be minimized.
A feedforward network is quite easy to train using the backpropagation method or one of its variants. But a recurrent neural network (such as the ssNN) is more difficult to train due to the recurrent connections. Stochastic methods are an alternative for this kind of neural network, resulting in training algorithms that are easier to implement. The Modified Random Optimization Method (Solis & Wets, 1981) has been selected in this work, with some modifications to improve convergence, as shown in (Gonzalez-Lanza & Zamarreno, 2002).
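The flavour of such a stochastic training loop can be sketched as follows; the reversal trick, the step-size adaptation rule, and the toy least-squares problem are our own illustration, not the exact algorithm of Solis & Wets or its published modification:

```python
import numpy as np

def random_optimize(loss, w0, sigma=0.5, iters=800, seed=0):
    """Random-search minimization: perturb the weights, try w + d and w - d,
    keep whichever improves, and adapt the perturbation scale."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    best = loss(w)
    for _ in range(iters):
        d = rng.normal(scale=sigma, size=w.shape)
        improved = False
        for cand in (w + d, w - d):          # "reversal" step
            l = loss(cand)
            if l < best:
                w, best, improved = cand, l, True
                break
        sigma *= 1.2 if improved else 0.98   # crude step-size adaptation (ours)
    return w, best

# Toy use: fit y = w0*u + w1 by minimizing the SSE index.
u = np.linspace(0.0, 1.0, 20)
target = 2.0 * u + 0.5
sse = lambda w: float(np.sum((w[0] * u + w[1] - target) ** 2))
w_fit, final_sse = random_optimize(sse, [0.0, 0.0])
```

Because no gradients are needed, the same loop applies unchanged to a recurrent model such as the ssNN, where the loss would be the SSE of a full simulated trajectory.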
For training this ssNN architecture, we used the experiments explained in Section 5.2. Experiments 1, 3 and 5 were the training set, and experiments 2 and 4 were used for validation. The only parameter to tune in this ad hoc ssNN is the number of hidden neurons at the second layer. Five neurons in each part are enough to represent the data, for a total of 20 sigmoid neurons.
Figure 8 shows the evolution of the estimated and measured variable for PC195 for the four selected experiments. To use the ssNN model for fault detection, a new threshold was empirically calculated. Again, the ssNN model is able to track the nominal behaviour (experiments 1, 2, and 4 in Figure 8) and to correctly detect the fault in experiment 3 when the residual
Figure 4. Minimal Evaluable Model schematics for Possible Conflict PC195. The estimated variable is Pvh. The corresponding measured variable is steam_out.P.
Figure 5. Simplified ssNN to better represent the model for the PC.
W^i = \begin{pmatrix}
i_{1,1} & i_{1,2} & i_{1,3}\\
\vdots & \vdots & \vdots\\
i_{h1,1} & i_{h1,2} & i_{h1,3}\\
0 & i_{h1+1,2} & 0\\
\vdots & \vdots & \vdots\\
0 & i_{h2,2} & 0\\
0 & 0 & i_{h2+1,3}\\
\vdots & \vdots & \vdots\\
0 & 0 & i_{h3,3}\\
i_{h3+1,1} & 0 & i_{h3+1,3}\\
\vdots & \vdots & \vdots\\
i_{h4,1} & 0 & i_{h4,3}
\end{pmatrix}
\qquad
W^r = \begin{pmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & 0\\
\vdots & \vdots & \vdots & \vdots\\
r_{h1,1} & r_{h1,2} & r_{h1,3} & 0\\
r_{h1+1,1} & r_{h1+1,2} & 0 & 0\\
\vdots & \vdots & \vdots & \vdots\\
r_{h2,1} & r_{h2,2} & 0 & 0\\
0 & r_{h2+1,2} & r_{h2+1,3} & r_{h2+1,4}\\
\vdots & \vdots & \vdots & \vdots\\
0 & r_{h3,2} & r_{h3,3} & r_{h3,4}\\
0 & 0 & r_{h3+1,3} & 0\\
\vdots & \vdots & \vdots & \vdots\\
0 & 0 & r_{h4,3} & 0
\end{pmatrix}

W^h = \begin{pmatrix}
d_{1,1} \cdots d_{1,h1} & 0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0\\
0 \cdots 0 & d_{2,h1+1} \cdots d_{2,h2} & 0 \cdots 0 & 0 \cdots 0\\
0 \cdots 0 & 0 \cdots 0 & d_{3,h2+1} \cdots d_{3,h3} & 0 \cdots 0\\
0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0 & d_{4,h3+1} \cdots d_{4,h4}
\end{pmatrix}
\qquad
W^o = \begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix}

Figure 6. Simplified weight matrices W^i, W^r, W^h, and W^o for the ssNN implementing PC195.
exceeds the threshold.
Looking at the results in Figures 7 and 8, we can see that both models can be used to monitor the evolution of the variable Steam_out.P. The main difference comes from the bias introduced by the parameters in the first-principles model in Fig. 7, which leads to a higher threshold for fault detection. Nevertheless, the model can still be used for monitoring and fault detection.
The ssNN model used only 3 experiments for training, yet it was able to track the nominal behaviour more accurately and was capable of detecting the fault in the sensor. However, further training with data from different months is necessary.
6. CONCLUSIONS
In this work we have proposed using Possible Conflicts to decompose a large system model into smaller models with minimal redundancy for fault detection and isolation. Possible Conflicts provide the structural models (equations, inputs, outputs, and state variables) required for model-based fault detection and isolation, and these models can be implemented as simulations or state observers. Since deriving such models for complex non-linear systems is not straightforward and requires the participation of modelling experts, we have proposed using the structural information in the model to design a neural network grey-box model with a state space architecture.
Figure 7. Results for the PC tracking the system using the first-principles model. The figure represents 4 experiments with real data. On the left we represent the estimated and the real value of the magnitude; on the right, the evolution of the residual and the threshold.
Figure 8. Results for the PC tracking the system using the ssNN model. The figure represents 4 experiments with real data. On the left we represent the estimated and the real value of the magnitude; on the right, the evolution of the residual and the threshold.
The main conclusion is that the structure of the Minimal Evaluable Model for a Possible Conflict can guide the design of the state space model of the neural network, reducing its complexity and avoiding the estimation of multiple unknown parameters required by the first-principles model. Comparing results of this approach on an evaporation unit of a beet sugar factory, we have observed that the ssNN is able to obtain similar or even better results than a simulation model manually derived by an expert. Both types of models were used to successfully monitor the process and to detect faults.
As further work, we plan to derive additional ssNNs and to test them on a larger experimental data set. Additionally, we need to test the approach at different times of the season, because this is a very slowly evolving process whose parameters vary over time. Moreover, we can test more abstract models that will produce fewer PCs but still contain the same structural information. Finally, once we introduce larger data sets, we will use statistical tests to perform fault detection and to determine the threshold that guarantees a maximum percentage of false positives and false negatives.
ACKNOWLEDGMENT
This work has been supported by Spanish MCI grants TIN2009-11326 and DPI2009-14410-C02-02. We would also like to thank the personnel from "Azucarera Espanola" for the data provided for these experiments, and three anonymous reviewers for their comments, which have helped us to improve this paper.
BIOGRAPHIES
Belarmino Pulido received his Licentiate, M.Sc., and Ph.D. degrees in Computer Science from the University of Valladolid, Valladolid, Spain, in 1992, 1995, and 2001, respectively. In 1994 he joined the Department of Computer Science at the University of Valladolid, where he has been an Associate Professor since 2002. His main research interests are model-based and knowledge-based reasoning for supervision and diagnosis. He has worked on several national and European funded projects related to supervision and diagnosis, and has been the coordinator of the Spanish Network on Supervision and Diagnosis of Complex Systems since 2005.
Jesus Maria Zamarreno holds a degree in Physics and a PhD in Physics from the University of Valladolid, Spain, where he is a Lecturer (Associate Professor) in the Department of Systems Engineering and Automatic Control. He is a member of CEA, the IFAC Spanish section, and is also the advisor of the ISA student section at Valladolid. His research interests are artificial neural networks, agent-based modelling, model-based predictive control, and OPC applications.
Alejandro Merino holds a degree in Chemical Engineering and M.Sc. and PhD degrees in Process and Systems Engineering from the University of Valladolid, Spain. He is currently an Assistant Professor at the University of Burgos, Spain, and also a senior researcher at the Centre of Sugar Technology. He has worked on different projects related to the modelling and optimization of complex industrial processes. His research interest is the modelling and optimization of dynamic processes.
Anibal Bregon received his B.Sc., M.Sc., and Ph.D. degrees in Computer Science from the University of Valladolid, Valladolid, Spain, in 2005, 2007, and 2010, respectively. Currently he is an Assistant Professor and Research Assistant at the Department of Computer Science of the University of Valladolid. From September 2005 to June 2010, he was a Graduate Research Assistant with the Intelligent Systems Group at the University of Valladolid, Spain. During that time he was a visiting scholar at the Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN, USA; the Department of Electrical Engineering, Linkoeping University, Linkoeping, Sweden; and the Diagnostics and Prognostics Group, NASA Ames Research Center, Mountain View, CA, USA. His current research interests include model-based reasoning for diagnosis, prognostics, health management, and distributed diagnosis of complex physical systems.
Virtual Framework for Validation and Verification of System Design Requirements to enable Condition Based Maintenance
Dipl.-Ing. Heiko Mikat1, Dr. Dipl.-Ing. Antonino Marco Siddiolo2, Dipl.-Ing. Matthias Buderath3
1,2,3Cassidian, Manching, 85077 Germany
[email protected] [email protected]
ABSTRACT
During the last decade, Condition Based Maintenance [CBM] became an important area of interest for reducing down times related to maintenance and logistic delays and for improving system effectiveness. Reliable diagnostic and prognostic capabilities that can identify and predict incipient failures are required to enable such a maintenance concept. For a successful integration of CBM into a system, the challenge beyond the development of suitable algorithms and monitoring concepts is also to validate and verify the appropriate design requirements. To justify additional investments into such a design approach, it is also important to understand the benefits of the CBM solution. Throughout this paper we will define a framework that can be used to support the Validation & Verification [V&V] process for a CBM system in a virtual environment. The proposed framework can be tailored to any type of system design. It will be shown that an implementation of failure prediction capabilities can significantly improve the desired system performance outcomes and reduce the risk for resource management; on the other hand, an enhanced online monitoring system without prognostics has only a limited potential to ensure the return on investment for developing and integrating such technologies. A case study for a hydraulic pump module will be carried out to illustrate the concept.
1. INTRODUCTION
A maintenance strategy cannot change the reliability figures of a system design, but an optimized concept can improve availability and reduce operation and support costs (Reimann, Kacprzynski, Cabral, and Marini, 2009). Three maintenance strategies, with corresponding measures to overcome the issues associated with operating a system of non-infinite reliability, can be distinguished.
Corrective Maintenance [CM]
- Run To Failure Maintenance [RTFM]: general concept for RTFM.
- On Condition Maintenance [OCM]: failures which can cause neither a safety-critical nor an economically critical event.
- Condition Based Maintenance [CBM]: failures which can cause neither a safety-critical nor an economically critical event; requires online monitoring for fault isolation.

Preventive Maintenance [PvM]
- RTFM: not included.
- OCM: failures which are safety-critical or economically critical; fixed intervals to decide if a PvM is required.
- CBM: failures which are safety-critical or economically critical, without prognostics; requires online monitoring to enable dynamic intervals for PvM.

Predictive Maintenance [PdM]
- RTFM: not included.
- OCM: not included.
- CBM: failures which are safety-critical or economically critical, with monitoring and prognostics; enables dynamic intervals to plan and perform PdM when required.

Table 1. Maintenance strategies and measures
A definition for the different concepts that will be used in the proposed framework is given in Table 1.
Standardized methods like Failure Mode Effects and Criticality Analysis (FMECA) or Common Mode Analysis are used to allocate probabilities and criticalities to each single failure mode in a system. The results are used to decide which failures are acceptable during operation and which ones have to be avoided through the introduction of a PvM, or, in the case of a CBM concept, for which components it is expedient to develop capabilities that enable PdM. Monitoring or prediction methods supporting the decision whether a PvM or PdM is required will always be imperfect. This causes erroneous replacements of healthy components (known as No Fault Found [NFF]) and a waste of useful life through too-early replacements of degrading components.
_____________________ H. Mikat et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 1. Enhanced Health Monitoring concept
Especially in the case of PdM, where a potential failure or degradation should be announced while the component still operates within the specified performance limits, avoiding NFFs while simultaneously realizing a high sensitivity to incipient failures is a challenge. For the realization of dynamic scheduling of maintenance intervals, it is necessary to implement online condition monitoring to receive and process all information needed to decide when a PvM or PdM action is required. If the different components and the system itself are not designed to provide and process all required information, it is not possible to realize an optimized CBM concept (Dunsdon & Harrington, 2009). For this reason it is mandatory to establish all relevant requirements from the beginning of the system design phase. These requirements cannot be treated like general design requirements related to Maintainability or Testability aspects. Whereas a Built-In Test [BIT] can be specified through a fault isolation and NFF rate, a CBM system would also need the specification and verification of detecting failures before they occur and of predicting future trends with a verifiable accuracy. The difference between BIT and an Enhanced Health Monitoring [EnHM] concept is illustrated in Figure 1.
Especially if the CBM system shall not only support the optimization of spares and personnel management but also be designed to shift scheduled intervals, which are important to ensure system safety, into dynamic condition-based intervals, it is highly relevant to ensure traceability of how the CBM capabilities need to be incorporated into the system design. Selected Key Performance Indicators [KPIs] can be defined to represent customer requirements or industrial interests. An understanding of how CBM affects these KPIs is needed to justify increased development and procurement costs plus a more complex system design.
Figure 2. Hierarchical structure of the framework
The general hierarchical structure of how a Service Capability Rate [SCR] can be derived from the design and support elements of a system is shown in Figure 2. This architecture is used for the definition of the framework that will be described throughout this paper.
An SCR can range from a success rate for performing reconnaissance missions in military aviation, over transporting passengers or material in the civil sector, to producing any type of goods in the industrial sector. The baseline parameters are Reliability, Maintainability and Testability [RMT], specifying how many failure events are expected and when, how countermeasures can be realized, and which fault isolation capabilities are provided. The logistic concept [LOG] provides information on how resources like personnel, spares and consumables are supplied. The maintenance strategy [MNT] specifies how the scheduled and unscheduled events are managed. The concept for Enhanced Health Management [EHM] has been introduced to specify the potential for the realization of CBM through EnHM and prognostics.
These baseline elements are considered as design and support elements of the system. The next level, as an outcome of the design and support level, is considered as Life Cycle Costs [LCC] related. The Mean Waiting Time [MWT] denotes how much time is lost due to waiting for missing resources; therefore it is related to periods during which the system cannot generate profit. The Maintenance Index [MID] indicates how much maintenance effort is required in Maintenance Man Hours [MMH] per Operational Hour [OH]. The Inverse Logistics Maintenance Ratio [ILMR] is used to quantify the amount of unscheduled events per OH, hence indicating the required capacity for spares to ensure the operational availability of the system. Based on these parameters and the system specific operational scenario, various KPIs can be derived. Important parameters are the operational availability of the material required to support the system for fulfilling its service aims [A0MAT] and the operational availability of the system itself [A0SYS]; these two parameters can be used to trace customer requirements and derive the SCR parameter. The required material can again be anything that is needed to support the system specific service task, like payload equipment for aircraft missions or industrial goods for production purposes.
The following sections will give an overview of a generic framework, addressing all above mentioned aspects by describing the conceptual design and purpose of the framework as well as basic assumptions and definitions.
The framework described on the following pages can be understood as a multifunctional environment, providing the capability to validate design and conceptual requirements, as well as a tool for the integrated simulation of various modules composed into a complex system architecture for verification purposes. The general idea is shown in Figure 3.
Figure 3. V-Model for framework applications
The "Virtual Validation Environment" mode enables the derivation and validation of dedicated requirements for a system layout and EHM integration. Furthermore, the "Integrated Simulation" mode supports model-based verification of KPIs and EHM requirements through the integration of validated simulation modules for diagnostics and prognostics on component or subsystem level.
To demonstrate the concept we will describe the simulation framework and conduct a case study. The case study will be carried out by showing how a simulation module for monitoring the status of a hydraulic pump could be integrated into the simulation environment and support the verification of RMT and EHM requirements.
2. DESCRIPTION OF THE SIMULATION CONCEPT
The main aim of the work presented in this paper is to develop a simulation environment that can be used to perform trade-off studies on system design and maintenance concept aspects, with emphasis on evaluating the potential of CBM. As described in the introduction, we distinguish between three different maintenance strategies and measures. As the framework was originally developed to support aircraft design decisions, where RTFM shall be avoided for safety and economic reasons, the RTFM strategy has been excluded. This assumption is also valid for other complex or cost-intensive applications like passenger transportation or industrial facilities. The decision tree defined as the basis for the framework is shown below.
Figure 4. RMT, MNT and EHM Flowchart
2.1. Maintenance Parameters
According to the online monitoring capabilities, subsets of the primary failures specified by RMT will belong to either the OCM or the CBM branch. A further partitioning into the different measures depends on the monitoring capabilities and on the definition of fixed maintenance intervals for inspection and overhaul. The probability that a failure belongs to one class is defined by the probability allocation parameter:
$$P_j = \frac{\sum_j \lambda_j}{\sum_i \lambda_i} \qquad (1)$$
In the case of P_PREDC (Predictive - CBM), the index j denotes all failure modes belonging to the class "Predictive Measures", while the index i runs over all failure modes belonging to the class "CBM Measures". It has been assumed that, in addition to the primary failures classified as CM, PvM or PdM, each system also generates a number of false alarms (FA). As PvM and PdM avoid the occurrence of a failure during service, the "Corrective Measures" are the only classes which generate additional secondary faults (SFLT), with probability P_SFLT. For the overall simulation it should be considered that each maintenance action can also cause a secondary maintenance-induced (SMNT) failure (defined by the probability P_SMNT). These maintenance-induced failures can be mishandling, wrong installation or other secondary damage during overhaul, replacement or repair activities on the system (Byer, Hess, and Fila, 2001). As each PvM and PdM should avoid the occurrence of a failure, it has to be performed before the failure happens. That means the introduction of such a measure reduces the useful life of the system or component. This aspect has been introduced as an additional probability for erroneously early replacements of the respective part. Due to the online monitoring of the CBM concept, this error will be lower for the PvM measures in the CBM branch than for those in the OCM branch. Also, it can be assumed that the evaluation of the information for PdM enables a much higher accuracy and confidence in estimating the optimum time to replace the monitored component than monitoring without prognostics. Hence the waste of useful life for PdM can be considered lower than for PvM measures (Spare, 2001).
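As a minimal illustration of Eq. (1), the allocation parameter is just a ratio of summed failure rates; the class labels and rates below are made up for the example:

```python
# Hypothetical failure-mode table: (class label, failure rate lambda in 1/h).
failure_modes = [
    ("CBM-PdM", 2e-5), ("CBM-PdM", 1e-5),
    ("CBM-CM", 3e-5), ("CBM-PvM", 4e-5),
]

def allocation(modes, subset, superset):
    """Eq. (1): summed failure rates of a sub-class over those of its class."""
    num = sum(lam for cls, lam in modes if cls in subset)
    den = sum(lam for cls, lam in modes if cls in superset)
    return num / den

# P_PREDC: share of predictive measures within the CBM branch.
p_predc = allocation(failure_modes,
                     subset={"CBM-PdM"},
                     superset={"CBM-PdM", "CBM-CM", "CBM-PvM"})
# -> 3e-5 / 1e-4 = 0.3
```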
2.2. Reliability, Maintainability and Testability
The top-level failure rate distribution is given by the RMT requirements as the composition of all individual primary failure modes of the system. The probability for additional false alarms has been introduced as a percentage false alarm rate for the respective class of events. It should be noted that, for maintainability aspects, each failure mode has been treated as an individual event requiring a maintenance action. The maintainability aspect is described by the Mean Time To Repair for each individual failure mode, MTTR_i.
Knowing the individual failure rates, a joint value on system level can be derived:
$$\mathrm{MTTR}_{SYS} = \frac{\sum_i \lambda_i\cdot \mathrm{MTTR}_i}{\sum_i \lambda_i} \qquad (2)$$
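Eq. (2) is a failure-rate-weighted mean of the individual repair times; a small sketch with hypothetical rates and repair times:

```python
def mttr_sys(lambdas, mttrs):
    """Eq. (2): failure-rate-weighted mean repair time on system level."""
    assert len(lambdas) == len(mttrs)
    return sum(l * m for l, m in zip(lambdas, mttrs)) / sum(lambdas)

# Hypothetical failure modes: rates (1/h) and repair times (h).
lams = [1e-4, 5e-5, 5e-5]
mttrs = [2.0, 4.0, 8.0]
# mttr_sys(lams, mttrs) -> (2e-4 + 2e-4 + 4e-4) / 2e-4 = 4.0 h
```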
A common approach for complex applications like aircraft is to define a BIT failure isolation rate, specified through the capability to isolate single point failures to one or multiple root causes. It is assumed that CBM-monitored components will have an ideal fault isolation capability, reducing the number of potential candidates for a single point failure to one single source. Considering this assumption, and the fact that fault isolation for BIT-monitored equipment has to be performed only once while the subsequent troubleshooting process for identifying the correct failure source only adds multiples of the replacement and checkout time for individual components, a formula for the resulting MTTR considering imperfect fault isolation can be derived (fdi: fault detection and isolation):
$$\Delta_{MTTR} = P_{O\text{-}CORR}\cdot\hat{p}_{fdi} + \left(P_{O\text{-}PREV}\cdot P_{OCM} + P_{CBM}\right)\cdot(1-\delta_{fdi})$$
$$\mathrm{MTTR}_{RES} = \Delta_{MTTR}\cdot\mathrm{MTTR}_{SYS} \qquad (3)$$

with:

$$\hat{p}_{fdi} := p_{fdi} + \sum_{k=2}^{n} p_{fdi_k}\cdot\left(1+(k-1)\cdot(1-\delta_{fdi})\right)$$
where p_fdi_k indicates the probability to isolate a single point failure to k = 2, ..., n sources as a testability requirement, and δ_fdi is the fraction of the replacement time required to perform the fault isolation. The imperfect BIT fault isolation will not only affect the repair time but also the resulting maintenance effort. Hence, the calculation of the increased probability for maintenance-induced failures in the corrective class of the OCM branch is implemented accordingly (δ_fdi = 0):
$$P_{SMNT}(OCM_{CORR}) = P_{SMNT}\cdot\hat{p} \qquad (4)$$
2.3. Logistic Parameters
The main parameter within the scope of a logistic concept for the estimation of system availability is the mean delay time for unscheduled events. This value is composed of an administrative and a logistic delay [Mean Logistics Delay Time: MLDT] fraction, giving an average parameter for the MWT. The MLDT parameter can be derived from the probability density estimate for the resulting failure rate of unscheduled events. Using these assumptions, an estimate of the MLDT can be derived:
$$\mathrm{MLDT} = \frac{\displaystyle\sum_{\lambda_i=\lambda_s}^{\lambda_{max}} pdf(\lambda_{us_i})\cdot(\lambda_{us_i}-\lambda_s)\cdot T_{Lead}\cdot 0.5}{\displaystyle\sum_{\lambda_i=\lambda_s}^{\lambda_{max}} pdf(\lambda_{us_i})\cdot\lambda_{us_i}} + T_0 \qquad (5)$$
with (excluding secondary effects, which are added to receive the resulting unscheduled failure rate):
$$\lambda_{us} = \lambda_{Sys}\cdot\left[P_{OCM}\cdot(1+P_{FA_O}) + P_{CBM}\cdot\left(P_{FA_C}+P_{C\text{-}CORR}+P_{C\text{-}PREV}\right)\right]$$
$$cdf(\lambda_{us}=\lambda_{max})=1,\qquad \lambda_s = pfr\cdot\lambda_{max}$$
and λ_Sys as the overall system failure rate, λ_us as the resulting failure rate for all unscheduled events, pdf(λ_us) / cdf(λ_us) as the probability density / cumulative distribution function of λ_us, pfr as the fill-rate factor of spares in the operational scenario with pfr = 1 for n_Spares(λ_max), T_Lead as the maintenance-related lead time (time between two spares deliveries or mean waiting time for maintenance specialists), and T_0 as the administrative delay time. Each element belonging to a class other than PdM is treated as an unscheduled event, while it is assumed that the capability to predict the occurrence of an event shifts it from unscheduled to scheduled maintenance. An arbitrary MLDT variation as a function of the spares fill rate is shown in Figure 5.
Figure 5. Mean Logistic Delay Time variation
The resulting MWT is the weighted average for scheduled and unscheduled events:
$$\mathrm{MWT} = \frac{(\lambda_{Sys}-\lambda_{us})\cdot T_0 + \lambda_{us}\cdot \mathrm{MLDT}}{\lambda_{Sys}} \qquad (6)$$
If PdM enables an accurate prediction of the time to failure, it can be assumed that the uncertainties for this class are reduced. This reflects system operation without the need for a conservative assumption about the number of spares required to keep the system operational.
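Under the reading that Eq. (6) is the plain weighted average stated above, the MWT computation can be sketched with hypothetical values:

```python
def mean_waiting_time(lambda_sys, lambda_us, t0, mldt):
    """Eq. (6): MWT as weighted average of scheduled (T0) and
    unscheduled (MLDT) delay contributions."""
    return ((lambda_sys - lambda_us) * t0 + lambda_us * mldt) / lambda_sys

# Hypothetical values: 1e-3 failures/h overall, 40% of them unscheduled.
mwt = mean_waiting_time(lambda_sys=1e-3, lambda_us=4e-4, t0=2.0, mldt=24.0)
# -> (6e-4 * 2.0 + 4e-4 * 24.0) / 1e-3 = 10.8 h
```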
2.4. Enhanced Health Management Parameters
The EHM parameter set can be described through the values of P_CBM, P_PREDC and P_FAC. It should be noted that the framework implies that only an EHM-monitored failure can also be predicted. It is also assumed that false alarms raised by other means of monitoring are ignored if the EHM algorithm for the respective failure mode does not confirm the failure. As EHM requires a deeper knowledge of the system, it cannot be assumed that this approach also works in the opposite direction, i.e., ignoring a false alarm of an EHM-monitored component if other monitoring features do not confirm the failure.
The accuracy of prediction has been identified as a key design parameter for the development of prognostic algorithms and concepts (Saxena, Roychoudhury, Celaya, Saha, Saha, and Goebel, 2010). The following assumptions have been made for the derivation of accuracy and precision; they result in a probability for too-early or missed replacements and can be used as requirements for the development of suitable algorithms:
- The prediction horizon has to ensure failures do not appear during the lead time. The lead time can be a time of continuous operation, the time interval between two spare deliveries or until maintenance specialists will be available.
- The prediction error ε is always a function of the prediction accuracy θ and the expected lead time TLead:

ε = ((1 − θ²) / (2·θ)) · TLead    (7)
- The minimum required prediction horizon Ph is defined accordingly:

Ph = (1 / θ²) · TLead    (8)
Assuming a fixed accuracy θ, it can be concluded that replacing the degrading component at tRep = θ·tPred avoids the failure with the probability specified by θ. Considering the mean and minimum/maximum prediction regimes with an accuracy θ, the following relations for the respective waste of useful life EWULi can be derived:
Conservative:   EWUL,Max = ε
Optimal:        EWUL,Mean = ε·θ·(1 − (1 − θ)/2)    (9)
Opportunistic:  EWUL,Min = ε·θ·(1 − (1 − θ)·(2 − θ)/2)
Figure 6 depicts these regimes for θ = 90%. Assuming the conservative situation that all regimes can occur with the same probability, it can be concluded that the average waste of useful life is equal to ΕWUL = ΕWULMean.
- The resulting waste of useful life due to predictive maintenance is a function of the respective failure rate:
Δλi = EWULi · λi    (10)
Figure 6. Prediction error regimes
2.5. Derivation of Performance Parameters
The system performance parameters can be derived according to Eqs. (11) and (12) (excluding scheduled overhauls):

A0i = 1 / (1 + λi · (MTTRi + MWTi))    (11)

SCR = A0,MAT · A0,SYS    (12)
with λi as the overall failure rate of item i.
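Eqs. (11) and (12) can be sketched directly; the function names are assumptions.

```python
def operational_availability(lam, mttr, mwt):
    """Eq. (11): steady-state availability of an item from its overall
    failure rate and its mean repair plus waiting times."""
    return 1.0 / (1.0 + lam * (mttr + mwt))

def service_capability_rate(a0_mat, a0_sys):
    """Eq. (12): SCR as the product of material and system availability."""
    return a0_mat * a0_sys
```

A perfectly reliable item (λ = 0) yields A0 = 1; any increase in MTTR or MWT reduces availability monotonically.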
2.6. Uncertainty Representation
As the aim of this work was to develop a framework that does not have to rely on pseudo-empirical simulation results, closed-form solutions were required for all stochastic processes used in the model. Therefore all distribution parameters, such as means and variances, have been propagated through the model by assuming stochastic independence for all single failure modes and a stochastic correlation for all failure modes that are interdependent.
Assuming Weibull-distributed times to failure with unitary shape parameter and therefore a constant failure rate (design and manufacturing processes should ensure constant failure rates, but due to varying conditions and tolerances the results are usually distributed), we can derive the expression for the propagation of the uncorrelated parameters PUC from class j belonging to branch i:
PUCj = (Σj λj²) / (Σi λi²)    (13)
The equivalent parameter for correlated events PC can be derived as:
PCj = PUCi · Pj²    (14)
with Pj as the probability allocation parameter of event j caused by event i.
All primary failure rates can be treated as independent events with a covariance of cov(zi, zj) ≈ 0. Only when merging the resulting primary failures with the secondary and maintenance-induced failures do the respective covariances have to be taken into account. Secondary failures only occur due to a primary failure belonging to the class "Corrective Measures"; a maintenance-induced failure only occurs due to a previous event belonging to any class of the OCM or CBM branch. Moreover, a relative increase in the failure rate of primary events causes the same relative increase in the rate of secondary events. These relations motivated the assumption of a perfect linear correlation for these two scenarios to derive the respective covariance:
cov(zi, zj) = Pj · Var(zi)    (15)
Well known laws for the calculation with stochastic variables have been used to propagate all mean and variance parameters through the system model (Elandt-Johnson & Johnson, 1980; Stuart & Ord, 1998; Blumenfeld, 2001).
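The propagation rules from Eqs. (13) and (15) amount to standard moment arithmetic for sums of random variables. The sketch below illustrates this (function names are assumptions): the covariance term from Eq. (15) is added when merging a primary event with a perfectly linearly correlated secondary event, and the independent case is recovered with Pj = 0.

```python
def p_uncorrelated(lams_j, lams_i):
    """Eq. (13): uncorrelated propagation parameter for class j in branch i."""
    return sum(l ** 2 for l in lams_j) / sum(l ** 2 for l in lams_i)

def var_of_merged(var_i, var_j, p_alloc=0.0):
    """Variance of z_i + z_j using Eq. (15): cov(z_i, z_j) = P_j * Var(z_i)
    under the perfect-linear-correlation assumption; p_alloc = 0 gives the
    independent case Var(z_i) + Var(z_j)."""
    cov = p_alloc * var_i
    return var_i + var_j + 2.0 * cov
```

This is how the variance of interdependent failure modes grows beyond the sum of the individual variances.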
By applying these rules, we obtain the resulting distribution functions that are used to estimate the distributions of the parameters MWT, ILMR and MID. As the maintenance effort is independent of logistic delays, they are again treated as independent variables, providing the basis for calculating the resulting distributions of A0i.
The specific distributions used for the various parameters in the framework are listed in Table 2. Near real-time capable maximum likelihood estimators have been implemented in the simulation to estimate the distribution parameters, using the propagated expectation and variance of each stochastic variable as input.
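As an illustration of fitting a distribution to a propagated expectation and variance, the moment-matching (method of moments) recovery of lognormal parameters is shown below; this is a simple stand-in for the estimators mentioned in the text, not their actual implementation.

```python
import math

def lognormal_from_moments(mean, var):
    """Recover lognormal (mu, sigma) from a propagated mean and variance
    by moment matching: sigma^2 = ln(1 + var/mean^2), mu = ln(mean) - sigma^2/2."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)
```

Feeding back the analytic moments of a lognormal variable reproduces its parameters exactly, which makes the mapping convenient inside a closed-form propagation chain.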
To validate the concept, arbitrary simulations with random number distributions instead of the closed-form solution have been carried out for an OCM and a CBM concept. The results are sufficiently accurate to assume that the environment can be used to simulate processes with stochastic variables in a closed-form solution (see Figure 7).
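The idea of such a Monte-Carlo cross-check can be sketched in a few lines: sample a sum of independent lognormal variables and compare the empirical moments against the analytically propagated ones. The parameter values are arbitrary.

```python
import math
import random
import statistics

def lognormal_moments(mu, s):
    """Closed-form mean and variance of a lognormal variable."""
    m = math.exp(mu + s * s / 2.0)
    v = (math.exp(s * s) - 1.0) * math.exp(2.0 * mu + s * s)
    return m, v

def mc_sum_moments(mu1, s1, mu2, s2, n=200_000, seed=42):
    """Monte-Carlo estimate of the moments of a sum of two independent
    lognormal variables, cross-checking the closed-form propagation."""
    rng = random.Random(seed)
    xs = [rng.lognormvariate(mu1, s1) + rng.lognormvariate(mu2, s2)
          for _ in range(n)]
    return statistics.fmean(xs), statistics.pvariance(xs)
```

For independent variables the analytic moments simply add, and the sampled estimates agree with them to within the Monte-Carlo error.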
Failure rates: two-parameter Weibull distribution with constant failure rate
False alarms: lognormal distribution
Prediction error: lognormal distribution
MWT: lognormal distribution
MTTR: lognormal distribution
ILMR: two-parameter Weibull distribution
MID: lognormal distribution
A0: two-parameter Weibull distribution

Table 2. Parameter distribution types
Figure 7. Monte-Carlo validation
3. APPLICATION AS VIRTUAL VALIDATION ENVIRONMENT
The validation process is mainly based on a bottom-up and top-down justification and traceability analysis of all system design requirements. The idea for supporting this concept by utilizing the proposed framework is shown in Figure 8. The validation is performed by tracing all failure mode specific EHM requirements to the top-level system requirements. The parameter CBMR comprises all EHM features. It is composed of the diagnostic [HMC] and the prognostic [FPC] part. Prognostic accuracy [PA] and prognostic coverage [PC] are used to describe the resulting FPC. The HMC is defined by the detection rate [DR] and false alarm rate [FAR]. The traceability to component-level design requirements for hardware and software development is realized according to Eq. (2) by using the respective failure rates as weighting factors.
The following sections give an overview of what a trade-off study could look like. A simplified cost-benefit approach will be discussed; more complex applications to find the optimum solution involving multiple cost functions will be the scope of future activities. Two arbitrary simulation runs have been conducted to illustrate and discuss the application as a virtual validation environment. The first scenario simulates different design solutions for CBM without any PdM, only improving the fault isolation capabilities and conditional awareness of the system. The second scenario uses the same system design as baseline and evaluates a CBM concept with an integrated PdM capability, enabling the full potential of CBM.
This comparison should help in understanding the impact of diagnostic and prognostic approaches on the three selected parameters SCR, MID and ILMR, and whether any savings potential can be identified. It has to be noted that the results will vary if the logistic or maintainability parameters are modified; nevertheless, the cases shown provide sufficient information to discuss the main aspects. In the following discussion, the variance of each parameter can be understood as a factor describing the individual risk, while the expectation value represents the potential to fulfil operational objectives.
Figure 8. EHM validation
3.1. EHM without Prediction Capabilities
For this study, the parameter "CBM Capability" quantifies the online monitoring features without predicting any future trends. From the results presented in Figure 9 it can be seen that implementing EnHM without simultaneously developing prediction capabilities mainly improves the MID, hence reducing the maintenance effort per OH. This observation can be explained by the improved fault isolation and optimized preventive maintenance enabled by the online monitoring capabilities of EHM. The reduction of MMH/OH also ensures an improvement in the resulting SCR of the system; however, since all failure events are still unscheduled, this improvement is smaller than for a fully integrated CBM system with PdM. This effect can also be seen in the almost unaffected trend of the ILMR. The minor improvement in ILMR is due to the reduced number of false alarms for a redundant monitoring concept using a fusion of BIT and EHM for status assessments, and to the optimized preventive maintenance methods.
As a result, it can be concluded that enhanced diagnostics without prognostics mainly reduces the expectation and variance of the maintenance effort. While the reduced expectation value corresponds to fewer maintenance activities per OH, the reduced variance indicates a potential for better scheduling of resources and manpower. The increase in the SCR expectation is a side effect of the improvement seen in the MID.
Figure 9. Sensitivity study EHM without PdM
3.2. EHM with Prediction Capabilities
By performing the same simulation as before with a CBM system that includes prediction capabilities for all monitored failure modes (now "CBM Capability" represents the quantity of failures that are monitored and can be predicted), the PdM concept reveals its full potential. The implementation of prognostics has a significant impact on all three parameters, optimizing the expectation value and reducing the respective variance (see Figure 10).
Figure 10. Sensitivity study EHM with PdM
The potential to move unscheduled events into a scheduled scenario, without the need to incorporate all uncertainties associated with a system that enters service, reduces the risk for all parameters.
The improved SCR expectation trend is mainly related to the avoidance of secondary failures, the reduced waste of useful life for PdM in comparison to PvM, the improvement for fault isolation of the predicted failures and the planning for a PdM measure before the failure occurs. The prediction of all events belonging to the class PdM has reduced the MWT to the fraction of the administrative delay time that is not allocated to the provision of spare parts and consumables. Simultaneously, the number of unscheduled events per OH is reduced, providing the potential to save costs for producing and storing spare parts before they are needed. The further improvement in the characteristics of the MID compared to the previous simulation without PdM can be explained with the reduction of the overall variance in the primary failure events and the avoidance of secondary failures by replacing the monitored item before a failure occurs.
3.3. Discussion of Results
By comparing the results for EHM with and without PdM it can be concluded that enhanced health monitoring without prognosis may not compensate for the investment needed for the development, production and operation of the health monitoring system. The minor improvement in the SCR due to the optimized troubleshooting process through online monitoring, without reducing the risk, does not provide sufficient potential to reduce operational costs (e.g. less spares provisioning) without compromising customer requirements. The reduced MID also cannot be seen as a savings potential, as the total number of people needed per operational site is defined by the number of people per maintenance action and the number of specialists per operating system. These people have to be paid even if they have less work to do. The reduced variance is only an indicator that the risk of incorrect planning of maintenance
resources is reduced. The more accurate PvM measures are expected to enable further improvement potential.
In contrast to the results for EHM without PdM, it can be seen that the implementation of prognostics can help to reduce the overall risk of failing to fulfil service objectives. Simultaneously, a reduction of the unscheduled events enables operation with fewer spares and offers the potential for a further simplification of the logistic concept, with a reduced risk of compromising customer requirements. Therefore it can be concluded that the integration of an EHM system should aim for enhanced health monitoring and predictive capabilities; otherwise the return on investment for the integration of EHM cannot be guaranteed.
However, even for EHM without prognosis it is possible to show the improvement potential and to use the proposed framework to derive requirements for the development of EHM functions. All resulting EHM requirements for diagnosis and prognosis are mainly quantified through the failure modes that can be monitored or predicted, plus the accuracy and robustness of the respective algorithms.
3.4. Cost Benefit Analysis
This section gives an introduction to how a Cost-Benefit Analysis can be carried out by utilizing the proposed framework. We focus on a Performance-Based Contract [PBC] scenario, where the system provider has to pay penalties if the operator cannot achieve the service aims (e.g. availability). A full-blown Cost-Benefit Analysis would seek the global minimum of a function that takes the following cost elements into account:
i) CBM design and procurement costs; ii) PBC penalties and rewards; iii) Logistic cost elements; iv) Spares and resources management cost elements.
By utilizing the framework, a distribution function for each performance indicator can be derived. The parameter of interest for availability contracting would be A0. By assuming reasonable cost functions for contractual penalties and for operation and support cost (OSC) savings due to reduced spares provisioning when varying the fill rate, a minimum of the resulting cost function can be found.
An example plot for this scenario, assuming a contracted availability of 80% and deriving the delta costs by means of cost indexing, is shown in Figure 11. The location of the minimum resulting cost is determined by all design and support parameters. The risk of achieving this cost value can be quantified through the variance of each single parameter. By adding more cost functions to estimate the resulting operation and support costs, it is possible to find the optimum solution for an EHM design concept. The LCC simulation can be used either to identify an optimal EHM concept or to derive acceptable design cost values to satisfy a business case for a given operational scenario.
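The shape of such a trade-off can be sketched with a toy cost index. Every coefficient below (penalty rate, spares cost, failure rate, delay times) is hypothetical; the point is only that a contractual penalty falling with the fill rate plus a convex provisioning cost rising with it produces an interior minimum, as in Figure 11.

```python
def total_cost(pfr, contracted_a0=0.80, penalty_rate=100.0, spares_rate=40.0,
               lam=0.002, mttr=4.0, t0=2.0, t_lead=400.0):
    """Illustrative PBC cost index (all coefficients are assumptions):
    penalty proportional to any shortfall below the contracted A0, plus a
    convex spares-provisioning cost that grows with the fill rate."""
    mwt = t0 + (1.0 - pfr) * t_lead            # logistic delay shrinks as pfr grows
    a0 = 1.0 / (1.0 + lam * (mttr + mwt))      # availability as in Eq. (11)
    penalty = penalty_rate * max(0.0, contracted_a0 - a0)
    spares = spares_rate * pfr ** 2
    return penalty + spares

candidates = [p / 100.0 for p in range(101)]   # scan fill rates 0.00 .. 1.00
best_pfr = min(candidates, key=total_cost)     # interior cost minimum
```

Neither extreme (no spares, full coverage) is optimal under these assumed coefficients; the minimum sits at an intermediate fill rate.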
Figure 11. Cost functions for availability contracting
4. USE CASE FOR INTEGRATED SIMULATION CONCEPT
In this section a case study related to a generic hydraulic pump module is presented: the aim is to further illustrate the concepts explained so far and to quantitatively show the improvements in the design phase that can result from utilizing the approach illustrated here.
After a brief introduction to the pump system and its main sub-components, the focus will be on the bearings as a sub-component of the pump system. Care has been taken to properly simulate meaningful bearing conditions, namely the behaviour of a bearing in the presence of a defect and the degradation of the bearing behaviour as the defect severity grows. Both nominal and faulty behaviours have been validated by means of experimental tests. The model has therefore been used to test new diagnostic and prognostic algorithms; faults can be implemented under different operating conditions rather than waiting for them to occur. A generic approach has been followed to verify and validate the model creation and to properly assess the effectiveness and efficiency of algorithms for diagnosis and prognosis: this approach is illustrated as a flow chart in Figure 12.
Figure 12. Flow chart of the EHM designing phases
The bearing dynamic model has thereafter been integrated into a general pump simulation framework designed for this purpose. The framework allows one to simulate the behaviour of a generic pump, together with its sub-components, under different monitoring capabilities on the various components. In this way, the use of the framework as a valuable tool for requirements verification will be demonstrated, as well as its capability to assess variations in system performance when the monitoring concepts are varied.
4.1. Hydraulic Pump System
The hydraulic pump of interest is a variable displacement, axial piston pump. The most important groups are the Drive Group, the Displacement Group and the Control Valve Group. The Drive Group is the functional heart of the system, since it contains the axial pistons in the cylinder block and the control plate. The basis of the pump is an assembly of precision-machined, high-strength steel parts for the rotating functional parts, mounted in an alloy case. The main shaft is supported by rolling-element bearings. Pump sealing is achieved using either O-rings or a mechanical seal. Figure 13 shows a scheme displaying the main actors of the system under investigation: in particular, one can recognize the metrological solutions that characterize the enhanced monitoring capabilities of the system, namely a system of bi-axial accelerometers (to measure two orthogonal accelerations in the plane of each roller bearing) and an electric chip detector to evaluate the level of contaminant in the hydraulic circuit.
There is a large number of items within the pump whose failure results in a system failure. Some of the pump failures are a direct consequence of part failures (for example, shear of the shaft); others are indirect, e.g. debris in the hydraulic circuit. In the final simulation, the failure of four pump sub-components will be considered, namely: bearings, sealing, shaft and pistons.
Figure 13. Hydraulic Pump scheme – The sub-components that will be the actors of the simulation are highlighted
The dynamic model of the first sub-component (the roller bearings) will be briefly presented in the next section.
4.2. Dynamic Model of Roller Bearings
In a bearing system, the time-variant characteristics are the result of the orbital motion of the rolling elements, whilst the non-linearity arises from effects due to the Hertzian force-deformation relationship. The model presented and utilized here is based on the work carried out by Sawalhi and Randall (2008). The fundamental components of a rolling bearing are the inner race, the outer race, the cage and the rolling elements. Important geometrical parameters are the number of rolling elements nb, the element diameter Db, the pitch diameter Dp and the contact angle α (see Figure 14). The non-linear forces between the different elements, the time-varying stiffness and the clearance between rolling elements and races have been implemented in the model. The bearing has been modeled as a five Degrees of Freedom (DoF) system: two orthogonal DoF belong to the inner race/rotor component (xi and yi), two DoF are related to the pedestal/outer race (xo and yo), and the last one (yr) has been added to match the typically high-frequency bearing response (16 kHz with 5% damping). Mass and stiffness of the outer race/pedestal, on the other hand, have been adjusted to match a low natural frequency of the system. Finally, mass and inertia of the rolling elements are ignored.
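From the geometry of Figure 14, the standard kinematic defect frequencies of a rolling bearing follow directly (these textbook formulas assume no slippage; the numeric values in the test are illustrative, not the actual pump bearing):

```python
import math

def bearing_fault_frequencies(n_b, d_b, d_p, alpha_deg, f_rot):
    """Characteristic defect frequencies from the bearing geometry:
    n_b rolling elements, element diameter d_b, pitch diameter d_p,
    contact angle alpha, shaft rotation frequency f_rot [Hz]."""
    r = (d_b / d_p) * math.cos(math.radians(alpha_deg))
    return {
        "BPFO": 0.5 * n_b * f_rot * (1.0 - r),              # outer-race ball pass
        "BPFI": 0.5 * n_b * f_rot * (1.0 + r),              # inner-race ball pass
        "FTF":  0.5 * f_rot * (1.0 - r),                    # cage (fundamental train)
        "BSF":  0.5 * (d_p / d_b) * f_rot * (1.0 - r * r),  # ball spin
    }
```

A useful sanity check is that BPFO and BPFI always sum to nb·frot.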
The non-linear and time-variant model has been further detailed regarding its capabilities in reproducing health and faulty behaviours. These refinements are related to: a) random fluctuation of inner and outer race profiles; b) forces generated as a consequence of the roller element impact with the resulting profiles roughness; c) Elasto-hydrodynamic lubrication; d) slippage; e) mass unbalances and f) presence of spalling in the outer and inner race-way.
Figure 14. Roller bearing geometry and physics modeling scheme
As illustrated in the flow chart of Figure 12, the verification and validation approach follows a circular and continuous path among the conceptual model validation, the computerized model verification and the operational validation. The conceptual model validation refers to the problem of determining that the concepts, theories and assumptions underlying the conceptual model are correct, whilst the model verification is defined as assuring that the computer programming and implementation of the conceptual model are correct. The operational validation, on the other hand, is defined as determining that the model's output behaviour has sufficient accuracy for the model's intended purpose. In the case under investigation, the domain of the model's intended applicability is wide, since both nominal and faulty behaviours have to be properly simulated. Moreover, the same approach has been followed to verify and validate algorithms for diagnostics and prognostics. In the end, once suitable diagnostic and prognostic concepts have been defined and successfully tested, it is possible to integrate the validated simulation modules into a general simulation framework in order to assess, evaluate and validate the performance of the system resulting from the integration of modules with EHM capabilities.
Figure 15. Envelope of the two signals used to detect the frequency-value of encoded impulsive transients
Several experimental tests have been conducted in order to validate the system. The iterative analysis of the experimental findings related to both nominal and faulty behaviours has allowed a continuously better matching of the computerized model to reality (model validation). A challenge was the correct simulation of a defective bearing, the development of tools to diagnose a defective behaviour and the implementation of concepts for Remaining Useful Life [RUL] prediction.
Various kinds of defects have been simulated in real bearings, for example spalls of different length and depth in both the inner and outer race. Common tools in the frequency domain can be used to validate the baseline behaviour; this is not generally true for faulty conditions. Therefore, together with a simple monitoring of the quadratic mean of the acceleration, a data-driven diagnostic approach has been implemented for the present study; experimental data have been used to train a neural network for defect detection and classification. The diagnostic approach has moreover been made more robust by the integration of a mathematical tool named Spectral Kurtosis (Antoni, 2004): this instrument gives the possibility of estimating the band to be demodulated without the need for historical data. Figure 15 shows a comparison between the signal processing of the vertical acceleration measured on the pedestal of a faulty bearing and the analogous results gained by running a simulation of its computerized model: the Fourier transform magnitude of the squared filtered signals clearly shows the typical fault frequencies of the bearing (given the bearing characteristics, a theoretical Ball Pass Frequency Outer race of 382.3 Hz was calculated) as the spacing between harmonics, both in the real (upper trend) and simulated (lower trend) results. At the end of the design phase, a verified and validated dynamic model has been released. It has therefore been widely used to test new diagnostic and prognostic algorithms, since the required diagnostic features can be derived directly from the simulated signal patterns.
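The squared-envelope analysis behind a plot like Figure 15 can be sketched as follows. Everything here is a simplification: a synthetic burst train repeating at the BPFO stands in for the measured acceleration, the sampling rate and resonance frequency are assumptions, and a fixed-band analytic-signal demodulation replaces the Spectral-Kurtosis-guided band selection used in the paper.

```python
import numpy as np

FS = 20_000.0     # sampling rate [Hz] (assumption)
BPFO = 382.3      # theoretical outer-race ball pass frequency from the text

def simulated_fault_signal(dur=1.0, f_res=6_000.0, zeta=0.05):
    """Train of decaying resonance bursts repeating at BPFO, standing in
    for the impacts produced by a spalled outer race (illustrative only)."""
    t = np.arange(int(dur * FS)) / FS
    x = np.zeros_like(t)
    for k in range(int(dur * BPFO)):
        t0 = k / BPFO
        m = t >= t0
        tau = t[m] - t0
        x[m] += np.exp(-2 * np.pi * f_res * zeta * tau) * np.sin(2 * np.pi * f_res * tau)
    return x + 0.01 * np.random.default_rng(0).standard_normal(t.size)

def envelope_spectrum(x):
    """Squared-envelope spectrum via the analytic signal: demodulates the
    resonance band so the fault repetition rate appears at low frequency."""
    X = np.fft.fft(x)
    h = np.zeros(x.size)
    h[0] = 1.0
    h[1:(x.size + 1) // 2] = 2.0          # one-sided analytic weighting
    env = np.abs(np.fft.ifft(X * h)) ** 2
    env -= env.mean()
    spec = np.abs(np.fft.rfft(env))
    freqs = np.arange(spec.size) * FS / x.size
    return freqs, spec

freqs, spec = envelope_spectrum(simulated_fault_signal())
band = (freqs > 100.0) & (freqs < 1000.0)
peak_hz = freqs[band][np.argmax(spec[band])]   # dominant envelope harmonic
```

The dominant peak of the envelope spectrum lands at the fault repetition rate, which is exactly the harmonic spacing the figure caption describes.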
However, the development of suitable prognostic algorithms also needs to focus on the evaluation and prediction of trends or degradation paths. Hence it is necessary to further develop degradation models that can be used to simulate growing faults. The derivation of such models is not always straightforward, as the process of degradation is stochastic and does not always follow known parametric laws (Bechhoefer, 2008). Several model-based approaches have been adopted so far for failure prognosis (Orchard, 2007); among the various methodologies implemented, the most promising mathematical framework is the one based on Particle Filtering. This approach can handle nonlinear, non-Gaussian systems; it assumes: a) the definition of a set of fault indicators for monitoring purposes, b) the availability of real-time process measurements and c) the existence of empirical knowledge to characterize both nominal and abnormal operating conditions.
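A minimal, self-contained sketch of the particle-filtering idea follows. The power-law degradation model, noise levels, threshold and parameter ranges are all hypothetical stand-ins (Paris'-law-like growth of a fault indicator), not the models used in the paper; the point is the predict/update/resample cycle and the propagation of surviving particles to a threshold to obtain an RUL distribution.

```python
import math
import random
import statistics

random.seed(1)

THRESHOLD = 5.0   # fault-indicator level taken as end of life (assumption)
TRUE_C = 0.02     # hypothetical true degradation-rate parameter

def step(a, c, dt=1.0):
    """Hypothetical power-law degradation increment, standing in for a
    physics-based law such as Paris' law for crack growth."""
    return a + c * a ** 1.3 * dt

# Ground-truth degradation path and noisy fault-indicator measurements.
truth, a = [], 1.0
while a < THRESHOLD:
    truth.append(a)
    a = step(a, TRUE_C)
K = 40                                               # observe the first 40 cycles
meas = [x + random.gauss(0.0, 0.05) for x in truth[:K]]

# SIR particle filter over the joint state (indicator a, model parameter c).
N = 2000
parts = [(random.uniform(0.5, 1.5), random.uniform(0.005, 0.05)) for _ in range(N)]
for z in meas[1:]:
    parts = [(step(a, c) + random.gauss(0.0, 0.02), c) for a, c in parts]  # predict
    w = [math.exp(-0.5 * ((z - a) / 0.05) ** 2) for a, c in parts]         # update
    parts = random.choices(parts, weights=w, k=N)                          # resample

def remaining_life(a, c):
    """Propagate one particle up to the threshold to get its RUL in cycles."""
    n = 0
    while a < THRESHOLD and n < 10_000:
        a = step(a, c)
        n += 1
    return n

rul_samples = [remaining_life(a, c) for a, c in parts]  # approximates the RUL pdf
rul_mean = statistics.fmean(rul_samples)
true_rul = len(truth) - K
```

The spread of `rul_samples` is the RUL pdf the text refers to; its mean tracks the true remaining life once the filter has absorbed enough measurements.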
Figure 16. Model-based development of prognostic algorithms
By means of this approach, the current state estimates are updated in real time and the algorithm predicts the evolution of the fault indicators over time, providing the pdf of the RUL. Following the same verification and validation approach, the prediction algorithm has been designed. In Figure 16, the upper graph shows the process of validating the algorithm by running different simulations assuming representative degradation paths; the lower graph displays an example plot of a model-based RUL estimation.
The verified and validated model (regarding both its physical behaviour and the diagnostic and prognostic algorithms) has therefore been integrated into a simulation module that mimics the behaviour of a complex system. The model is presented in the next section.
4.3. Hydraulic Pump Simulation
The simulation concerns four sub-components of the hydraulic pump, namely the sealing system (SEAL), the shaft (SHAF), the roller bearings (BEAR) and the piston group (PIST). FMECA documents have been consulted in order to set realistic ratios between the failure rate values. The aim of the current simulation is to show and demonstrate how the developed framework can be usefully and effectively utilized to verify the fulfilment of the top-level requirements.
Bearing models characterized by the enhanced diagnostic and prognostic capabilities just discussed have been integrated into the simulation framework; the system has been virtually equipped with accelerometers (see Figure 13) so that the health state of the bearing system can be continuously checked. As soon as a deviation from the baseline state is detected by the diagnostic algorithms, prognostic tools process the acquired data and communicate the estimated RULs and confidence levels to the central processing and control unit. This affects the performance of the overall system, and the framework discussed so far is therefore utilized to quantitatively assess the performance variations by using the indexes already discussed in the previous sections. In other words, the primary results of the current simulation are the failure rate distributions of the system; these are fed to the virtual framework to derive the performance indexes and hence values directly related to customer satisfaction.
To handle a more realistic and complex scenario, the hydraulic system has been further virtually instrumented with an Electric Chip Detector (ECD; see Figure 13). This sensor measures in real time the amount of debris and contamination in the hydraulic liquid; in this way, a preventive maintenance approach can be implemented for the piston group and the bearing system in the CBM branch. The bearing diagnostic algorithm can in fact also be used as fault confirmation for preventive actions on the piston group.
Finally, the other components are considered to be classically monitored by means of an "On Condition Maintenance" approach, which results in corrective and preventive maintenance.
Hence, according to the online monitoring capabilities just introduced, the simplified simulation scheme in Figure 17 can be drawn: it defines the primary failures that belong to the OCM branch and those that belong to the CBM branch.
Figure 18 displays a diagram explaining the flow of information in the verification procedure just presented. At the bottom of the graph lies the hydraulic pump model with its integrated enhanced monitoring concepts related to the bearing system. By assuming failure rate distributions for the different components, the simulation randomly generates events; these are treated according to the system specifications, and so the probability classes already shown in Figure 17 are populated.
Figure 17. Maintenance approach hydraulic pump system
Figure 18. Verification process of a hydraulic pump module
Therefore, the statistical parameters (mean and variance) of each failure mode can be calculated and, by using the virtual framework, easily propagated in order to obtain the distributions of the performance indexes: availability, maintenance index and inverse logistics maintenance ratio.
Verification of the EHM design requirements can be carried out by comparing the results of the validation phase with the distributions from the verification phase. The resulting error in the system performance parameters can be used to assess whether the design goals are met or not. Based on this assessment it can be decided whether the EHM concept needs to be revised or can be implemented. The results for the selected use case are shown in Figure 19.
The use case shown simplifies the system architecture to a single component. The same approach can be applied if the integration covers multiple components and subsystems with individual failure modes.
Figure 19. Performance indexes simulation case study
5. CONCLUSIONS
The proposed framework can support the development of a CBM system by validating diagnostic and prognostic design requirements w.r.t. selected KPIs or customer requirements. Sensitivity studies revealed that a CBM system should aim for the integration of predictive capabilities, as the improvement potential for an online monitoring system without prognostics is limited to a reduced maintenance effort and minor improvements in availability or other performance parameters of the system.
The concept provides a simple but robust approach for trade-off studies during an early design stage. Further improvements of the framework will focus on the evaluation and integration of a generalized Weibull correlation coefficient (Yacoub et al., 2005) to replace the linearity assumption between primary and secondary effects. The next step of maturation will be to validate the concept against established simulation tools (e.g. Simlox) for spares and resource management.
The idea for an integration of cost estimations and optimizations has been discussed. Follow-up studies to derive cost functions with established LCC estimation tools (e.g. PRICE) will be carried out. The integration of authoritative cost functions to obtain a framework for a multidimensional optimization of costs related to EHM design parameters, PBC aspects as well as resources and logistics management will be the main scope for future activities.
The concept for model-based verification of top-level system requirements has been illustrated. This approach shall enable the evaluation and assessment of diagnostic and prognostic capabilities before the system enters service. The authors are convinced that the cost-efficient validation and verification of multiple monitoring and prediction functions composed into a complex system design can only be realized in a virtual environment. The proposed framework provides such an environment and will be further matured to support the V&V process for the development of a CBM system.
First European Conference of the Prognostics and Health Management Society, 2012
236
European Conference of Prognostics and Health Management Society 2012
13
NOMENCLATURE
Symbols
ε Prediction Error
θ Prediction Accuracy
λ Failure Rate
σ Standard Deviation
Abbreviations
A0 Availability
BIT Built-In Test
cdf Cumulative Distribution Function
CBM Condition Based Maintenance
CM Corrective Maintenance
DoF Degrees of Freedom
DR Detection Rate
ECD Electric Chip Detector
EHM Enhanced Health Management
EnHM Enhanced Health Monitoring
FA False Alarm
FAR False Alarm Rate
FPC Failure Prognosis Capability
FMECA Failure Mode Effects and Criticality Analysis
HMC Health Monitoring Capability
ILMR Inverse Logistics Maintenance Ratio
KPI Key Performance Indicator
LCC Life Cycle Costs
LOG Logistics
MMH Maintenance Man Hours
MNT Maintenance
MTTR Mean Time To Repair
MID Maintenance Index
MWT Mean Waiting Time
MLDT Mean Logistics Delay Time
NFF No Fault Found
OCM On Condition Maintenance
OH Operational Hours
OSC Operation and Support Cost
PA Prognostic Accuracy
PBC Performance Based Contract
PC Prognostic Coverage
pdf Probability Density Function
PdM Predictive Maintenance
pfr Spares Fill Rate
PvM Preventive Maintenance
RTFM Run To Failure Maintenance
RMT Reliability, Maintainability and Testability
RUL Remaining Useful Life
SCR Service Capability Rate
SFLT Secondary Faults
SMNT Secondary Maintenance
V&V Validation & Verification
REFERENCES
Antoni, J. (2004). The spectral kurtosis of nonstationary signals: Formalisation, some properties, and application. Proceedings of XII European Signal Processing Conference, EUSIPCO, pp. 1167-1170, September 6-10, Vienna, Austria
Bechhoefer, E., (2008). A method for generalized prognostics of a component using Paris law. Proceedings of American Helicopter Society 64th Annual Forum, April 29 - May 1, Montreal, CA
Blumenfeld, D. (2001). Operations Research Calculations Handbook. CRC Press, p. 7
Byer, B., Hess, A. & Fila, L. (2001). Writing a convincing cost benefit analysis to substantiate autonomic logistics. Aerospace Conference
Dunsdon, J., Harrington, M. (2009). The application of open system architecture for condition based maintenance to complete IVHM. Aerospace Conference
Elandt-Johnson, R.C. & Johnson, N.L. (1980). Survival Models and Data Analysis. John Wiley and Sons, New York, p.69
Endo, H. (2005). A study of gear faults by simulation, and the development of differential diagnostic techniques. Ph.D. Dissertation, UNSW, Sydney
Orchard, M. E., (2007). A particle filtering-based framework for on-line fault diagnosis and failure prognosis. Doctoral dissertation. Georgia Institute of Technology, Atlanta, GA, USA.
Reimann, J., Kacprzynski, G., Cabral, D. & Marini, R. (2009). Using condition based maintenance to improve the profitability of performance based logistic contracts. Annual Conference of the Prognostics and Health Management Society
Sawalhi, N. & Randall, R.B. (2008). Simulating gear and bearing interactions in the presence of faults. Part I: The combined gear bearing dynamic model and the simulation of localised faults. Mechanical Systems and Signal Processing, vol. 22, pp. 1924-1951
Saxena, A., Roychoudhury, I., Celaya, J.R., Saha, S., Saha, B. & Goebel, K. (2010). Requirements Specifications for Prognostics: An Overview. American Institute of Aeronautics and Astronautics
Spare, J.H. (2001). Building the business case for condition-based maintenance. Transmission and Distribution Conference and Exposition
Stuart, A & Ord, K. (1998). Kendall’s Advanced Theory of Statistics. Arnold. London, 6th edition, p.351
Yacoub, M.D., Benevides da Costa, D., Dias, U.S. & Fraidenraich, G. (2005). Joint Statistics for Two Correlated Weibull Variates. IEEE Antennas and Wireless Propagation Letters, vol. 4
BIOGRAPHIES
Heiko Mikat was born in Berlin, Germany, in 1979. He received his M.S. degree in aeronautical engineering from the Technical University of Berlin, Germany, in 2008. From 2006 he worked as a trainee and later as a Systems Engineer at Rolls-Royce Deutschland, Berlin, Germany, designing and testing engine fuel system concepts and control laws. Since 2009 he has worked as a Systems Engineer in the CASSIDIAN Supply Systems Department, where he is responsible for the development of new health management technologies for aircraft systems. His current research activities mainly focus on the maturation of failure detection and prediction capabilities for electrical, mechanical and hydraulic aircraft equipment.
Antonino M. Siddiolo was born in Agrigento, Italy, in 1976. He received his M.S. and Ph.D. degrees in mechanical engineering from the University of Palermo, Italy, in 2000 and 2006, respectively. From 2004 to 2005 he was a Visiting Scholar at the Centre for Imaging Research and Advanced Materials Characterization, Department of Physics, University of Windsor, Ontario (Canada). He then worked as a researcher and Professor at the University of Palermo and as a Mechatronic Engineer for Sintesi SpA, Modugno (Bari), Italy. Currently, he works as a Systems Engineer in the CASSIDIAN Supply Systems Department, supporting the Integrated System Health Monitoring (ISHM) project. His research activities and publications mainly concern non-contact optical three-dimensional measurements of objects and non-destructive ultrasonic evaluation of artworks. His main contributions are in the field of signal processing to decode fringe patterns and enhance the contrast of air-coupled ultrasonic images.
Matthias Buderath - Aeronautical Engineer with more than 25 years of experience in structural design, system engineering and product- and service support. Main expertise and competence is related to system integrity management, service solution architecture and integrated system health monitoring and management. Today he is head of technology development in CASSIDIAN. He is member of international Working Groups covering Through Life Cycle Management, Integrated System Health Management and Structural Health Management. He has published more than 50 papers in the field of Structural Health Management, Integrated Health Monitoring and Management, Structural Integrity Programme Management and Maintenance and Fleet Information Management Systems.
Poster Papers
Analyzing Imbalance in a 24 MW Steam Turbine
Afshin DaghighiAsli1, Vahid Rezaie2, and Leila Hayati2
1 Morvarid Petrochemical Complex
2 Mopasco Consulting Company
ABSTRACT
Imbalance in critical rotary equipment is one of the most important faults to control in order to prevent severe damage. This case study discusses a 24 MW steam turbine that drives a propane compressor. The radial vibration on the DE side of the turbine grew gradually to a level close to the alarm value. FFTs, time signals, orbit diagrams, and phase measurements led us to believe that the rotor had become imbalanced. After tripping and disassembling the turbine, we found that some blades of the impulse stage of the HP section were broken. Replacing the rotor with the spare and repairing the damaged rotor solved the problem. It was concluded that vibration analysis is an effective method for finding faults in critical rotating equipment at the earliest stage and for triggering the corrective tasks that prevent secondary damage and, especially, loss of production.
1. INTRODUCTION
Vibration analysis is an effective technique for finding faults in critical rotating equipment. Imbalance is one of the most common machinery defects and can be very destructive. Trending online values and gathering FFTs, phase measurements, time waveforms and orbit diagrams can help us determine faults at the earliest stage, even on very large or sophisticated machines, and thereby prevent secondary damage and, especially, a decrease in production.
2. IMBALANCE
Imbalance is the condition that exists in a rotor when vibratory force or motion is imparted to its bearings as a result of centrifugal forces (1). Vibration due to unbalance of a rotor is probably the most common machinery defect; fortunately, it is also relatively easy to detect and rectify. Imbalance may also be defined as the uneven distribution of mass about a rotor's rotating centerline. Two terms are used here: the rotating centerline and the geometric centerline. The rotating centerline is defined as the axis about which the rotor would rotate if not constrained by its bearings (also called the principal inertia axis, PIA). The geometric centerline (GCL) is the physical centerline of the rotor. When the two centerlines coincide, the rotor is in a state of balance; when they are apart, the rotor is unbalanced. Three types of unbalance can be encountered on machines:
1. Static unbalance (PIA and GCL are parallel)
2. Couple unbalance (PIA and GCL intersect in the center)
3. Dynamic unbalance (PIA and GCL neither touch nor coincide) (2)
3. DESCRIPTION OF THE PROBLEM
The turbine in question drives the refrigerant (propane) compressor of the Morvarid petrochemical complex, the 5th olefin plant in Iran, which feeds the MehrPC (HDPE) plant. This turbine effectively plays the role of the heart of the plant.
3.1. Technical Information (3)
Model: Siemens SST-600
Power: 24 MW
Min speed: 3530 rpm
Rated speed: 4633 rpm
Trip speed: 5096 rpm
First critical speed: 2890 rpm
Second critical speed: 7566 rpm
Shaft diameter at DE bearing: 250 mm
Shaft diameter at NDE bearing: 200 mm
Inlet steam pressure: 40 bar
Inlet steam temperature: 392 °C
Outlet steam pressure: -0.8 bar
Admission pressure: 5 bar
Admission temperature: 180 °C
Bearing DE: RKS05-5* 50-BETA=.5L.B.P.
Bearing NDE: RKS-08-4* 60-BETA=.5L.B.P.
Vibration alarm value: 150 μm
Vibration trip value: 194 μm
_____________________
Afshin DaghighiAsli et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 1. Turbo Compressor
On 14 Jan 2011, the radial vibration values on the turbine DE bearing increased slightly. Data acquisition from the online monitoring system then began, gathering FFTs for precise analysis. An inspection of the bearing in early August 2011 assured us that the bearing itself could not be the source of the high vibration level.
Figure 2. Not severe pitting (permissible clearance)
Gathering FFTs, time waveforms, orbits and phase values led us to believe that the rotor might have an imbalance or a bent-shaft defect. Meanwhile, we discovered that X507 was the most important and most variable value.
Table 1. Vibration level, DE bearing [μm]

Date         X507    Y507
02-Apr-11    8       8
03-Apr-11    15      16
02-May-11    7.5     8
03-May-11    22      13
12-Aug-11    28      18
13-Aug-11    45      53
26-Sep-11    43      54
27-Sep-11    61      54
12-Oct-11    71.5    52
30-Oct-11    91      76
21-Nov-11    105     82
Figure 3. Vibration trend, DE bearing
Figure 4. Vibration X507
Figure 5. X507 FFT (Obviously 1*rpm excited)
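To illustrate how such a growing trend can be used prognostically, the sketch below fits a straight line to a few of the X507 readings from Table 1 and extrapolates to the 150 μm alarm level. The linear model, the sample selection, and all variable names are our assumptions for illustration, not from the paper.

```python
from datetime import date

# A few X507 readings from Table 1 (date, vibration in um); selection is ours.
samples = [(date(2011, 8, 12), 28.0), (date(2011, 9, 26), 43.0),
           (date(2011, 10, 12), 71.5), (date(2011, 11, 21), 105.0)]
t0 = samples[0][0]
xs = [(d - t0).days for d, _ in samples]
ys = [v for _, v in samples]

# Ordinary least-squares straight line, fitted by hand.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

ALARM = 150.0  # alarm value from section 3.1 [um]
days_to_alarm = (ALARM - intercept) / slope  # days after 12-Aug-11, a few months out
```

Such a crude extrapolation only supports the paper's point that the trend diagram, not the alarm value alone, is what gives early warning.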
Compressor vibration & phase values at rated speed

          X506    Y506    X505    Y505
VIB       3.9     6.1     17      15.5
PHASE     317     68      280     11
Figure 6. X507 Time Waveform (Absolute Sinusoidal)
Figure 7. X507 Orbit diagram (Absolute Elliptical)
Table 2. Turbine vibration & phase values at rated speed

          X507    Y507    X508    Y508
VIB       81.8    58.5    43.5    10.5
PHASE     357     113     302     24
3.2. Axial Phase Measurement
In order to identify the fault accurately, we had to measure the phase difference between the axial directions of the DE and NDE bearings. However, the online monitoring system provides only DC process values in the axial direction, with no raw vibration signal. We therefore decided to measure synchronized time waveforms on the two bearing housings, as follows:
Figure 8. Turbine NDE Synchronous Time waveforms
Figure 9. Turbine DE Synchronous Time waveforms
(180° Phase difference)
Accounting for the 180-degree difference between the axial measurement directions, the axial phase difference between the turbine DE and NDE bearings is zero. This rejects the bent-shaft theory; accordingly, imbalance was the final diagnosis.
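The phase comparison behind this diagnosis can be sketched numerically: extract the phase of the dominant (1× rpm) spectral line from each bearing's waveform and compare. The sample rate, record length and synthetic waveforms below are invented for illustration; only the rated speed (4633 rpm) is taken from the paper.

```python
import numpy as np

fs = 2048.0                     # sample rate [Hz], assumed
f_run = 4633.0 / 60.0           # 1x rpm at rated speed (~77 Hz)
t = np.arange(0.0, 1.0, 1.0 / fs)
de = np.sin(2 * np.pi * f_run * t + 0.3)    # synthetic DE axial waveform
nde = np.sin(2 * np.pi * f_run * t + 0.3)   # NDE in phase with DE -> imbalance case

def phase_deg(signal):
    """Phase of the dominant (1x rpm) spectral line, in degrees."""
    spectrum = np.fft.rfft(signal)
    k = np.argmax(np.abs(spectrum[1:])) + 1  # dominant bin, skipping DC
    return np.degrees(np.angle(spectrum[k]))

diff = (phase_deg(de) - phase_deg(nde)) % 360.0
# diff near 0 deg suggests imbalance; near 180 deg would suggest a bent shaft
```

In practice the waveforms would come from synchronized measurements on the two bearing housings, as described above.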
Eventually, when the vibration level on X507 reached about 122 μm, the turbine was tripped and disassembled, and we saw that some blades of the impulse stage of the HP section were broken.
Figure 10. Turbine Disassembly
Figure 11. HP broken blades (full shot)
Figure 12. HP broken blades (detail shot)
By changing the rotor, the vibration levels returned to near their initial values, and the damaged rotor was sent to the shop for repair.
4. CONCLUSION
Although imbalance is a common fault, we should keep in mind that it can happen to any machine of any brand and type, at any time. We should also be aware that the trend diagram is a key to recognizing and solving problems, and we should not confine ourselves to the alarm values given in standards and manufacturers' recommendations. Finally, it is worth mentioning that axial phase measurement is a very important tool for distinguishing between imbalance and a bent shaft.
Acknowledgements
I want to thank my wife Leila, my parents, and my brother. I also thank Mr. Sohrab Yazdani, and I am grateful to Mr. Vahid Rezaie and Mr. Mohsen Rezaie (Parspc).
Bibliography
International Standard ISO 1940-1, Second Edition, Switzerland, 2003-08-15, p. 2.
Scheffer, C. & Girdhar, P. (2004), Practical Machinery Vibration Analysis and Predictive Maintenance, Newnes Publications, pp. 90-92.
Siemens AG Power Generation (2006), Siemens Vendor Data Book, Siemens Publications, Duisburg.
Economic reasoning for Asset Health Management Systems in volatile markets
Katja Gutsche1
1Hochschule Ruhr West, Mülheim a.d.R., 45473, Germany [email protected]
ABSTRACT
With respect to the growing demands on asset reliability, availability, maintainability, safety and productivity (RAMS-LCC), diagnostic and prognostic asset health management (PHM) systems provide more detailed asset health information, which allows improved maintenance decision-making. This gives the opportunity for a more efficient, safer system operation (e.g. aircraft, production facilities) and therefore a more competitive enterprise. Of course, the implementation and use of PHM causes recurring and non-recurring costs, which have to be at least covered by the savings achieved through cost avoidance due to better asset health knowledge. The economic justification is essential for a positive decision on the installation of PHM. This becomes more complex as the benefits depend on the operation circumstances, which in turn are strongly influenced by the market situation. The market situation is largely determined by the market demand, the number of competitors and the speed of technological change. As these parameters are especially relevant in the producing industry, this is the system of choice in this paper. The question to be raised is how much the economic attractiveness of PHM systems correlates with the increase in market impermanence seen globally in most market segments.
1. INTRODUCTION
Asset health plays a tremendous role in production efficiency as well as in system safety, and therefore in the competitiveness of asset-intensive enterprises in particular. Asset-intensive enterprises are characterized by a high number of industrial facilities needed for the production process, which in addition are generally cost-intensive investments. This becomes even more important in a global economy where profit margins decrease and customer satisfaction has to be kept constantly at a high level. In addition, there are technical changes, such as:
an increase in automation,
an increase in system and asset chaining,
an increase in asset complexity,
an increase in availability requests.
As a consequence, the relevance of health management systems is further increasing.
Their economic benefits have been outlined in several publications as e.g. (Banks & Reichard & Crow & Nickell, 2005), (Banks & Merenich, 2007), (Feldmann & Sandborn & Taoufik, 2008), (Al-Najjar, 2010). (MacConnell, 2007) lists the following as the major benefits:
1. Maintenance time savings,
2. Failure reduction,
3. False alarm avoidance,
4. Availability improvement – increase mean time between maintenance actions,
5. Spare and supply savings.

There is no doubt that, in sum, prognostics and health management (PHM) decreases the efficiency loss caused by maintenance management that is driven by time or organizational restrictions rather than by detailed asset health knowledge, mostly expressed using the wear-out stock. The wear-out stock (compare DIN 13306) defines the health of an asset: it indicates the progress of degradation towards the point where the asset can no longer operate in a safe and proper way. The wear-out stock (WS) is assumed to be high at the beginning (time t = 0) of system use (WS0) and decreases with use. Unless maintenance actions are undertaken, the WS decreases to a critical value (WSmin) at which the asset can no longer be maintained and has to be replaced in order to work properly again. If maintenance management is done purely on a time base, with no regard to the current system degradation status, the value created by the productive system is diminished. Figure 1 shows the reason for the efficiency loss in traditional time-based maintenance management: maintenance actions are undertaken far before the limit wear-out stock (WSmin) is reached because no clear asset health data are available, and the asset lifetime is thereby reduced. In sum, the premature undertaking of maintenance reduces the potential time of use (T) and the potential output (e.g. production units), and it increases the number of maintenance activities.

_____________________
This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 1. Maintenance efficiency loss
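The wear-out stock mechanics described above can be sketched numerically. The linear degradation rate and all parameter values below are illustrative assumptions of ours, not figures from the paper.

```python
# Illustrative wear-out stock (WS) model: WS starts at WS0 and is assumed to
# decline linearly with operating time until it reaches the critical value
# WSmin, at which point the asset must be maintained or replaced.
def remaining_use_time(ws0, ws_min, degradation_per_hour):
    """Operating hours until the wear-out stock reaches its critical value."""
    return (ws0 - ws_min) / degradation_per_hour

# Condition-based maintenance can exploit (almost) the full interval, whereas
# a conservative fixed time-based interval wastes the difference.
full_interval = remaining_use_time(ws0=100.0, ws_min=20.0, degradation_per_hour=0.01)
time_based_interval = 6000.0  # assumed conservative fixed interval [h]
wasted_lifetime = full_interval - time_based_interval  # lost use time per cycle [h]
```

The gap between the two intervals is exactly the efficiency loss that Figure 1 attributes to time-based maintenance.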
Besides the optimization of preventive maintenance tasks, the use of PHM also improves the failure time line (figure 2), because either a failure is prevented outright or the downtime is reduced thanks to detailed asset health information. Firstly, with the asset health information the time until maintenance work starts is reduced due to faster fault identification. Secondly, the information from PHM systems decreases the time needed for the actual refit.
Figure 2. Failure time line
Apart from these numerous positive effects of PHM, there are also challenges that have to be managed, e.g.:
A large amount of specialized data is generated by the PHM system.
The selection and interpretation of the most relevant data is mostly not done by the system itself.
Decision making becomes more complex for the maintenance person in charge.
These challenges are listed at this point but will not be analyzed further in this paper.
2. MOTIVATION
The economic attractiveness of PHM systems depends on the result of a cost-benefit analysis, i.e. on the difference between the cost savings and the additional costs due to their implementation and use (section 1).
PHM has a notable effect on asset availability, which can be measured through (Wheeler & Kurtoglu & Poll, 2010; Al-Najjar, 2010) (figure 3):
Reduction in (unplanned) stoppages,
Increase in mean time between maintenance actions,
Reduction of labor mean-time-to-detect,
Reduction of repair times,
Reduction of maintenance induced failures
and has therefore a positive effect on the direct and indirect maintenance costs which are mainly dependent on the maintenance time parameters as well as needed number of spare parts, cases of secondary damage and work accidents.
Figure 3 demonstrates the potential effect of the implementation of PHM systems (scenario 1) compared to their non-use (scenario 0) on the asset availability level (increasing) and the maintenance costs (decreasing).
Figure 3. Effect of PHM implementation
Seen from a life-cycle perspective, PHM causes development, implementation, operation and maintenance expenses. Moreover, prognostics may also cause false alarms, but these shall not be considered in this paper.
Table 1 lists the major potential costs and benefits of a PHM system application. Especially in the beginning, investments
have to be made before actually using the system for asset health monitoring. The investment expenses are determined by the software and hardware components, the installation and testing complexity as well as the needed staff training. During the period of PHM system use there are cost positions due to the data management and its maintenance. The potential benefits have been outlined in detail in the sections before and shall only be listed at this point.
Table 1. Costs and Benefits of PHM (*value dependent on operation circumstances)
Their actual value is variable due to probabilistic behavior of assets and their failure regime, the technical characteristics of the PHM (self-learning etc.) and their usability. Apart from these uncertainties which have to be taken into account when deciding on PHM, the overall result of the implementation of PHM depends on the operation intensity:
How tight is the operation schedule for the asset to be monitored with regard to the customer needs?
The relevant operation circumstances in producing industries can be expressed in
Available realization time (e.g. time until product delivery),
Number of waiting jobs,
Number of shifts/ operation intensity.
These parameters change more often as market volatility increases. Market volatility is defined as the magnitude of short-term fluctuation in a time series compared to its mean value or a defined trend curve. Figure 4 shows the development of the German Gross Domestic Product (GDP), adjusted for prices, between 1951 and 2008. It illustrates that the economic cycles have shortened; hence the markets have become more volatile. This has major effects on the manufacturing industry and, in consequence, on the operation circumstances and finally on the cost-benefit result of the use of PHM systems.
Figure 4. Market volatility (Statistisches Bundesamt, 2009)
3. ECONOMIC REASONING IN VOLATILE MARKETS
Whereas the costs listed in Table 1 stay relatively stable no matter how the operation circumstances change (only the data management expenses grow with the data volume), the value of the potential benefits increases as the available realization time decreases and as the number of waiting jobs and the operation intensity increase.
3.1. Value of availability
The value of a gain in availability changes depending on the operation circumstances. This value correlates with the failure costs. Failure costs are
Costs of decreased output before and after downtime,
Costs due to the downtime period (downtime costs) (see figure 2),
Opportunity cost,
Loss in asset value.
(Biedermann, 2008) outlines that the failure costs correlate with the percentage of downtime over the overall asset lifetime and with the level of use of the producing asset's capacity (figure 5). For a constant percentage of downtime, the failure costs decrease when the use of the asset capacity decreases. Illustrated with an example: a manufacturing plant works either a) 24 hours/day (100% use of asset capacity) or b) 18 hours/day (75% use of asset capacity). The output per hour is 1 unit worth 500 €. In case of a failure lasting one working day (downtime), the loss in production (failure costs) is a) 24 × 500 € = 12,000 € and b) 18 × 500 € = 9,000 €.
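The worked example above reduces to a single multiplication; the sketch below (function name ours) reproduces both cases.

```python
# Lost production value for a one-working-day downtime, per the example above.
def failure_cost(hours_per_day, units_per_hour, price_per_unit):
    return hours_per_day * units_per_hour * price_per_unit

cost_full = failure_cost(24, 1, 500)     # 100% use of asset capacity
cost_partial = failure_cost(18, 1, 500)  # 75% use of asset capacity
```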
Costs                                Benefits
Software                             Reduction in failure rate*
Hardware                             Reduction in downtime
Training                             Decrease in quality rejections*
Installation & testing               Reduction in spare parts*
Data management*                     Reduction in accident compensations*
PHM system maintenance & updates     Decrease in lifetime loss

The level of use of the asset capacity is one parameter describing the operation circumstances. As the level of capacity use depends on the operation intensity, which in turn depends on the market demand (high demand, high use level), the cause-effect chain can be summed up in the following way:
market situation ↓ → use of asset capacity ↓ → failure costs ↓ → value of availability ↓
Figure 5. Value of non-availability in producing industries (failure costs) (Biedermann, 2008)
3.2. Volatility gap in availability savings
With a change in the market there is a positive or negative effect on the manufacturing industry, and the change in product demand directly influences the manufacturing asset: the asset work load increases with higher product demand and decreases when market demand declines. These scenarios are outlined in figure 6, upper part. During an economic upturn the asset is used to its maximum; the asset work load is adjusted when there is less demand for the product or service. Corresponding to the development of the asset work load, there is a change in availability savings (SA) (figure 6, lower part). If the asset is always used at the assumed high level and there is no change in market demand, the value of the savings from the availability increase due to the use of PHM systems (SAnv) is higher than when the market parameters change, i.e. under higher volatility (SAv). Comparing these two scenarios, a so-called volatility gap in the savings through the use of PHM systems evolves.
As the saving in availability is directly linked to the benefits of PHM systems, the cause-effect chain of section 3.1 can be extended in the following manner:

market situation ↓ → use of asset capacity ↓ → failure costs ↓ → value of availability ↓ → benefit of PHM systems ↓
Figure 6. Effect of volatility on savings through availability increase SA
3.3. Numerical example
To highlight the importance of market effects on the economic attractiveness of PHM systems, a numerical example is outlined.
The following assumptions shall be made:

Table 2. Numerical example – assumptions

Use period [years]: 10
Fault time per year [% of operating hours]: 1
Value of downtime [€/hour]: 150
Fault prevention rate through PHM system [%]: 20

The volatility gap shall be shown by comparing the following two scenarios:

Scenario A: constant operating hours of two shifts of 8 hours on 365 days per year = 5840 h/year = maximum use of asset capacity.

Scenario B: changing operating hours (see Table 4, column 2).
Table 3 and table 4 show the potential availability savings through the use of PHM systems. As in scenario B the asset is not used to its full extent, the sum of the availability savings is lower than in scenario A (13,578 € < 17,520 €). The difference of 3,942 € represents the volatility gap indicated in figure 6.
Table 3. Scenario A – maximum use of capacity, no volatility
Table 4. Scenario B - changing use of capacity and market volatility
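Under the Table 2 assumptions, the yearly saving in Tables 3 and 4 follows from a single formula: operating hours × fault rate × prevention rate × downtime value. The sketch below (variable names ours) reproduces both scenario totals, with scenario B's operating hours declining linearly as in the table.

```python
# Assumptions from Table 2.
FAULT_RATE = 0.01       # fault time per year, as a fraction of operating hours
PREVENTION = 0.20       # fault prevention rate through the PHM system
DOWNTIME_VALUE = 150    # value of downtime [EUR/hour]

def availability_saving(operating_hours):
    """Yearly cost avoidance from faults prevented by the PHM system [EUR]."""
    return operating_hours * FAULT_RATE * PREVENTION * DOWNTIME_VALUE

scenario_a = [5840] * 10                                # constant maximum use
scenario_b = [5840 - 292 * year for year in range(10)]  # 5840, 5548, ..., 3212

total_a = sum(availability_saving(h) for h in scenario_a)  # Table 3 total
total_b = sum(availability_saving(h) for h in scenario_b)  # Table 4 total
volatility_gap = total_a - total_b
```

The gap shrinks or grows directly with the assumed decline in operating hours, which is the paper's central point about volatile markets.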
4. SUMMARY
The integration of a health management system is primarily based on economic reasoning. PHM provides failure predictions, reduces the downtime, expands the maintenance intervals and therefore decreases the efficiency loss in maintenance and increases the system availability. However, PHM causes investment expenses and recurring costs for the PHM system sustainment. Whereas the latter are mostly independent of the market situation in which the operator uses the asset to fulfill customer demands, the potential benefits strongly depend on the operation circumstances (e.g. working shifts, time buffers within the production line, stock of semi-finished products).
As there is not only a higher level of competition within the markets but also more volatility (e.g. steel production) which strongly influences the operation circumstances, these dynamic effects have to be taken into account when deciding on the introduction of a PHM system.
This paper outlines the effect of market volatility on the economic reasoning of the use of PHM systems. Depending on the market situation the volatility gap describes the cost avoidance due to higher system availability. The value of cost avoidance then depends on the level of use of asset capacity.
In volatile markets, modular PHM systems may be an option, as these systems allow downsizing. Instead of installing an all-embracing PHM system, modular systems offer the big advantage of being scalable according to the actual operation constraints (e.g. number of sensors and interpretation algorithms). This allows a downsizing of the recurring costs for the health management system and makes it more flexible with respect to the increase in market volatility.
REFERENCES
Al-Najjar, B. (2010), Strategies for Maintenance Cost-effectiveness. In Holmberg et al. eMaintenance, pp.297-344
Banks, J. & Merenich, J. (2007). Cost Benefit Analysis for Asset Health Management Technology, IEEE Annual Reliability and Maintainability Symposium, pp. 95-100
Banks, J., Reichard, E., Crow, E., Nickell, E. (2005), How Engineers Can Conduct Cost-Benefit Analysis for PHM Systems, IEEE Aerospace Conference, pp. 3958-3967
Biedermann, H. (2008), Anlagenmanagement – Managementinstrumente zur Wertsteigerung, TÜV-Verlag
DIN EN 13306 (2010), Maintenance – Maintenance terminology
Feldmann, K., Sandborn, P., Taoufik, J. (2008), The Analysis of Return on Investment for PHM Applied to Electronic Systems, Proceedings of the International Conference on Prognostics and Health Management, October, Denver, CO
MacConnell, J.H. (2007), ISHM & Design: A review of the benefits of the ideal ISHM system, IEEE Aerospace Conference
Statistisches Bundesamt (2009), https://www-genesis.destatis.de/
Wheeler, K., Kurtoglu, T., Poll, S. (2010), A Survey of Health Management User Objectives in Aerospace Systems Related to Diagnostic and Prognostic Metrics, International Journal of Prognostics and Health Management
Table 3 data (Scenario A – maximum use of capacity, no volatility):

Year    Operating hours [h]    Fault hours per year [h]    Fault hours prevented by PHM [h]    Availability savings [€]
1       5840                   58.4                        11.68                               1752
2       5840                   58.4                        11.68                               1752
3       5840                   58.4                        11.68                               1752
4       5840                   58.4                        11.68                               1752
5       5840                   58.4                        11.68                               1752
6       5840                   58.4                        11.68                               1752
7       5840                   58.4                        11.68                               1752
8       5840                   58.4                        11.68                               1752
9       5840                   58.4                        11.68                               1752
10      5840                   58.4                        11.68                               1752
Total                                                                                          17520

Table 4 data (Scenario B – changing use of capacity, market volatility):

Year    Operating hours [h]    Fault hours per year [h]    Fault hours prevented by PHM [h]    Availability savings [€]
1       5840                   58.4                        11.68                               1752.0
2       5548                   55.48                       11.096                              1664.4
3       5256                   52.56                       10.512                              1576.8
4       4964                   49.64                       9.928                               1489.2
5       4672                   46.72                       9.344                               1401.6
6       4380                   43.8                        8.76                                1314.0
7       4088                   40.88                       8.176                               1226.4
8       3796                   37.96                       7.592                               1138.8
9       3504                   35.04                       7.008                               1051.2
10      3212                   32.12                       6.424                               963.6
Total                                                                                          13578.0
First European Conference of the Prognostics and Health Management Society, 2012
249
System PHM Algorithm Maturation
Jean-Remi Massé1, Ouadie Hmad3, and Xavier Boulet2

1,2Safran Snecma, Moissy Cramayel, 77550, France
3Safran Engineering Services, Moissy Cramayel, 77550, France
ABSTRACT
The maturation of PHM functions is focused on two Key
Performance Indicators (KPI): the No Fault Found (NFF)
ratio, P(No degradation | Detection), and the Probability Of
Detection (POD), P(Detection | Degradation). The second
KPI can be estimated by counting global abnormality
threshold trespassings when each different kind of
degradation is simulated. The first KPI can be estimated
through the following formula, using the Bayes rule:

P(No degradation | Detection) = P(Detection | No degradation) × P(No degradation) / P(Detection)

P(Degradation) may be known through FMEA or field
experience. Typically, for a probability of 10^-7, a specified
NFF ratio of 1%, and an expected POD of 90%, the order of
magnitude of P(Detection | No degradation) should be 10^-9.
The estimation of such an extreme level of probability needs
some parametric adjustment of the distribution of the global
abnormality score with no degradation. Two PHM functions
are considered as case studies: Turbofan engine start
capability (ESC) and turbofan engine lubrication oil
consumption (EOC). In ESC the global abnormality score is
a norm of a vector of specific abnormality scores. The
specific scores are centered and reduced residues between
expected values and observed values. Some specific scores
are devoted to starter air supply. Examples are duration of
phase 1 from starter air valve open command to ignition
speed. Other scores are devoted to fuel metering. Examples
are duration of phase 2 from ignition to cut off speed. The
expected values are estimations through regression relations
using as inputs the other specific scores and context
parameters such as lubrication oil temperature at start. The
regression relations are learnt on start records with no
degradations. Impact simulations of degradations on specific
scores are learnt on a phase 1 simulator based on torques
balance and on start test records including fuel metering
biases. In EOC, the global abnormality score is the daily,
weekly, or monthly consumption estimate computed on a daily
basis. Consumption estimates use linear regressions of oil
level measurements versus time at an invariable ground idle
speed corrected according to oil fill detections and oil
temperature. The over consumptions are simulated by drifts
in mean of the consumption estimations.
To reach acceptable POD at the specified NFF ratio, three
improvements are needed for ESC:
- Adjust the abnormality decision threshold according to each candidate degradation, using extreme value quantiles on the global abnormality score distribution
- Average the global abnormality score over five consecutive starts
- Learn the regression relations specifically on each engine.
The first improvement is a novelty. It is successfully applied
to both ESC and EOC functions. It is generic to all airborne
system PHM functions based on abnormality scores.
1. INTRODUCTION
For years, airborne system PHM maturity has been scaled in
reference to the popular “Technology readiness levels”
(Wikipedia, 2012). This is appropriate for controlling the
maturity of the function implementation, but it does not
address the intrinsic maturity of the function independently
of its implementation.
Therefore, a maturation process of PHM functions is
followed. It uses six sigma concepts (Deming, 1966;
Forrest, 2008). First, generic sub functions of system
PHM are considered. This is illustrated on two use cases.
Then, two Key Performance Indicators (KPI) are chosen
according to the considered sub functions and to airline
business models. The estimation of these KPI is defined.
_____________________
Jean-Remi Massé et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
250
European Conference of Prognostics and Health Management Society 2012
2
To reach acceptable levels of KPI on the use cases, some
improvements of the functions are proposed.
2. CASE STUDIES
2.1. General
The first phase of the six sigma approach is the definition
phase. The function considered is the first item to define.
PHM functions are usually represented in the OSA-CBM
architecture (MIMOSA, 1998). Table 1 shows such a typical
architecture applied to a system PHM (Lacaille, 2010).
Table 1. Typical system PHM OSA-CBM summary

#1 DATA ACQUISITION
   Acquire sensor and system data
#2 DATA MANIPULATION
   Extract the indicators
   Acquire the context parameters
#3 STATE DETECTION
   Build the prediction model**
   Score the prediction errors
#4 HEALTH ASSESSMENT
   Learn reference patterns (syndromes)**
   Cluster according to references
   Isolate the potentially degraded LRU(s) or module(s) through Bayesian calculation
   Isolate the potentially degraded LRU(s) or module(s) through the fault isolation manual on failure condition precursors
   Score global abnormality*
   Adjust the abnormality decision thresholds**
   Detect abnormality
#5 PROGNOSTIC ASSESSMENT
   Predict the probability of maximal degradation before failure within a given operational time
#6 ADVISORY GENERATION
   Establish a global diagnosis and prognosis merging other health monitoring means

*Has a learning mode; **Is a learning mode
The specificities of a given PHM function are restricted to
level #1 Data acquisition and level #2 Data manipulation
through indicators and context parameters. The next levels
are, in general, common and may have à learning mode in
addition to the basic PHM mode. Such learning modes are
tagged in table 1 with an asterisk or two*.
The PHM function section considered for maturation is part
of level #4 Health Assessment. It is tagged in Table 1 in
bold:
- Score global abnormality
- Adjust the abnormality decision thresholds*
- Detect abnormality.
The abnormality detection function considered here is based
on global abnormality score threshold trespassing.
On this general basis, two specific use cases are considered:
- Engine start capability, ESC (Ausloos, Grall, Beauseroy & Massé, 2009; Mouton, Ausloos, Massé, Aurousseau & Flandrois, 2010)
- Engine oil consumption, EOC (Demaison, Massot, Massé, Flandrois, Hmad & Ricordeau, 2010).
2.2. Engine Start Capability function
The engine start capability function, ESC, relies on a set of
indicators (Figure 1):
- extracted during the start sequence
- sensitive to no-start precursors.
Figure 1. Engine start capability, ESC, indicators
Some indicators are devoted to air supply degradations.
Examples are the duration of phase 1 of the start, from
starter air valve open command to ignition HP rotor speed,
or, the average acceleration of HP rotor during phase 1.
These indicators are sensitive to slow opening of the air
starter valve. Such degradation is a precursor of the valve
sticking closed, which is a typical origin of no start.
Some indicators are devoted to fuel metering degradations.
Examples are phase 2 duration, from ignition to starter cut
speed, or, Exhaust Gas Temperature slope during phase 2.
Prediction error scores are centered and reduced residues
between expected values of indicators and observed values
of indicators.
The expected values of indicators are estimations, through
regression relations, using as inputs the other indicators and
context parameters such as lubrication oil temperature at
start. Referring to table 1, this is the basic PHM mode of
“#3 – State detection - Score the prediction errors”
The regression relations are learnt on start records with no
degradations. The means and standard deviations of the
residues needed for centering and reduction are learnt on the
same records. Referring to table 1, this is the learning mode
of “#3 – State detection - Build the prediction model”.
First European Conference of the Prognostics and Health Management Society, 2012
251
European Conference of Prognostics and Health Management Society 2012
3
The global abnormality score, s, is the squared
Mahalanobis norm of the vector, z, of prediction error
scores:

s = z' R^-1 z    (1)
Referring to table 1, this is the basic PHM mode of “#4
Health assessment – Score global abnormality”
The correlation matrix, R, is also learnt on the same records
with no degradations. This is the learning mode of "#4
Health assessment - Score global abnormality".
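The score of Eq. (1) can be sketched in a few lines of Python; the two indicators, their correlation value, and the residue values below are invented for illustration only:

```python
# Sketch of the global abnormality score: the squared Mahalanobis
# norm of the vector z of centred, reduced prediction error scores.
# The correlation value 0.6 and the residues are hypothetical.

def mahalanobis_sq(z, r_inv):
    """Squared Mahalanobis norm s = z' R^-1 z, with z a list and
    r_inv the inverse correlation matrix as a list of rows."""
    n = len(z)
    w = [sum(r_inv[i][j] * z[j] for j in range(n)) for i in range(n)]
    return sum(z[i] * w[i] for i in range(n))

def inv_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two prediction error scores, e.g. phase-1 and phase-2 duration
# residues, assumed correlated at 0.6.
R = [[1.0, 0.6], [0.6, 1.0]]
z = [2.0, 1.5]
s = mahalanobis_sq(z, inv_2x2(R))
```

Because the prediction error scores are centred and reduced, their covariance matrix is the correlation matrix, so the Mahalanobis norm accounts for the correlation between indicators that a plain sum of squares would ignore.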
2.3. Engine Oil Consumption function
Engine oil consumption function, EOC, relies on oil level
extractions at taxi phase. The oil levels are captured at
constant ground idle speed when the switch based level
indication changes. A small correction of level is done
according to temperature.
Figure 2. Engine oil consumption, EOC, oil level captures
The global abnormality score is the daily, weekly, or
monthly consumption estimate computed on a daily increment. This
relies on regressions on the oil levels versus flight time
taking into account the oil fills. Referring to table 1, this is
the basic PHM mode of “#4 Health assessment – Score
global abnormality”. Unlike ESC, for EOC, this item has no
learning mode.
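This consumption estimate can be sketched as follows, assuming hypothetical level captures and a simple cumulative correction for detected oil fills (the temperature correction is omitted):

```python
# Illustrative sketch of the EOC estimate: a least-squares regression
# of oil level against flight time, after removing the level jumps
# caused by detected oil fills. All data values are hypothetical.

def linear_slope(ts, ys):
    """Least-squares slope of ys versus ts."""
    n = len(ts)
    tm, ym = sum(ts) / n, sum(ys) / n
    num = sum((t - tm) * (y - ym) for t, y in zip(ts, ys))
    den = sum((t - tm) ** 2 for t in ts)
    return num / den

def consumption_rate(times, levels, fill_amounts):
    """Oil consumption rate (level units per flight hour).
    fill_amounts[i] is the oil added just before capture i (0 if
    none); the cumulative fills are subtracted from the levels."""
    corrected, added = [], 0.0
    for level, fill in zip(levels, fill_amounts):
        added += fill
        corrected.append(level - added)
    return -linear_slope(times, corrected)

times = [0, 10, 20, 30, 40]              # cumulative flight hours
levels = [20.0, 19.0, 18.0, 21.0, 20.0]  # levels captured at ground idle
fills = [0, 0, 0, 4.0, 0]                # 4 units added before capture 3
rate = consumption_rate(times, levels, fills)
```

Here the fill before the fourth capture raises the raw level from 18 to 21; after the correction the series decreases steadily and the regression recovers a consumption rate of 0.1 level units per flight hour.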
3. P(NO DEGRADATION|DETECTION)
3.1. Definition
As seen previously (§ 2.1), the PHM function section
considered for maturation comprises:
- Score global abnormality*
- Adjust the abnormality decision thresholds**
- Decide on abnormality detection.
According to six sigma methodology, this needs to be
assessed and quantified. This is addressed through two Key
Performance Indicators that are Critical To Quality and
Critical To Business (KPI CTQ CTB).
In commercial aeronautics, the major KPI CTQ CTB for
abnormality detection is an extension of the so called “No
Fault Found” ratio, NFF. The original NFF ratio refers to
failure detections which are false. The extended NFF ratio,
considered in PHM, refers to degradation detections which
are false. The degradations considered in PHM are failure
precursors. The NFF ratio is defined as P(No degradation|
Detection).
Line maintenance wishes to avoid "no fault founds". For
instance, a false detection of a fuel metering degradation
may lead to a hydro-mechanical unit replacement, which costs
eight hours of manpower. Therefore, the NFF ratio should not
exceed 5% at the line maintenance stage. High NFF ratios
would kill PHM.
3.2. Counterpart
A second KPI CTQ CTB is the well known Probability Of
Detection, POD. The POD is defined as P(Detection
|Degradation).
For line maintenance the POD should be as high as possible
under the constraint of low NFF ratio. For operations
management, the abnormality detection should occur as
soon as possible. For operations, NFF ratio is not as critical
as for line maintenance.
The popular Probability of False Alarm, PFA, P(Detection
| No degradation), is linked to the two KPI CTQ CTB by the
following relation:

P(Detection | No degradation) =
P(No degradation | Detection) × P(Detection | Degradation) × P(Degradation)
/ [(1 - P(No degradation | Detection)) × P(No degradation)]    (2)
With the type of decision considered, based on threshold
trespassing, P(Detection| No degradation) is the probability
of the global abnormality score with no degradation being
higher than the abnormality decision threshold (Figure 3).
Figure 3. Diagram of PFA and POD for a decision based on
threshold trespassing
For a typical P(Degradation) of 10^-6 or 10^-7 per decision, an
expected NFF ratio, P(No degradation | Detection), of 5%,
and a POD, P(Detection | Degradation), of 90%, the PFA,
P(Detection | No degradation), should be 5×10^-8 or 5×10^-9
(Formula 2).
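These orders of magnitude can be checked numerically; the helper below simply inverts the Bayes relation, using the 5% NFF and 90% POD figures from the text:

```python
# Numerical check of the orders of magnitude implied by the relation
# between PFA, NFF ratio and POD (Formula 2).

def required_pfa(nff, pod, p_deg):
    """P(Detection | No degradation) implied by Bayes' rule for a
    target NFF ratio, a given POD, and a degradation probability."""
    p_no_deg = 1.0 - p_deg
    return nff * pod * p_deg / ((1.0 - nff) * p_no_deg)

pfa_6 = required_pfa(nff=0.05, pod=0.90, p_deg=1e-6)  # ~4.7e-8
pfa_7 = required_pfa(nff=0.05, pod=0.90, p_deg=1e-7)  # ~4.7e-9
# i.e. the 5e-8 and 5e-9 orders of magnitude quoted in the text
```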
3.3. Estimation
The estimation of POD, P(Detection |Degradation), can be
done by counting the global abnormality threshold
trespassing when each different kind of degradation is
simulated.
The degradations are simulated rather than observed. These
degradations typically occur with a probability of 10^-6 or
10^-7 per engine flight. It would be necessary to cumulate
more than 2.7 or 27 million flights to observe this event at
least thirty times with a probability of 90%.
The simulations are based on transformations of the
degradation indicator values with no degradation. Such
transformations are characterized by:
- the degradation considered
- the degradation intensity.
Strong intensity corresponds to the ultimate degradation level
just before failure. This concerns line maintenance. At this
level, P(No degradation | Detection) should be less than
5%. Weak or mean intensities correspond to the initiation of
the degradation. This concerns operations. At this level,
P(Detection | Degradation) should be favored even though
P(No degradation | Detection) may reach up to 50%.
In ESC, simulations of degradations related to starter air
supply were learnt with a phase 1 simulator based on
torques balance. Simulations of degradations related to fuel
metering were learnt on start tests records including fuel
metering biases.
In EOC, the over consumptions are simulated by drifts in
the mean of the consumption estimates.
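This counting estimate of POD can be sketched as follows, using a standard normal stand-in for the no-degradation score and invented drift and threshold values:

```python
import random

# Sketch of the POD estimation: degradations are simulated by
# shifting the mean of the no-degradation score, and POD is the
# fraction of shifted scores that trespass the decision threshold.
# The distribution, drifts and threshold are illustrative only.

random.seed(0)
healthy = [random.gauss(0.0, 1.0) for _ in range(10_000)]
threshold = 3.0                      # abnormality decision threshold

def pod_for_drift(drift):
    """Fraction of drifted scores above the threshold."""
    trespass = sum(1 for s in healthy if s + drift > threshold)
    return trespass / len(healthy)

pod_weak = pod_for_drift(1.0)    # weak degradation: low POD
pod_strong = pod_for_drift(5.0)  # strong degradation: POD near 1
```

The same simulated records serve both intensity levels, which mirrors why simulation is preferred over waiting for the millions of flights needed to observe real degradations.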
The estimation of the NFF ratio may be done through the
following formula:

P(No degradation | Detection) = P(Detection | No degradation) × P(No degradation) / P(Detection)    (3)

where

P(Detection) = P(Detection | No degradation) × P(No degradation) + P(Detection | Degradation) × P(Degradation)    (4)

P(Degradation) may be known through FMEA or field
experience. P(No degradation) = 1 - P(Degradation) is close
to 1.
As seen previously, the order of magnitude of P(Detection |
No degradation) should typically be 5×10^-8 or 5×10^-9.
As seen previously, P(Detection| No degradation) is the
probability of the global abnormality score with no
degradation being higher than the decision threshold (Figure
3). The estimation of such an extreme level of probability
needs some parametric adjustment of the distribution of the
global abnormality score with no degradation; the tail of
that distribution must be modeled correctly. It appears that
the adjusted Gamma and Normal distributions do not fit the
observed distribution of the global abnormality score well.
Conversely, as Figure 4 shows, the multi-parametric
adjustment obtained with the Parzen estimator fits the
observed distribution well (Hmad, Massé, Grall, Beauseroy &
Mathevet, 2011; Silverman, 1986).
Figure 4. Observed and adjusted cumulative distribution
function of ESC global abnormality score with no
degradation
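Such a kernel-based tail fit can be sketched as follows; the Gaussian-kernel Parzen CDF and the bisection search below are generic, while the sample and bandwidth are illustrative stand-ins, not the estimator settings of the paper:

```python
import math

# Minimal sketch of a Parzen (kernel) estimate of the cumulative
# distribution of the no-degradation score, used here to place the
# decision threshold at a target false-alarm probability.

def parzen_cdf(x, sample, h):
    """Gaussian-kernel CDF estimate at x with bandwidth h."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in sample) / len(sample)

def threshold_for_pfa(sample, h, pfa, lo, hi, iters=60):
    """Bisection solve of parzen_cdf(t) = 1 - pfa on [lo, hi];
    valid because the estimated CDF is monotone increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if parzen_cdf(mid, sample, h) < 1.0 - pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [0.1 * i for i in range(100)]   # stand-in score sample
t = threshold_for_pfa(sample, h=0.5, pfa=0.01, lo=0.0, hi=30.0)
```

Unlike a Gamma or Normal fit, the kernel estimate follows the empirical tail shape, at the price of choosing a bandwidth; extrapolation to probabilities far below 1/n still requires care.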
4. ABNORMALITY DECISION THRESHOLDS ADJUSTMENT
4.1. Methodology
The first improvement proposed to reach an acceptable level
of P( No degradation |Detection) is to adjust the
abnormality decision threshold on the global abnormality
score with no degradation. As seen previously, P(Detection|
No degradation) is the probability of the global abnormality
score with no degradation being higher than the decision
threshold. Conversely, if the expected value of P(Detection|
No degradation) is known, the adjustment of decision
threshold may take advantage of the accurate Parzen fit. As
a first guess of P(Detection| No degradation), formula 2
may be used with a prior assumption of P(Detection|
Degradation) being close to 100%. In a second iteration
with the prior threshold, a more realistic estimation may be
done for P(Detection| Degradation) (Hmad O., Massé J -R.,
Grall-Maes E., Beauseroy P., Boulet X., 2012).
4.2. Application to ESC
This methodology is applied to ESC. A global abnormality
score distribution is observed on starts with no degradations.
Figure 5. Impact of the fit quality on decision threshold
Figure 5 shows the need to check the distribution fits.
Figure 6 shows the initial performances of ESC with the
Parzen threshold adjustment.
Figure 6. Prior abnormality decision threshold and global
abnormality score distributions with three starter air valve
degradation intensities
Only 20% of the strong degradations are detected. This is
not acceptable for line maintenance. None of the weak or
mean degradations are detected. This is not acceptable for
operations. The performances are improved with a moving
average on the global abnormality score (Figure 7).
Figure 7. Improvement of the performances with global
abnormality score moving average on five consecutive
flights
The performances become acceptable for line maintenance
but still not for operations. The performances are further
improved with regression relations learnt specifically on
each engine (Figure 8). This improves the accuracy of the
indicator predictions.
Figure 8. Improvement of the performances with regression
relations learnt on each specific engine and moving average
of the global abnormality score
The performances now become acceptable for both line
maintenance and operations.
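The benefit of the five-start moving average can be sketched numerically: averaging n roughly independent scores shrinks the no-degradation spread by about the square root of n, so a given degradation-induced mean shift stands out more clearly. The score sample below is an illustrative stand-in:

```python
import random

# Sketch of the moving-average improvement: averaging the global
# abnormality score over five consecutive starts narrows the
# no-degradation distribution. The sample values are illustrative.

random.seed(1)
scores = [random.gauss(0.0, 1.0) for _ in range(5_000)]

def moving_average(xs, window=5):
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

smoothed = moving_average(scores)

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Averaging 5 i.i.d. scores divides the spread by about sqrt(5)
ratio = std(smoothed) / std(scores)   # expected near 0.45
```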
4.3. Application to EOC
The methodology of threshold adjustment is now applied to
Engine Oil Consumption.
Figure 9. Prior abnormality decision threshold and daily
consumption distributions with two over consumption levels
For this PHM function, almost all mean and strong over
consumptions are detected.
5. CONCLUSION
The PHM sub-function considered is abnormality detection
based on threshold trespassing by a global abnormality
score. For such a function, the No Fault Found ratio, P(No
degradation | Detection), is relevant for line maintenance.
Estimating this performance indicator requires an accurate
fit of the distribution of the global abnormality score with
no degradation.
To reach acceptable probabilities of detection at the
specified NFF ratio, three improvements are needed for the
Engine Start Capability PHM function:
- Abnormality decision threshold adjusted using extreme value quantiles on the global abnormality score distribution
- Moving average of the global abnormality score
- Regression relations learnt specifically on each engine.
The first improvement is a novelty. It is successfully applied
to both use cases considered. It is generic to all airborne
system PHM functions based on abnormality scores. It is
now being extended to other abnormality decision functions
such as "k trespassings among n" and the Wald likelihood
ratio.
REFERENCES
Ausloos, A., Grall, E., Beauseroy, P., & Massé, J.-R. (2009).
Estimation of monitoring indicators using regression
methods - Application to turbofan starting phase. ESREL
conference.
Demaison, F., Massot, G., Massé, J.-R., Flandrois, X., Hmad,
O., & Ricordeau, J. (2010). Méthode de suivi de la
consommation d'huile dans un système de lubrification
de turbomachine [Method for monitoring oil consumption
in a turbomachine lubrication system]. Patent # 1H105790
1093FR.
Deming, W. E. (1966). Some Theory of Sampling. Dover
Publications.
Forrest, W. B. (2008). Implementing Six Sigma: Smarter
Solutions Using Statistical Methods. Wiley-Interscience.
Hmad, O., Massé, J.-R., Grall-Maes, E., Beauseroy, P., &
Boulet, X. (2012). Procédé de réglage de seuil de décision
[Decision threshold adjustment method]. Patent.
Hmad, O., Massé, J.-R., Grall, E., Beauseroy, P., & Mathevet,
A. (2011). A comparison of distribution estimators used
to determine a degradation decision threshold for very
low first order error. ESREL conference, September
18-22, Troyes.
Lacaille, J. (2010). Identification de défaillances dans un
moteur d'aéronef [Identification of failures in an aircraft
engine]. Patent # FR2939924 A1.
Lacaille, J. (2010). Standardisation des données pour la
surveillance d'un moteur d'aéronef [Data standardization
for the monitoring of an aircraft engine]. Patent #
FR2939928 A1.
MIMOSA (1998). Open Systems Architecture for
Condition-Based Maintenance, OSA-CBM v3.1 standard.
Mouton, P., Ausloos, A., Massé, J.-R., Aurousseau, C. A., &
Flandrois, X. (2010). Method for monitoring the health
status of devices that affect the starting capability of a
jet engine. Patent # WO 2010/092080 A1.
Silverman, B. W. (1986). Density Estimation for Statistics and
Data Analysis. Chapman and Hall, London.
Wikipedia (2012). Technology Readiness Level.
http://fr.wikipedia.org/wiki/Technology_Readiness_Level
BIOGRAPHIES
Jean-Remi Massé (Paris, 1952) holds a PhD in statistics
(1977) from Rennes University and has practiced and taught
statistics in several industries and universities. He is
presently a senior expert in systems dependability
engineering and PHM for the Safran Group.

Ouadie Hmad (Montereau-Fault-Yonne, 1986) is presently
a PhD student at Safran Engineering Services with the
Troyes University of Technology (UTT), working on
performance assessment of PHM algorithms.

Xavier Boulet (Paris, 1981) holds a five-year degree in
Systems Engineering (2005) from Evry University and has
led several test bench development projects for Safran. He
is presently the project manager of system PHM algorithm
development for Safran Snecma.
Design for Availability – Flexible System Evaluation with a Model Library of Generic RAMST Blocks
Dipl.-Ing. Dieter Fasol1 and Dr.-Ing. Burkhard Münker2
1LFK-Lenkflugkörpersysteme GmbH, Hagenauer Forst 27, 86529 Schrobenhausen, Germany [email protected]
2icomod – münker consulting, Olper Straße 53, 57258 Freudenberg, Germany [email protected]
ABSTRACT
Tailoring a complex system to meet given availability requirements is a challenging task in the design process. Besides compiling the applicable MTBF and MTTR figures of the required components, the specific set of algebraic rules has to be identified and applied to compute the overall predicted availability of the designated system functions for the individual architecture and for individual usage profiles.
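The availability algebra alluded to above can be sketched as follows; the MTBF/MTTR figures and the series/parallel structure are invented for illustration, not taken from the launcher application:

```python
# Sketch of steady-state availability computed from MTBF/MTTR
# figures and combined for series and parallel (redundant)
# structures. All component figures below are hypothetical.

def availability(mtbf, mttr):
    """Steady-state availability of a single repairable component."""
    return mtbf / (mtbf + mttr)

def series(*avails):
    """All components needed: availabilities multiply."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel(*avails):
    """Any one component suffices: unavailabilities multiply."""
    u = 1.0
    for x in avails:
        u *= 1.0 - x
    return 1.0 - u

pump = availability(mtbf=2_000.0, mttr=8.0)
controller = availability(mtbf=10_000.0, mttr=24.0)
# two redundant pumps feeding a single controller
firing = series(parallel(pump, pump), controller)
```

Redundancy raises the combined availability above that of a single pump, while adding components in series can only lower it; it is exactly this kind of architecture-dependent algebra that the model library automates.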
While model-based methods have meanwhile become established in the system development process, such time- and effort-saving simulation-based approaches are far less common in the field of RAMST analyses.
This contribution reports on an approach of amending a reusable library of functional component models - originally designed to explore by simulation the effect of assumed failures in complex networks - and applying it to compute the availability of a generic launcher system. Here the design engineer is faced with the complex task of finding an architecture that guarantees a specified availability of the firing function with the given resource items on board.
Developed within the tool environment RODON, the prototypical library enables a quick evaluation of structural design alternatives of the selected launcher application. On top of that, it supports the full range of RAMST analyses - like computation of cause-effect relationships for FMEA, automatic drawing of FTAs for hazards of losing designated system functions, or systematic evaluation of the diagnostic coverage and of potential monitoring or reconfiguration strategies - based on the same single model. Further generalization activities are ongoing.
1 Dieter Fasol, Burkhard Münker: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Since the model is composed of generic qualitative building blocks, it took much less time to develop than full quantitative or physical model descriptions usually need. Although this qualitative representation implies certain limitations, in the authors' experience these are compensated by easier understanding for the modeler and quick adaptation to new principal architectures, while still providing sufficient system insight and results.
Driven by requirements from the industrial application, ideas for further library extensions are being discussed to also support questions regarding repair procedures and times, resource allocation, or even economic aspects.
The paper sections focus on the description of the launcher design task, a short introduction to the applied tool, the library development and application and finally the results and outlook.
BIOGRAPHIES
Dieter Fasol, born in Vienna on May 3rd, 1960, studied Mechanical Engineering at the Ruhr University of Bochum. He started his professional career at Messerschmitt-Bölkow-Blohm GmbH in Ottobrunn near Munich. His fields of work were designing flight state control and guidance algorithms for various guided missile systems as well as developing simulation environments to serve as a design and test bed for algorithm development. Since 2007 he has been working in the field of safety engineering at LFK-Lenkflugkörpersysteme GmbH and has started to apply the tool RODON for model-based safety analysis.
Burkhard Münker, born Dec 21st, 1965, studied Mechanical Engineering at the University of Siegen, Germany, where he also started his scientific career with a focus on simulation and fault analyses. Continuing at the Technical University of Berlin, Germany, he developed a tool for automated generation of state space models and filters for early detection of hazardous situations in chemical reaction
systems and got his doctoral degree in engineering in 2001. For many years he has been working as a senior consultant and project manager for the vendor of the model-based reasoning tool RODON, developing and applying new advanced approaches to the full range of diagnostic and RAMS activities for industrial and academic customers. Since 2010, working as an independent technology consultant
and analyst, he is still interested in tasks to support classical RAMST analyses by advanced failure mode modeling techniques and especially in adopting model-based ideas for non-technical applications. He is also a lecturer for the topics Physical System Modeling and Model-based Safety Assessment at the University of Siegen. See his LinkedIn profile for details.
Knowledge-Based System to Support Plug Load Management
Jonny Carlos da Silva1 and Scott Poll2

1Mechanical Engineering Department, UFSC, Florianopolis, SC, 88040-900, Brazil
2NASA Ames Research Center, Moffett Field, CA, 94035, USA
ABSTRACT
Electrical plug loads comprise an increasingly larger share
of building energy consumption as improvements have been
made to Heating, Ventilation, and Air Conditioning
(HVAC) and lighting systems. It is anticipated that plug
loads will account for a significant portion of the energy
consumption of Sustainability Base, a recently constructed
high-performance office building at NASA Ames Research
Center. Consequently, monitoring plug loads will be critical
to achieve energy efficient operations. In this paper we
describe the development of a knowledge-based system to
analyze data collected from a plug load management system
that allows for metering and control of individual loads.
Since Sustainability Base was not yet occupied at the time
of this investigation, the study was conducted in another
building on the Ames campus to prototype the system. The
paper focuses on the knowledge engineering and
verification of a modular software system that promotes
efficient use of office building plug loads. The knowledge-
based system generates summary usage reports and alerts
building personnel of malfunctioning equipment and
unexpected plug load consumption. The system is planned
to be applied to Sustainability Base and is expected to
identify malfunctioning loads and reduce building energy
consumption.
1. INTRODUCTION
Lighting and HVAC loads have typically been the top
contributors to building energy consumption. However, as
technology advances have made these systems more
efficient, plug loads have become a relatively larger
contributor to energy usage. For example, in a typical
California office building lights consume around 40% of
total energy, HVAC 25% and plug loads 15% (Kaneda,
Jacobson & Rumsey, 2010; Moorefield, Frazer & Bendt,
2011). These proportions change in a high-performance
building, where unregulated plug loads can correspond to
more than 50% of total energy consumption (Lobato, Pless,
Sheppy & Torcellini, 2011). With the decreasing trend in
lighting and HVAC energy consumption and with more
dependence on computer and electronic equipment, plug
and process loads are taking up an increasingly larger slice
of the building energy use pie.
In terms of plug load energy consumption, it has been found
that motivated users are key to saving energy (Kaneda et al.,
2010). Employees who make use of built-in power saving
functionality and turn off devices when not in use can
significantly reduce energy waste, particularly during non-
business hours. In other words, many of the barriers to
reducing plug load energy use are behavioral, not technical.
As part of a NASA program to replace outdated and
inefficient buildings, NASA Ames Research Center recently
completed construction of Sustainability Base, a 50,000 sq.
ft. office building designed to exceed the Leadership in
Energy and Environmental Design (LEED) Platinum rating.
Beyond providing an inviting workspace for employees,
Sustainability Base has the following objectives:
1. To be a living, evolving research laboratory and
showcase facility for sustainable building research.
2. To provide a mechanism for the demonstration and
transfer of NASA aerospace technologies to the
building industry.
3. To be an experimental research facility relevant to
NASA’s interest in developing human habitats on Mars
and in space.
4. To facilitate collaboration by involving inter-
governmental, academic, nonprofit, and industry
partners in research on next generation sustainable
building technologies and concepts.
5. To reinforce NASA’s position on and support of the
Executive Order on Federal Leadership in
Environmental, Energy, and Economic Performance.
_____________________
Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits
unrestricted use, distribution, and reproduction in any medium, provided
the original author and source are credited.
In addition to investigations of technologies such as
greywater recycling, data mining, prognostics,
computational fluid dynamics, fuel cells, and intelligent
control, NASA Ames will also examine the influence of
plug load management. Since Sustainability Base was not
yet occupied at the time of this study, a testbed was set up in
another building on campus to perform a preliminary plug
load management assessment (Poll & Teubert, 2012).
We wish to detect irregular plug load usage, malfunctioning
devices, and also whether the plug load management system
itself is performing as expected. In this paper, we
demonstrate the development of an expert system to analyze
data acquired from plug loads and to call attention to
potential issues. The main contribution is the development
of a modular, extensible knowledge-based system that can
be easily adapted to Sustainability Base or other buildings
that use plug load management.
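A minimal sketch of the kind of rule such a system encodes is shown below; the device names, the power threshold, and the after-hours rule are assumptions for illustration, not the deployed rule base:

```python
from datetime import datetime

# Illustrative rule: flag a metered plug load drawing non-trivial
# power outside business hours. Thresholds, hours and device names
# are hypothetical, not the actual knowledge base.

BUSINESS_HOURS = range(7, 19)   # 07:00-18:59
STANDBY_WATTS = 5.0             # above this we call the device "on"

def after_hours_alerts(readings):
    """readings: list of (device, timestamp, watts) tuples.
    Returns the readings showing power draw outside business hours."""
    alerts = []
    for device, ts, watts in readings:
        if ts.hour not in BUSINESS_HOURS and watts > STANDBY_WATTS:
            alerts.append((device, ts, watts))
    return alerts

readings = [
    ("monitor-127", datetime(2012, 3, 14, 22, 15), 38.0),
    ("laptop-042", datetime(2012, 3, 14, 12, 5), 45.0),
    ("printer-003", datetime(2012, 3, 14, 23, 40), 2.1),
]
alerts = after_hours_alerts(readings)  # only monitor-127 is flagged
```

In a rule-based system each such condition is one rule among many; keeping rules small and independent is what makes the knowledge base modular and extensible to other buildings.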
This paper has the following structure: Section 2 discusses
related work on intelligent systems applied to sustainable
buildings. Section 3 describes the pilot study testbed,
including test environment, plug load management devices
and data collection. Section 4 describes the expert system
developed to analyze the data generated by the monitoring
system, concentrating on knowledge representation
techniques. Section 5 presents the results of applying the
expert system to the plug load data. Section 6 concludes
with some lessons learned and next steps to be applied to
Sustainability Base.
2. INTELLIGENT SYSTEMS FOR SUSTAINABLE BUILDINGS
Knowledge-based systems have been applied to Building
Energy Management Systems (BEMS), which play an
important role in occupant comfort and energy
consumption. In this area, Doukas, Patlitzianas, Iatropoulos
and Psarras (2007) describe an intelligent decision support
system using rule sets based on a typical building energy
management system. The knowledge base addresses the
following categories: internal comfort conditions, building
energy efficiency, and decision support. The decision
support module has three functions: interaction with the
sensors for diagnosis of the building's state; incorporation
of expert and intelligent system techniques to select
appropriate interventions; and communication with the
building's controllers to apply the decision. The system
enables central management of energy
consumption in buildings by translating the building energy
knowledge into several rules and finally into electronic
commands to actuator devices. The paper describes the
adopted methodology to develop the system using expert
knowledge for building energy management, the system
architecture, a summary of its rules and an appraisal of its
pilot application to a building. One of the main project
conclusions was that expert knowledge has significant
potential for improving building energy management, since
the rules make it possible to modulate intelligent
interventions.
As presented above, heating and cooling requirements play
a vital role in building energy demands; defining such
requirements is therefore essential during the building
design process. In this context, Ekici and Aksoy
(2011) introduce an Adaptive Neuro-Fuzzy Inference
System (ANFIS) to predict heating and cooling energy
needs. The inputs to the inference include physical
environmental parameters such as outdoor temperatures,
solar radiation and wind speed and direction in addition to
design parameters such as building form factor,
transparency ratio, insulation thickness, and orientation. The
performance of ANFIS was benchmarked with the results of
conventional calculation methods of building energy
requirements; the ANFIS models yielded a successful
prediction rate of 96.5% for heating and 83.8% for cooling
energy requirements.
Kwok, Yuen and Lee (2011) present an intelligent approach
to assess the effect of building occupancy on cooling load.
Their neural network takes external factors (outdoor
temperature, relative humidity, rainfall, wind speed, bright
sunshine duration, and global solar radiation) and internal
factors (occupancy area and occupancy rate) as inputs; the
total cooling load is the model
output. The occupancy rate is derived from the total energy
of primary air units, whose output of fresh air depends on
the measured CO2 level. When the number of occupants
increases, the CO2 concentration level increases and leads
to an increase in fresh air supply rate. The study includes a
sensitivity analysis considering three variations on the input
to the neural network: only external factors, inclusion of
occupancy area, and addition of occupancy rate. The analysis
demonstrates the importance of occupancy data in the
building cooling load prediction. From these few examples,
it is clear that there is a need for applications of intelligent
systems for sustainable buildings.

Equipment   No.   Equipment         No.
Desktop      6    Calculator         1
Laptop       3    Storage drive      1
Printer      7    Battery charger    1
Phone        2    Vending machines   2
Speaker      3    Space heater       1
Scanner      3    External drive     1
Monitor      7    Coffee maker       1
Hub          2    Refrigerator       1
Copier       1    Bridge             1
Shredder     3    Microwave          1
Lamp         2    TOTAL             50

Table 1. List of equipment monitored
3. TESTBED FOR PLUG LOAD MANAGEMENT
In preparation for deploying a plug load management
system to Sustainability Base, a pilot study was conducted
in another office building on the NASA Ames campus (Poll
& Teubert, 2012). The plug load management system
included 15 power strips, with 4 channels (receptacles) per
strip. Each channel is metered and can also be commanded
on or off. Power strips wirelessly transmitted data to, and
received commands from, a cloud-based data service via
bridges connected to the building's local area network. Minimum,
mean, and maximum power draws for one minute intervals
were recorded to a database. To collect a representative set
of data, the power strips were placed in a variety of
locations, including offices, a copy room, and a
break room. Table 1 lists the types and number of
equipment monitored.
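The one-minute mean power records are what the reporting functions later aggregate into daily kilowatt-hour totals. As an illustrative sketch (the paper's implementation is in CLIPS; the function name and sample layout here are assumptions), the conversion is simply mean watts times interval length:

```python
def daily_kwh(samples_w, minutes_per_sample=1):
    """Total energy in kWh from a sequence of mean power draws in
    watts, each averaged over a fixed-length interval (one minute
    here, matching the recording rate described above)."""
    hours = minutes_per_sample / 60.0
    return sum(w * hours for w in samples_w) / 1000.0
```

A channel drawing a constant 100 W for a full day (1440 one-minute samples) yields 2.4 kWh, the kind of per-channel total shown later in the text reports.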
Channels had various power consumption profiles and
operating modes (e.g., standby, idle, active). Both the
number of channels and power consumption characteristics
will change in the future deployment to Sustainability Base,
requiring that the knowledge-based system be easily
adaptable.
Power consumption data were collected over a period of
several weeks to establish a usage baseline. Then, schedule-
based control was used to power off and on groups of
devices at different times according to occupants’ work
schedules. In addition to employing time-based rules,
changes were made to the energy saver settings of certain
devices (e.g., time to standby mode, screen saver behavior).
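The schedule-based control applied here is also what the expert system later audits. A minimal Python sketch of such a check, assuming samples are (time, mean watts) pairs and a fixed switch-off time (the paper's actual rules are written in CLIPS, and the 5 W "on" threshold is an illustrative assumption):

```python
from datetime import time

def schedule_violations(samples, off_after, on_threshold_w=5.0):
    """Return the samples where a load is still drawing power after
    its scheduled off time, i.e. the schedule-based rule failed.
    `samples` is a list of (datetime.time, mean_watts) pairs."""
    return [(t, w) for t, w in samples
            if t >= off_after and w > on_threshold_w]

# A load scheduled off at 10 pm that is still drawing 78 W at 22:05:
readings = [(time(21, 30), 80.0), (time(22, 5), 78.0)]
alerts = schedule_violations(readings, time(22, 0))
```

Here `alerts` contains only the 22:05 reading, which would trigger the kind of alert shown later for channel 14.1.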
4. EXPERT SYSTEM PROTOTYPE FOR PLUG LOAD
ANALYSIS
One of the main objectives of the plug load testbed was to
gain practical experience that could be transferred to
Sustainability Base, which was in the final stages of
commissioning at the time of this study. Consequently, one
of the first decisions made in developing the expert system
prototype was to create a knowledge-base that could be
easily adapted to the future setup, which will have a
different set of plug loads compared to the testbed.
The expert system prototype was developed in CLIPS
(Giarratano & Riley, 1994) using a combination of rules,
semantic network and object-oriented modeling.
Figure 1 presents the expert system prototype UML activity
diagram. The knowledge-base is composed of two parts: 1)
CLIPS Instances (setup dependent); 2) Rules and Methods
(setup independent), as discussed later. For graphical output,
a JavaScript library was used (Dygraphs, 2011).
The choice of CLIPS as developmental framework was
guided by the following factors:
- The CLIPS Object-Oriented Language (COOL) module makes
it possible to take full advantage of object-oriented modeling;
Figure 1. UML activity diagram of expert system prototype
- The representation paradigm was chosen based on
previous experiences in developing expert systems for
different engineering domains, including hydraulic system
(Silva & Back, 2000), cogeneration power plant design
(Matelli, Bazzo & Silva, 2009; Matelli, Bazzo & Silva,
2011), and natural gas transportation modeling (Starr &
Silva, 2005);
- The combination of object-oriented modeling, semantic
network, and rules in an incremental approach allows
modularity, expandability, and robustness;
- The framework allows for rapid prototyping by the
knowledge engineer, a benefit in this case given previous
experience and time limitations.
The prototype was designed to process raw data and
generate summary reports and graphs with useful
information for either building operators or occupants.
Table 2 lists the prototype elements (presented in Figure 1)
and their rationale. The primary functions can be
summarized as:
1. Alert loss of communication
2. Alert failure of schedule-based on/off rules
3. Alert abnormal power consumption
4. Alert possible channel change
5. Present power mode transitions
6. Present percentage of time in different power modes
7. Present overall energy consumption per day
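For example, function 1 (alert loss of communication) uses the 20-minute threshold given in Table 2. A Python sketch of the underlying check, assuming a per-channel list of report timestamps (the paper's version is a CLIPS rule; the function interface is illustrative):

```python
from datetime import datetime, timedelta

COMM_TIMEOUT = timedelta(minutes=20)  # threshold from Table 2

def comm_alerts(report_times, end_of_day):
    """Return the times at which a channel fell silent for more
    than 20 minutes, i.e. the start of each communication gap."""
    alerts = []
    times = sorted(report_times) + [end_of_day]
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev > COMM_TIMEOUT:
            alerts.append(prev)
    return alerts
```

A channel that reports at 12:00 and 12:10 but not again until 12:45 would be flagged as silent from 12:10 onward.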
Since we did not have much a priori information regarding
the different types of loads, and due to the fact that most
attributes are shared by all loads, we chose to generate all
load instances from the main class. However, as the system
evolves and more detailed information is obtained regarding
differences among loads, it is possible to modify the class
structure – adding sub-classes such as desktop, laptop,
printer, etc., and redefining the current instances according
to these new sub-classes. Such expansion would greatly
increase the ability to define specific methods without
requiring a considerable change in system code. Even with
the current structure, it is possible to treat different loads in
a specific manner. For example, the calculation of abnormal
consumption, i.e., out of a pre-defined power range, was
implemented for the photocopier. In order to do that, the
abnormal_range attribute value was defined for this
instance. Some loads required special treatment. Desktop
computers did not have schedule-based control applied,
because removing power without a proper shutdown sequence
could result in data loss or damage to the computer.

Define period of analysis: User-definable period of analysis
during which all instance attributes are kept the same.

Change default on/off times: Each load has time-based on/off
attributes pre-defined in the instance set. These attributes
are key to identifying inconsistencies, since a load can be
operating when it shouldn't, or vice versa.

Access database: Generate facts by extracting only relevant
attributes from the database, such as channel, initial time,
and average power.

CLIPS Instances: Comprising the core of the system, instances
define specific methods (e.g., calculate time spent in each
power mode) as well as attributes that describe load behavior,
such as status, power, abnormal range, etc.

Check inconsistencies: Set of rules and methods to accomplish
the following functions:
- Alert loss of communication: triggers a message if a load
  does not report a measurement for more than 20 minutes.
- Alert failure of schedule-based on/off rules: triggers a
  message if a load is on when it should be off or, conversely,
  off when it should be on.
- Alert abnormal power consumption: triggers a message if a
  load is consuming power outside any previously defined range,
  or if the load is consuming power in a range for longer than
  expected (e.g., transition to standby mode after 60 minutes).
- Alert possible channel change: triggers a message if the
  power consumption pattern indicates that a different load may
  have been plugged into a channel. In this case, the system
  writes a message at the end of the daily report, indicating
  the time when the change was detected and which channel(s)
  switched.

Generate reports:
- Present power mode transitions: records a message if a load
  changes modes (e.g., on to off, standby to idle, idle to
  active).
- Present percentage of time in different power modes: records
  duty-cycle information for each channel.
- Present overall energy consumption per day: records the
  total kilowatt-hour energy consumption for each channel.
- Calculate wasted energy: records energy consumption of
  phantom loads.
- Graphical reports for different fault modes.

Table 2. Knowledge base elements of expert system prototype
Additionally, turning off some printers during non-business
hours was found to use more energy than leaving the device
in standby mode because of high power consumption for the
warm-up and idle modes.
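In Python terms, the single-class instance design described above might look like the following sketch. The `abnormal_range` attribute name comes from the paper; the rest of the interface, including the example range bounds, is illustrative (the actual system uses CLIPS/COOL):

```python
class PlugLoad:
    """Generic plug load instance; subclasses (Desktop, Printer,
    ...) can be introduced later without changing the rule code."""
    def __init__(self, channel, on_time=None, off_time=None,
                 abnormal_range=None):
        self.channel = channel
        self.on_time = on_time
        self.off_time = off_time
        self.abnormal_range = abnormal_range  # (low_w, high_w) or None

    def is_abnormal(self, watts):
        # Only loads with a defined range (e.g., the photocopier)
        # participate in the abnormal-consumption check.
        if self.abnormal_range is None:
            return False
        low, high = self.abnormal_range
        return not (low <= watts <= high)

# Hypothetical instance mirroring the photocopier case:
copier = PlugLoad("5.0", abnormal_range=(0.0, 200.0))
```

With this structure, a rule asking `load.is_abnormal(watts)` works uniformly across all instances, and only those with a defined range ever fire the alert.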
5. RESULTS
The prototype system was implemented and tested using
data gathered from the plug load management system. All
functions listed in Table 2 were tested by comparing the
daily reports created by the prototype system to the recorded
data.
Figure 2 shows outputs from one of the functions that
checks for inconsistencies. Figure 2a plots the typical
behavior, which shows that channel 14.1 was turned off
with a schedule-based rule at 10 pm on July 18. However,
on the next day, the same channel remained on after this
time. Figure 2b shows that an alert was triggered at 22:05
(see the alert message in the upper right of the graph),
indicating that the schedule-based rule failed.
The system also generates a verbose report, a snippet of
which is presented in Figure 3. The first part of the report
records each mode transition, together with anomalies such
as a possible change in channels, consumption in an abnormal
range, and loss of communication. The next part presents a
table with percentages of time spent in different modes. The
final notes call attention to items that require inspection.
Although the loads were modeled as a single class, it is
possible to study distinct behaviors. For example, for
channel 5.0 there is a special rule that checks whether the
device transitions to standby mode after 60 minutes of
inactivity. As shown in Figure 4a, the copier transitions to
standby mode (~60W) as expected. Figure 4b shows that on
the next day at 14:35 it failed to transition to standby mode
and had excessive power draw for the remainder of the day.
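The channel 5.0 rule can be sketched as follows, assuming one-minute mean power samples starting from the last active use. The ~60 W standby level matches Figure 4, while the 70 W cutoff and the function itself are illustrative assumptions (the actual rule is written in CLIPS):

```python
def missed_standby(powers_w, standby_max_w=70.0, window=60):
    """Given one-minute mean power samples starting at the last
    active use, return True if the device has not dropped to
    standby level (~60 W for the copier) within `window` minutes,
    i.e. the expected transition to standby failed."""
    return all(w > standby_max_w for w in powers_w[:window])
```

A copier stuck at ~265 W for the whole hour would be flagged, while one that drops to 60 W within the hour would not.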
a) Channel switches off at 10 pm; b) Channel fails to switch off at 10 pm
Figure 2. Example of schedule-based rules

Report corresponding to date: 20110728
6.3 @12:30 mode change on to idle power= 4.61
11.3 @12:30 mode change standby to on power= 1.3
…
@12:45 9.3 consumed power out of all ranges (phantom,
standby, idle and active). Possible change in the channel has
occurred. Check the user. Power= 88.15
…
5.0 @13:30 consuming in abnormal range: Power: 265.05
…
4.0 @22:00 loss of communication
…
Channel  Total kWh  % on   % off  % other modes (*)
1.0      1.88       30.43   0.00  69.56
1.1      0.17        0.72  16.67  82.61
…
15.0     0.15       33.33  60.15   6.52
15.1     0.10       34.06  60.15   5.80
* includes phantom, standby and idle modes

Attention: it is possible channels were changed because
@12:45 channel 9.3 consumed power 88.15 out of its normal
ranges. Check other channels in the same node.
Attention: it is possible channels were changed because
@12:45 channel 9.1 consumed power 27.58 out of its normal
ranges. Check other channels in the same node.

Figure 3. Example text report

In terms of expansion and use in Sustainability Base, only
the template for database access and the instance set need
to be changed. Neither task requires a skilled programmer,
because the rules/methods do not refer to specific instances;
they are independent from the operational setup. The
knowledge base proved to be extensible as it incorporated
additional attributes as the specifications increased in
complexity. Future work will implement plug load
subclasses to include specific information, allowing
modular expansion.
6. CONCLUSION
The paper presents the development of a knowledge-based
system to analyze plug loads, which are becoming
increasingly important in high-performance buildings. The
system processes data acquired from a plug load monitoring
system, triggers alerts and generates reports. The alerts call
attention to malfunctioning equipment, failure of schedule-
based rules, or changes in use pattern. The reports
summarize plug load power consumption statistics.
Providing such feedback to occupants is expected to help
identify malfunctioning equipment and reduce the energy
consumption of Sustainability Base.
ACKNOWLEDGEMENT
This project was developed under Grant 4095/10-3, CAPES
Foundation, Brazil. The authors would also like to thank
UFSC- Federal Univ. of Santa Catarina (Brazil), and NASA
Ames Research Center for their support.
REFERENCES
Doukas, H., Patlitzianas, K. D., Iatropoulos, K., & Psarras,
J. (2007). Intelligent Building Energy Management
System Using Rule Sets. Building and Environment 42:
3562–3569.
Dygraphs JavaScript Visualization Library,
http://dygraphs.com/, accessed November 2011.
Ekici, B. B., & Aksoy, U. T. (2011). Prediction of Building
Energy Needs in Early Stage of Design by Using
ANFIS. Expert Systems with Applications 38: 5352–
5358.
Giarratano, J., & Riley, G. (1994). Expert Systems -
Principles and Programming, Second Edition, PWS
Publishing Company.
Kaneda, D., Jacobson, B., & Rumsey, P. (2010). Plug Load
Reduction: The Next Big Hurdle for Net Zero Energy
Building Design. ACEEE Summer Study on Energy
Efficiency in Buildings.
Kwok, S.S.K., Yuen, R.K.K., & Lee, E.W.M. (2011). An
Intelligent Approach to Assessing the Effect of
Building Occupancy on Building Cooling Load
Prediction. Building and Environment 46: 1681-1690.
Lobato, C., Pless, S., Sheppy, M., & Torcellini, P. (2011).
Reducing Plug and Process Loads for a Large Scale,
Low Energy Office Building: NREL Research Support
Facility. (Tech Rep. No. NREL/CP-5500-49002).
National Renewable Energy Laboratory.
Moorefield, L., Frazer, B., & Bendt, P. (2011). Office Plug
Load Field Monitoring Report. California Energy
Commission, PIER Energy-Related Environmental
Research Program, Tech. Rep. CEC-500-2011-010.
Matelli, J. A., Bazzo, E., & Silva, J. C. (2009). An Expert
System Prototype for Designing Natural Gas
Cogeneration Plants. Expert Systems with Applications,
36, 8375-8384.
Matelli, J. A., Bazzo, E., & Silva, J. C. (2011).
Development of a Case-Based Reasoning Prototype for
Cogeneration Plant Design. Applied Energy, 88,
3030-3041.
a) Normal transition; b) Failure to transition
Figure 4. Examples of transition to standby
Poll, S., & Teubert, C. (2012). Pilot Study of a Plug Load
Management System: Preparing for Sustainability Base.
In Proceedings of 2012 IEEE Green Technologies
Conference. Institute of Electrical and Electronics
Engineers, Inc.
Silva, J. C., & Back, N. (2000). Shaping the Process of
Fluid Power System Design Applying an Expert
System. Research in Engineering Design, 12.
Starr, R. R., & Silva, J. C. (2005). Leak Detection in Gas
Pipelines - A Knowledge-Based Approach. Proceedings of
the 18th International Brazilian Congress of Mechanical
Engineering, COBEM 2005.
Integrated Vehicle Health Management and Unmanned Aviation
Andrew Heaton1, Ip-Shing Fan2, Craig Lawson3, and Jim McFeat4
1,2IVHM Centre, Cranfield University, Conway House, Medway Court, University Way,
Cranfield Technology Park, MK43 0FQ, UK
[email protected], [email protected]
3 Aerospace Engineering Department, School of Engineering, Cranfield University, MK43 0AL, UK
4BAE Systems, Warton Aerodrome, W374B, Preston, Lancashire, PR4 1AX, [email protected]
ABSTRACT
Over the past decade the use of unmanned aerial systems (UAS) has increased in military, intelligence, and surveillance operations for dull, dirty and dangerous (DDD) missions. They have primarily been used in time of war, and been pushed into service by programmes designed to increase the capabilities of military organisations, with little thought of supportability or interaction with other air users. The increased use of UAS in wartime and the ensuing media coverage has naturally led to proposed uses of UAS in a civilian context, for a wide range of non-military DDD missions: from land usage and crop monitoring, to the monitoring of nuclear power plants, to inspection of power lines.
The use of UAS in this civilian context raises important issues, such as: How can UAS be integrated into civil unsegregated airspace, and how will they react to other air traffic (manned and unmanned)? How can UAS be shown to be safe to the general public, especially with an increased level of autonomy? What technologies are needed to ensure the safe use of UAS? Is using a UAS more economical than using the manned equivalent? The list goes on.
Steps have already been taken to address these issues. In the United Kingdom, the Civil Aviation Authority (CAA) has produced guidance (CAP 722) for manufacturers and operators of UAS, allowing them to build and operate UAS while developing a framework to fully integrate them into the civil airspace. In addition, the CAA is working closely with the industry-led consortium ASTRAEA (Autonomous Systems Technology Related Airborne Evaluation & Assessment) to help solve the issues of using UAS in civil airspace, and the United States Federal Aviation Administration (FAA) has been set a 2015 deadline for full integration, hastening the need for solutions to be found.
The poster will take a holistic systems engineering view of the current situation in unmanned aviation and where Integrated Vehicle Health Management (IVHM) might be used (in whole or in part) to solve some of the issues mentioned above.
It will present reasons why one may wish to include IVHM in a UAS (e.g., cost, size, safety), and the potential benefits (e.g., increased availability, reduced maintenance costs) and pitfalls (e.g., false positives) of implementing IVHM on a UAS. It will also map how the IVHM (system of interest) interacts with the rest of the UAS (wider system of interest), how it relates to the issues the aviation industry has with UAS (environment) and the general public (wider environment).
Andrew Heaton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Author Index
Adhikari, Partha Pratim, 172
Anger, Christoph
Schrader, Robert, 202
Archimède, Bernard, 1
Baraldi, Piero, 90
Biswas, Gautam, 156
Bolognese, Danilo, 127
Boulet, Xavier, 250
Bregon, Anibal, 214
Brown, Douglas, 183
Buderath, Matthias, 59, 172, 225
Camci, F., 98, 148
Celaya, José R., 69, 156
Charbonnaud, Philippe, 1
Compare, Michele, 90
Corbetta, M., 104
Cross, Joshua, 34
da Silva, Jonny Carlos, 258
DaghighiAsli, Afshin, 239
Darr, Duane, 183
Desforges, Xavier, 1
Diévart, Mickaël, 1
Eker, O.F., 148
Esperon-Miguez, Manuel, 192
Fan, Ip-Shing, 265
Fasana, A., 51
Fasol, Dieter, 256
Ferrara, Davide, 127
Feuillard, Vincent, 42
Finda, Jindrich, 165
Garibaldi, L., 51
Giglio, M., 104
Goebel, Kai, 69, 156
Gola, Giulio, 141
Gutsche, Katja, 244
Hédl, Radek, 165
Hafner, Michaël, 17
Haines, Conor, 59
Hayati, Leila, 239
Heaton, Andrew, 265
Hmad, Ouadie, 250
Hulsund, John Einar, 141
Jacazio, Giovanni, 127
Jennions, Ian K., 148, 192
John, Philip, 192
Klingauf, Uwe, 202
Kulkarni, Chetan S., 156
Kunze, Ulrich, 25
Kwapisz, David, 17
Löhr, Andreas, 59
Lamoureux, Benjamin, 10
Laskowski, Bernard, 183
Lauffer, Jim, 80
Lawson, Craig, 265
Lorton, Ariane, 42
Lucas, Andrew, 34
Münker, Burkhard, 256
Manes, A., 104
Marchesiello, S., 51
Massé, Jean-Rémi, 10, 250
McFeat, Jim, 265
Mechbal, Nazih, 10
Medjaher, K., 98
Merino, Alejandro, 214
Mikat, Heiko, 225
Morse, Jefferey, 183
Nystad, Bent Helge, 141
Ompusunggu, Agusmian Partogi, 114
Pirra, M., 51
Poll, Scott, 258
Pulido, Belarmino, 214
Raab, Stefan, 25
Rajamani, Ravi, 17
Rezaie, Vahid, 239
Roychoudhury, Indranil, 69
Saha, Bhaskar, 69
Saha, Sankalita, 69
Sas, Paul, 114
Sauco, Sergio, 90
Saxena, Abhinav, 69
Sbarufatti, C., 104
Sen Gupta, Jayant, 42
Siddiolo, Antonino Marco, 225
Sorli, Massimo, 127
Stecki, Chris, 34
Stecki, Jacek, 34
Trinquier, Christian, 42
Van Brusse, Hendrik, 114
Vandenplas, Steve, 114
Vechart, Andrew, 165
Zamarreño, Jesus Maria, 214
Zerhouni, N., 98
Zio, Enrico, 90