www.phmsociety.org ISBN - 978-1-936263-04-2
Proceedings of
First European Conference of the
Prognostics and Health Management
Society 2012
PHM-E’12
Dresden, Germany
July 3 - 5, 2012
Edited by: Anibal Bregon Abhinav Saxena
First European Conference of the Prognostics and Health Management Society, 2012
Table of Contents

Full Papers

A distributed Architecture to implement a Prognostic Function for Complex Systems
Xavier Desforges, Mickaël Diévart, Philippe Charbonnaud, and Bernard Archimède 2

An Approach to the Health Monitoring of a Pumping Unit in an Aircraft Engine Fuel System
Benjamin Lamoureux, Jean-Rémi Massé, and Nazih Mechbal 10

Application of Microwave Sensing to Blade Health Monitoring
David Kwapisz, Michaël Hafner, and Ravi Rajamani 17

Assessment of Remaining Useful Life of Power Plant Steam Generators - a Standardized Industrial Application
Ulrich Kunze and Stefan Raab 25

Autonomous Prognostics and Health Management (APHM)
Jacek Stecki, Joshua Cross, Chris Stecki, and Andrew Lucas 34

Characterization of prognosis methods: an industrial approach
Jayant Sen Gupta, Christian Trinquier, Ariane Lorton, and Vincent Feuillard 42

Damage identification and external effects removal for roller bearing diagnostics
M. Pirra, A. Fasana, L. Garibaldi, and S. Marchesiello 51

Data Management Backbone for Embedded and PC-based Systems Using OSA-CBM and OSA-EAI
Andreas Löhr, Conor Haines, and Matthias Buderath 59

Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned
Abhinav Saxena, José R. Celaya, Indranil Roychoudhury, Sankalita Saha, Bhaskar Saha, and Kai Goebel 69

Diagnostics Driven PHM. The Balanced Solution
Jim Lauffer 80

Fatigue Crack Growth Prognostics by Particle Filtering and Ensemble Neural Networks
Piero Baraldi, Michele Compare, Sergio Sauco, and Enrico Zio 90

Feature Extraction and Evaluation for Health Assessment and Failure Prognostics
K. Medjaher, F. Camci, and N. Zerhouni 98

Finite Element based Bayesian Particle Filtering for the estimation of crack damage evolution on metallic panels
Sbarufatti C., Corbetta M., Manes A., and Giglio M. 104

Health Assessment and Prognostics of Automotive Clutches
Agusmian Partogi Ompusunggu, Steve Vandenplas, Paul Sas, and Hendrik Van Brussel 114

Health management system for the pantographs of tilting trains
Giovanni Jacazio, Massimo Sorli, Danilo Bolognese, Davide Ferrara 127

Lifetime models for remaining useful life estimation with randomly distributed failure thresholds
Bent Helge Nystad, Giulio Gola, and John Einar Hulsund 141

Major Challenges in Prognostics: Study on Benchmarking Prognostics Datasets
O. F. Eker, F. Camci, and I. K. Jennions 148

Physics Based Electrolytic Capacitor Degradation Models for Prognostic Studies under Thermal Overstress
Chetan S. Kulkarni, José R. Celaya, Kai Goebel, and Gautam Biswas 156

Prediction of Fatigue Crack Growth in Airframe Structures
Jindrich Finda, Andrew Vechart, and Radek Hédl 165

Simulation Framework and Certification Guidance for Condition Monitoring and Prognostic Health Management
Matthias Buderath and Partha Pratim Adhikari 172

Theoretical and Experimental Evaluation of a Real-Time Corrosion Monitoring System for Measuring Pitting in Aircraft Structures
Douglas Brown, Duane Darr, Jefferey Morse, and Bernard Laskowski 183
Uncertainty of performance requirements for IVHM tools according to business targets
Manuel Esperon-Miguez, Philip John, and Ian K. Jennions 192

Unscented Kalman Filter with Gaussian Process Degradation Model for Bearing Fault Prognosis
Christoph Anger, Robert Schrader, and Uwe Klingauf 202

Using structural decomposition methods to design gray-box models for fault diagnosis of complex industrial systems: a beet sugar factory case study
Belarmino Pulido, Jesus Maria Zamarreño, Alejandro Merino, and Anibal Bregon 214

Virtual Framework for Validation and Verification of System Design Requirements to enable Condition Based Maintenance
Heiko Mikat, Antonino Marco Siddiolo, and Matthias Buderath 225

Poster Papers

Analyzing Imbalance in a 24 MW Steam Turbine
Afshin DaghighiAsli, Vahid Rezaie, and Leila Hayati 240

Economic reasoning for Asset Health Management Systems in volatile markets
Katja Gutsche 244

System PHM Algorithm Maturation
Jean-Rémi Massé, Ouadie Hmad, and Xavier Boulet 250

Design for Availability - Flexible System Evaluation with a Model Library of Generic RAMST Blocks
Dieter Fasol and Burkhard Münker 256

Knowledge-Based System to Support Plug Load Management
Jonny Carlos da Silva and Scott Poll 258

Integrated Vehicle Health Management and Unmanned Aviation
Andrew Heaton, Ip-Shing Fan, Craig Lawson, and Jim McFeat 265
Author Index 267
Full Papers
A distributed Architecture to implement a Prognostic Function for
Complex Systems
Xavier Desforges¹, Mickaël Diévart², Philippe Charbonnaud¹, and Bernard Archimède¹

¹ Université de Toulouse, INPT, ENIT, Laboratoire Génie de Production, 65016 Tarbes, France
² Aéroconseil, 31703 Blagnac, France
ABSTRACT
The proactivity in maintenance management is improved by the implementation of CBM (Condition-Based Maintenance) and of PHM (Prognostic and Health Management). These implementations use data about the health status of the systems. Among them, prognostic data make it possible to evaluate the future health of the systems. The Remaining Useful Lifetimes (RULs) of the components are frequently required to prognose systems. However, the availability of complex systems for productive tasks is often expressed in terms of RULs of functions and/or subsystems; those RULs provide little information about the components. Indeed, the maintenance operators must know which components need maintenance actions in order to increase the RULs of the functions or subsystems, and consequently the availability of the complex systems. This paper aims at defining a generic prognostic function for complex systems that prognoses their subsystems and functions and enables the isolation of the components that need maintenance actions. The proposed function requires knowledge about the system to be prognosed; the corresponding models are detailed. The proposed prognostic function relies on graph traversals, so its distribution is proposed to increase the calculation speed. It is carried out by generic agents.
1. INTRODUCTION
The implementation of the Condition-Based Maintenance
(CBM) recommendations usually leads to the improvement
of the equipment availability (Jardine, Lin and Banjevic,
2006; Scarf, 2007). The CBM actions are planned and led
according to the health status of equipments. Monitoring,
diagnostic and prognostic functions assess these statuses.
The development of health assessment functions has often
been considered as a downstream activity in the design
process of complex systems with few allocated means. This
has often led to a lack of collaboration with upstream
activities and to centralized deployment in light
computational modules although those functions have to
process numerous pieces of data of different kinds. As systems grow more complex, the consequence is an increasing rate of useless device replacements.
Those replacements are not only costly but may also cause
additional damage to the system.
Therefore, health assessment functions now become a major
issue for complex system designers. Among those functions,
the prognostic function aims at defining the future health of
the system that contributes to plan productive tasks or
maintenance tasks. Among the difficulties hindering the implementation of prognostic functions in complex systems are the numerous hardware or software components, devices, functions and subsystems of complex systems. Those pieces of equipment are designed, manufactured and assembled by different industrial partners (OEMs, suppliers, subcontractors, etc.). Each partner has a part of the knowledge needed to carry out the prognosis of the complex system. However, some pieces of this knowledge belong to the partners' own know-how and so cannot be shared.
To tackle this difficulty, a decentralized/distributed
architecture can be proposed. Indeed, such architectures
enable the implementation of the Remaining Useful
Lifetime (RUL) assessment and prognostic functions closer
to components, devices, functions or subsystems. Therefore,
each OEM, supplier or subcontractor can provide RUL assessment and prognostic functions for its equipment.
Nevertheless, those functions have to collaborate in order to
ensure the convergence of the prognostic process of
_____________________
Xavier Desforges et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
complex systems. Indeed, the union of local prognoses is
not the global prognosis. To illustrate that point, let us
consider a system made of a power supply and a computer.
If the RUL of the power supply is lower than that of the computer, the computer will probably not be able to carry out its activity beyond the RUL of the power supply. Agents
that carry out RUL assessment and prognostic function can
be used to ensure this collaboration. An agent is defined as a
self-contained problem-solving computational entity able, at
least, to perform most of its problem-solving tasks, to
interact with its environment, to perceive its environment
and to respond within a given time, to take initiatives when
it is appropriate (Jennings and Wooldridge, 1995).
The aim of this article is to present an architecture for
implementing a distributed prognostic function for complex
systems. Firstly, the interest of a distributed prognostic function is discussed. To implement the prognostic function, knowledge about the complex system is necessary; the paper then describes the principles of the prognostic function for complex systems, and the notion of Time Before Out of order (TBO) is introduced. Finally, the paper presents the proposed architecture, which is based on the multi-agent system concept with generic agents, and shows how to split up the modeled knowledge between the agents.
2. DISTRIBUTED PROGNOSIS
One aim of the Prognostic and Health Management (PHM)
is to assess the ability of complex systems to carry out
future tasks from diagnostic and prognostic results and the
definition of the constraints of the future tasks. Roemer,
Byington, Kacprzynski and Vachtsevanos (2007) advise that diagnostic and prognostic algorithms should be processed as close as possible to the monitored components and that the
produced data should be then exploited by ascending the
hierarchical structure of the complex system. Therefore,
bringing the PHM into operation requires the
implementation of prognostic functions.
If Vachtsevanos and Wang (2001) consider that the prognostic activity consists in assessing a RUL once an early detection of failure has been made, Lebold and Thurston (2001) consider that it is a reliable assessment of the RUL of a system or a device. From these studies it
appears that the assessment of the RUL is the keystone of
the prognostic activity. Indeed, the data it provides are used
as decision support for maintenance planning and proactive
maintenance (Iung, Monnin, Voisin, Cocheteux and Levrat,
2008) or for e-maintenance (Muller, Crespo Marquez and
Iung, 2008).
Several studies have dealt with the design of prognostic functions for devices. Several techniques are described in (Vachtsevanos, Lewis, Roemer, Hess and Wu, 2006). Nevertheless, in the case of complex systems, the set of the RULs of the devices may not be enough to be a suitable decision support for maintenance or for productive task planning purposes. The sets of RULs shall therefore be processed. In complex systems, the number of RULs can be so huge that the only reasonable way to process them is to distribute the processing. Another good reason for using a distributed architecture is that it enables implementations of prognostic processes as close as possible to the monitored devices, as Roemer et al. (2007) advise. Works dealing with distributed prognosis are quite recent, and several ways to distribute the prognostic processes have already been proposed.
In (Voisin, Levrat, Cocheteux and Iung, 2010) the prognosis
is considered as a business process whose activities can be
distributed in a context of e-maintenance. The mentioned
distribution is made according to different actors located on
different sites.
Saha, Saha, and Goebel (2009) propose an architecture made of several agents that can communicate with each other. An agent diagnoses a device and, when it detects a fault, switches to the prognostic mode and informs a base station. The base station plans tasks, can reinitialize the processes of agents if errors are detected, manages the access to resources such as an external database, and manages the availability of agents in terms of computation load.
Dragomir, Gouriveau, Zerhouni and Dragomir (2007) present an architecture for health assessment that consists of two levels: the local level corresponds to the components and the global level is associated with the complex system. In this architecture, each local agent brings into operation several known prognostic methods according to the available knowledge about the monitored component. The global agent collects the health assessment data from the local agents and computes a health assessment for the system thanks to a neural network.
Takai and Kumar (2011) propose a decentralized prognoser
for discrete event systems where local agents generate
prognoses that are sent to the other agents. Then the agents
cooperate in order to converge to a prognosis of the system
thanks to an inference engine.
The sets of RULs shall also be processed according to
knowledge as mentioned in (Saha et al., 2009).
3. KNOWLEDGE MODELING
During the design stage of a complex system, different
kinds of knowledge are elaborated. Among them, the
structural knowledge, the functional knowledge and the
behavioral knowledge are required to implement prognostic
functions (Reiter, 1992; Chittaro and Ranon, 2003). The HAZard and OPerability (HAZOP) methodology, a process hazard analysis technique, enables the study not only of the hazards of a system but also of its operability problems, by exploring the effects of any deviations from design conditions (Dunjo, Fthenakis, Vilchez and Arnaldos, 2010).
This methodology enables the identification of functions and interconnections.
3.1. The functional knowledge modeling
The functional knowledge modeling aims at providing the sets of components that implement the functions of the complex system from the user's point of view. Knowing the RUL of a function will help to plan future missions of the system and/or the maintenance actions it needs.
Therefore, the functional knowledge modeling consists in defining functions as sets of components or devices, both of which we call "devices". Functions can also be made of functions. Complex systems can also be divided into subsystems. In that case, a subsystem can be considered as a set of functions. Thus, a complex system is made of subsystems, a subsystem is made of functions, and a function is made of devices and/or functions.
Three types of functions must be considered for the computation of the prognoses of functions.
Simple functions are functions that fail if at least one of their entities (devices or functions) fails.
For reliability purposes, complex systems contain functions
with redundancies. These functions are carried out by at
least two entities (devices and/or subfunctions) that bring
into operation the same activities, services, etc. For example, suppose that a flight control function of an aircraft is made of three functions we call "flight controllers". If one
or even two flight controllers fail, the flight control can still
carry out its task. However, if two flight controllers fail,
there is no more redundancy. That is why we consider
redundancies as functions called redundancy functions.
Those functions are the only entities included in the
functions with redundancies.
Subsystems are considered as sets of functions that are not
included in other functions. Thus, subsystems can be
considered as simple functions. The prognostic of the
complex system can then be assessed from the prognostic of
its subsystems.
This knowledge can be modeled thanks to the UML (Unified Modeling Language, an object-oriented modeling language) class diagram shown in figure 1.
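As a rough sketch of this composition model (our own Python rendering, not the paper's UML class diagram of figure 1; all class and attribute names are ours), the device/function/subsystem hierarchy could look like:

```python
# Hypothetical sketch of the functional knowledge model of section 3.1:
# a complex system is made of subsystems, a subsystem of functions, and
# a function of devices and/or subfunctions. Names are illustrative.

class Device:
    def __init__(self, ident):
        self.ident = ident

class Function:
    # kind is "simple", "with_redundancies" or "redundancy"
    def __init__(self, ident, entities, kind="simple"):
        self.ident = ident
        self.entities = entities  # devices and/or subfunctions
        self.kind = kind

    def devices(self):
        """All devices reachable through the composition hierarchy."""
        found = []
        for e in self.entities:
            if isinstance(e, Device):
                found.append(e)
            else:
                found.extend(e.devices())
        return found

# A subsystem can be seen as a simple function made of the functions
# that are not included in other functions.
battery = Device("battery")
mains = Device("mains")
power = Function("power_supply", [battery, mains], kind="with_redundancies")
laptop = Function("laptop_subsystem", [power], kind="simple")
```

Here `devices()` flattens the hierarchy, which is what a prognostic function needs when relating the RUL of a function to the RULs of its devices.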
3.2. The structural knowledge modeling
The structural modeling aims at representing the direct
interactions between devices and their failure modes mainly
in order to propagate the effects of failures (Worn, Langle,
Albert, Kazi, Brighenti, Revuelta Seijo, Senior, Sanz-Bobi
and Villar Collado, 2004).
Figure 1. Functional knowledge model.
Failure Modes and Effects Analysis (FMEA) or HAZOP studies make it possible to collect the necessary knowledge for structural modeling. Indeed, those studies identify what happens to other devices when one or several devices fail.
Therefore, the structural knowledge can be modeled thanks to a set of arcs S1 = {(Di, Mj) → (Dk, Moo)} with i ≠ k, where an arc (Di, Mj) → (Dk, Moo) means that the device Dk will be out of order (mode Moo) if the failure mode Mj of a device Di occurs. Let us note that the mode Mj can be the mode Moo. However, some particular cases exist. For example, a laptop uses a power supply function. Let us simplify by assuming that the battery and the electricity distribution network carry out this function. If only the battery or only the electricity distribution network fails, the computer still operates normally. That is why the cases where a function that fails or becomes out of order makes components become out of order must also be considered. Thus, the structural model must also represent a set of arcs S2 = {(Fj, Moo) → (Dk, Moo)} with the same meaning. So, the structural model consists of the sets S1 and S2. Those sets of arcs represent a graph whose nodes are the failure modes of the devices and the functions of the complex system.
The out of order mode (Moo) is quite relevant because it
indicates that the origin of the predicted failure of an entity
is not the entity itself.
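To make the arc sets concrete, here is a minimal illustrative encoding of the structural graph (identifiers and data layout are ours, not the paper's):

```python
# Illustrative encoding of the structural model of section 3.2 as the
# two arc sets S1 and S2. Nodes are (entity, mode) pairs; "Moo" denotes
# the out-of-order mode. All identifiers are made up for the example.

M_OO = "Moo"

# S1: (device Di, failure mode Mj) -> (device Dk, Moo)
S1 = {
    ("D1", "M1"): [("D2", M_OO)],
    ("D2", M_OO): [("D3", M_OO)],
}

# S2: (function Fj, Moo) -> (device Dk, Moo), for the cases where a
# function that fails or becomes out of order makes devices out of order.
S2 = {
    ("F_power", M_OO): [("D3", M_OO)],
}

def successors(node):
    """Out-of-order nodes reached from `node` in the structural graph."""
    return S1.get(node, []) + S2.get(node, [])
```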
3.3. The behavioral knowledge modeling
The behavioral modeling mainly aims at defining the
dynamical behavior of a system. Behavioral models can be
used to detect degradation and to analyze their trends in
order to define the RUL of the monitored device.
The behavioral models used to prognose a device can be
achieved thanks to three approaches (Byington, Roemer,
Watson and Galie, 2003): experience-based, evolutionary
and/or statistical trending, or model-based. The behavioral models implemented to prognose a complex system can be so numerous and of such various kinds that it is difficult to consider
all of them. They also require design knowledge of devices, functions or subsystems that may reveal the know-how of their providers. Nevertheless, what matters most is what they contribute to produce: the RULs of the devices.
We then assume that a monitoring layer made of one or
several agents provides the RULs to the proposed
prognostic function. The monitoring layer agents can so
bring into operation the most suitable techniques to assess
RULs of devices.
3.4. RUL modeling
In order to be processed by a prognostic function, a RUL
has to assess a duration T between the instant t0, at which it is calculated, and the predicted instant t0 + T at which the device will fail according to a given failure mode. Thus, a RUL must contain four entities (Voisin et al., 2010):
- the involved device,
- the involved failure or degraded mode,
- the instant at which it was calculated,
- the duration.
RULs are assessments, so fields can be added to deal with uncertainty or confidence levels. However, the proposed prognostic function of complex systems does not take into account any kind of uncertainty representation. The aim of this paper is to propose a principle for prognosing a complex system that can be distributed into generic agents. Handling the uncertainty of RULs would likely require implementing different processes in the tasks described in section 4.
The RULs that the monitoring layer provides are the base of
the proposed prognostic function for complex systems but
this function also needs the functional and structural
knowledge.
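A minimal sketch of such a four-entity RUL record, assuming our own field names (the paper does not prescribe a data layout):

```python
# Minimal sketch of the four-entity RUL record of section 3.4.
# Field names are ours; uncertainty fields could be added, but the
# proposed CSPF does not handle them.
from dataclasses import dataclass

@dataclass(frozen=True)
class RUL:
    device: str          # the involved device Di
    mode: str            # the involved failure or degraded mode Mj
    computed_at: float   # the instant tk at which it was calculated
    duration: float      # the duration Tl; failure predicted at tk + Tl

    @property
    def predicted_failure_date(self):
        return self.computed_at + self.duration

r = RUL("D1", "M1", computed_at=100.0, duration=250.0)
```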
4. PROGNOSTIC FUNCTION FOR COMPLEX SYSTEMS
This section is dedicated to the proposed generic principle
for prognosing complex systems from RULs and from the
modeled functional and structural knowledge. We assume
that the monitoring layer sends to the Complex System Prognostic Function (CSPF) each RUL that it computes for each failure mode of the devices, except the out-of-order mode. The CSPF is divided into three main tasks:
1. the computation of the RUL of the device for which a
RUL has been received,
2. the computation of the RUL of the devices that are
interconnected (directly or not) to the device for which
a RUL has been received,
3. the computation of the RULs of the functions from the former tasks.
The process of the CSPF starts when a RUL is received
from the monitoring layer.
4.1. Computation of the RUL of a device (task 1)
The RUL received by the CSPF from the monitoring layer is noted RUL(Di, Mj, tk, Tl), where Di is the device, Mj is the predicted failure mode, tk is the instant at which the RUL was computed and Tl is the remaining lifetime, such that tk + Tl is the predicted instant at which the failure will likely occur.
When a RUL RUL(Di, Mj, tk, Tl) is received at the instant t, it is recorded and replaces the last stored RUL for Di with the failure mode Mj, RUL(Di, Mj, tk-1, Tl-1), if tk > tk-1; else the task stops. If tk > tk-1, the RUL of Di is then defined thanks to its last recorded RULs for all its failure modes. These RULs are noted RUL(Di, Mj, tkj, Tlj). The new RUL of Di becomes RUL(Di, Mp, t, Tp), where p corresponds to the failure mode for which:

tkp + Tlp = min over j of (tkj + Tlj), with Tp = tkp + Tlp − t (1)

Then this RUL is compared to the last recorded RUL of the device, noted RUL(Di, Mq, tkq, Tlq): if t + Tp < tkq + Tlq, RUL(Di, Mp, t, Tp) becomes the new RUL of the device Di. It is stored and replaces RUL(Di, Mq, tkq, Tlq); then, if at least one arc starts from the node (Di, Mp), task 2 is processed, else task 3 is processed. If RUL(Di, Mp, t, Tp) does not become the new RUL of the device Di, the CSPF stops.
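Under our reading of relationship (1), the selection of the device RUL in task 1 can be sketched as follows (tuple layout and function name are ours, not the paper's):

```python
# Sketch of task 1 (section 4.1): given the last recorded RUL per
# failure mode of a device, the device RUL at instant t is taken from
# the mode whose predicted failure date t_kj + T_lj is the earliest.
# RULs are represented here as (device, mode, t_k, T_l) tuples.

def device_rul(last_ruls_per_mode, device, t):
    """last_ruls_per_mode: dict mode -> (t_k, T_l) for one device."""
    # p minimizes the predicted failure date t_kj + T_lj (relationship 1)
    p = min(last_ruls_per_mode, key=lambda m: sum(last_ruls_per_mode[m]))
    tk, Tl = last_ruls_per_mode[p]
    # T_p = t_kp + T_lp - t, so the predicted date is unchanged
    return (device, p, t, tk + Tl - t)

ruls = {"M1": (100.0, 300.0), "M2": (120.0, 150.0)}  # M2 fails at 270
new = device_rul(ruls, "D1", t=130.0)
```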
4.2. Computation of the RUL of the devices that are
interconnected (task 2)
This task consists in propagating the new RUL(Di, Mp, t, Tp) in the graph described by the arcs of S1 and S2 towards the devices that will likely be out of order earlier than previously predicted because of this new RUL. We must here introduce the notion of Time Before Out of order (TBO). This notion expresses that a device will likely become out of order because of the failure Mp of the device Di. It is meaningful for maintenance because it makes it possible to localize the devices for which maintenance actions will be necessary. A TBO therefore contains five entities:
- the involved device,
- the device whose RUL has generated the TBO,
- the failure mode of the device whose RUL has generated the TBO,
- the instant at which the TBO was computed,
- the remaining time before the out-of-order mode occurs.
If the prognostic function handles uncertainty, TBOs must also contain fields dealing with this notion. That is not the case in this paper.
This second task does not consist of the computation of the new RULs of the interconnected devices but of their new TBOs. Two cases are considered:
- one for the arcs (Di, Mp) → (Dj, Moo), which start from a failure mode node,
- one for the arcs (Dn, Moo) → (Dm, Moo), which start from an out-of-order node.
For all the arcs (Di, Mp) → (Dj, Moo), RUL(Di, Mp, tkp, Tp) is compared to the last recorded TBOs of the devices Dj, noted TBO(Dj, Dx, Mqx, tkqx, Tqx) with j ≠ x. This comparison is made at the instant t. If tkp + Tp < tkqx + Tqx, then the new TBO of Dj becomes TBO(Dj, Di, Mp, t, Tpt) with Tpt = tkp + Tp − t. This new TBO is recorded, replaces the previously stored one, and is propagated in the graph from the node (Dj, Moo); otherwise the propagation in the graph from the node (Dj, Moo) is stopped.
For all the other arcs (Dn, Moo) → (Dm, Moo), the TBO of the device Dn, noted TBO(Dn, Di, Mp, tpt, Tpt), is compared to the last recorded TBO of Dm, noted TBO(Dm, Dj, Mq, tqt, Tqt). This comparison is made at the instant t. If tpt + Tpt < tqt + Tqt, then the new TBO of Dm becomes TBO(Dm, Di, Mp, t, T) with T = tpt + Tpt − t. This new TBO is recorded, replaces the previously stored one, and is propagated in the graph from the node (Dm, Moo); otherwise the propagation from the node (Dm, Moo) is stopped.
This task ends when there is no more TBO to propagate. Then the prognostic of the functions must be done by the CSPF from the RULs and TBOs that were updated.
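The propagation of task 2 can be sketched as a worklist traversal of the structural graph (a simplification under our own data layout; arcs from failure-mode nodes and from out-of-order nodes are merged into one dictionary, and all comparisons are made at the same instant t):

```python
# Sketch of task 2 (section 4.2): a new RUL is propagated through the
# structural graph, and a device's TBO is replaced only when the new
# predicted out-of-order date is earlier than the stored one.

def propagate(arcs, tbos, root_node, origin_dev, origin_mode, t, T):
    """arcs: dict node -> list of devices that go out of order;
    tbos: dict device -> (origin_dev, origin_mode, t_q, T_q), mutated
    in place. root_node is (Di, Mp); t + T is the predicted date."""
    worklist = [(root_node, T)]
    while worklist:
        node, remaining = worklist.pop()
        for dev in arcs.get(node, []):
            stored = tbos.get(dev)
            # propagate only if the new out-of-order date is earlier
            if stored is None or t + remaining < stored[2] + stored[3]:
                tbos[dev] = (origin_dev, origin_mode, t, remaining)
                worklist.append(((dev, "Moo"), remaining))
            # otherwise the propagation from this node is stopped

arcs = {("D1", "M1"): ["D2"], ("D2", "Moo"): ["D3"]}
tbos = {"D2": ("D9", "M9", 0.0, 500.0), "D3": ("D9", "M9", 0.0, 50.0)}
propagate(arcs, tbos, ("D1", "M1"), "D1", "M1", t=10.0, T=200.0)
```

In this example D2's TBO is brought forward to date 210, while D3's stored TBO (date 50) is already earlier, so the propagation stops there.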
4.3. Computation of the RUL of the functions (task 3)
According to section 3.1, three types of functions must be
considered for the computation of their prognoses: simple
functions, functions with redundancies and redundancy
functions.
The failure mode of a function is directly linked to the failure mode of one of its devices and/or to the missing service carried out by one of its subfunctions. That is why we only consider the TBOs of the functions instead of their RULs. The TBO of a function contains the same fields as the TBOs of the devices, except that it contains the involved function instead of the involved device.
The TBO of a function is computed if at least one RUL of one of its devices has been modified or if one TBO of one of its entities (devices or functions), noted X, has been modified by the CSPF.
For a simple function Fj, if the RUL RUL(Di, Mp, tp, Tp) of one of its devices has been modified, its TBO(Fj, Dk, Ml, tl, Tl) is modified if tp + Tp < tl + Tl; it then becomes TBO(Fj, Di, Mp, t, T) with T = tp + Tp − t.
For a simple function Fj, if the TBO of one of its functions or of its devices, TBO(Xq, Di, Mp, tp, Tp), where Xq denotes either the function or the device, has been modified, its TBO(Fj, Dk, Ml, tl, Tl) is modified if tp + Tp < tl + Tl; it then becomes TBO(Fj, Di, Mp, t, T) with T = tp + Tp − t.
The new TBO is recorded and replaces the previously stored one.
For functions with redundancies, the TBOs and/or RULs of the entities included in their redundancy functions are considered. For an entity that is a device Di, we consider its RUL RUL(Di, Mp, tp, Tp) or its TBO TBO(Di, Dx, Mq, tq, Tq) and the value Tti that is computed with the relationship (2):

Tti = min(tp + Tp, tq + Tq) − t (2)

If an entity is a function Fj with TBO(Fj, Dx, Mq, tq, Tq), the value Ttj is computed with the relationship (3):

Ttj = tq + Tq − t (3)

The TBO(Fwr, Dy, Ms, t, T) of a function with redundancies is computed from (4):

T = max over the entities of the values Tt (4)

where Dy and Ms are the device and its failure mode whose RUL or TBO has the greatest value Tt, and t is the instant at which the TBO has been computed. The new TBO is recorded and replaces the previously stored one.
For a redundancy function, the TBOs and/or RULs of its entities are considered. For an entity that is a device Di, we consider its RUL(Di, Mp, tp, Tp) or its TBO(Di, Dx, Mq, tq, Tq) and the value Tti, also computed with the relationship (2). For a function entity Fj with TBO(Fj, Dx, Mq, tq, Tq), the value Ttj is computed with the relationship (3) too. The TBO(Fr, Dy, Ms, t, T) of a redundancy function is computed from (5):

T = the nth greatest of the values Tt (5)

where Dy and Ms are the device and its failure mode whose RUL or TBO has the nth greatest value Tt, and t is the instant at which the TBO has been computed (generally n = 2). The new TBO is recorded and replaces the previously stored one.
If the TBO of a function Fk has changed and if Fk is linked to a device by an arc of S2, (Fk, Moo) → (Dm, Moo), task 2 is then processed with the same procedure as the one for the arcs (Dn, Moo) → (Dm, Moo).
The TBOs of the functions and RULs of the devices are the
elements of the prognostic.
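Under our reading of relationships (2) to (5), the redundancy handling of task 3 can be sketched as follows (names and data layout are ours; a function with redundancies takes the greatest Tt, a redundancy function the nth greatest, generally n = 2):

```python
# Sketch of task 3 (section 4.3) for redundancy handling: each entity
# contributes a value Tt, its earliest predicted out-of-order date
# minus the instant t; the function TBO then keeps the greatest or the
# nth greatest Tt. All names here are illustrative.

def entity_tt(dates, t):
    """dates: predicted dates (t_p + T_p, t_q + T_q, ...) available
    for the entity; relationship (2) keeps the earliest one."""
    return min(dates) - t

def redundancy_tbo(tts, t, n=1):
    """T for the function: greatest Tt if n == 1 (relationship 4),
    nth greatest otherwise (relationship 5).
    Returns (index of selected entity, t, T)."""
    order = sorted(range(len(tts)), key=lambda i: tts[i], reverse=True)
    chosen = order[n - 1]
    return chosen, t, tts[chosen]

tts = [entity_tt([400.0], 100.0), entity_tt([250.0, 320.0], 100.0)]
full = redundancy_tbo(tts, 100.0, n=1)      # function with redundancies
degraded = redundancy_tbo(tts, 100.0, n=2)  # redundancy function
```

The two calls illustrate the difference: the function with redundancies stays usable as long as its best entity (Tt = 300), while the redundancy function signals the loss of redundancy at the second-best Tt (150).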
4.4. Experimental results
The CSPF was successfully tested. In order to illustrate the results it provides, we propose the case study of figure 2, where the arcs represent the structural knowledge and the boxes the functional knowledge.
Figure 2. Case study.
In this system, only one failure mode is considered for each device, and the effect of this failure is supposed to be the same as the out-of-order mode. That is why only one kind of arc is represented in Figure 2. However, a case of a system with devices having two failure modes with different effects has also been successfully tested.
Table 1 shows an overview of the results provided by the CSPF for a simulated scenario. In this table, the first column is the rank of reception of a RUL, the second column is the identifier of the device for which the monitoring layer has emitted a RUL, and the third column is the date (the sum t + T of the fields contained in the received RUL) at which the device will probably fail. The other columns contain the dates (the sums t + T) of the RULs or TBOs of the devices and functions that are modified by the CSPF because of the received RUL. Dates in red mean that the date (t + T) of a device's RUL is earlier than the date of its TBO.
The proposed CSPF is processed on-line each time a new RUL is received, but it always leads to reducing the dates (t + T) of the RULs and TBOs of devices and functions. Thus, it is a pessimistic approach to the prognostic of the complex system. In that case, we can consider that the prognostic made on-line is dedicated to control operators. However, the CSPF can also be run off-line for maintenance operators: the T values of the TBOs are first set to very great values, and the CSPF is then run with all the RULs of each device. The maintenance operators thus have indications about the devices that need maintenance actions. Once a device has been replaced or fixed, its RUL must be set to new values; in such cases, the T value of the RUL of the replaced or fixed device may be set to a value equal to its MTBF (Mean Time Between Failures) or MTTF (Mean Time To Failure).
However, the CSPF requires graph traversals and can therefore be a long process. One way to reduce the computation time is to distribute the CSPF.
5. DISTRIBUTION OF THE CSPF
The proposed distribution of the CSPF consists of several agents that all process the CSPF. Assuming that there are few interconnections between subsystems, we propose one agent per subsystem in order to reduce the number of messages sent between the agents. The agents have to be implemented on different computing platforms to speed up the computation of the CSPF. Thus, the architecture can be represented as shown in figure 3.
Table 1. Example of results provided by the CSPF
Figure 3. Distributed architecture scheme.
The SPAs are the Subsystem Prognostic Agents. They contain a database in which the functional and structural knowledge is represented, as well as the structural interconnections between the subsystems. In this architecture, the monitoring layer sends the RULs of the devices to the SPA that prognoses the subsystem to which the device belongs.
In the proposed distribution of the CSPF, the knowledge is distributed to the SPAs. The SPAs are generic agents: they all process the same tasks, but their results depend on the knowledge modeled in their databases. The prognostic of the complex system consists of the RULs of the devices and the TBOs of the functions recorded by the SPAs.
The proposed architecture is thus also quite scalable. Adding a device or a new function mainly consists in adding functional and structural descriptions to the SPA of the subsystem it belongs to and, perhaps, some arcs to the structural models of the other SPAs; the algorithms processed by the SPAs do not need to be modified. For the case study of Figure 2, three SPAs are implemented.
The parts of knowledge modeled in the databases of the SPAs are described in Figure 4 (4.1 for SPA1, 4.2 for SPA2 and 4.3 for SPA3). From the structural knowledge, an SPA knows to which SPA a TBO must be sent, thanks to the identifiers of the external devices.
The communication between the SPAs can be modeled by a UML sequence diagram, as shown in Figure 5, where the monitoring layer is considered as a single agent although it could be made of several ones. Two SPAs are represented: the one that receives the RUL, and one that represents the other SPAs. Task 1 is processed only once, when a "New_RUL" message is received by an SPA. The "Modified_TBO" messages are emitted by the SPA from task 2 or task 3; they indicate to the receiving SPA which device is impacted by the TBO. Thus, when an SPA receives such a message, it processes task 2 and task 3. Even when distributed, the CSPF can take quite long to execute, and "New_RUL" or "Modified_TBO" messages can be received while an SPA is running. Those messages therefore have to be stored in a buffer. The t values, which are fields of the RULs and TBOs, can be used to sort the messages by increasing date.
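The buffered, date-sorted message handling described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; class and field names are assumptions.

```python
import heapq

# Illustrative message buffer for an SPA: "New_RUL" and "Modified_TBO"
# messages received while the CSPF is running are queued and later
# popped in increasing order of their date field t.

class MessageBuffer:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker for messages with equal dates

    def push(self, msg_type, t, payload):
        # msg_type is "New_RUL" or "Modified_TBO"; t is the message date.
        heapq.heappush(self._heap, (t, self._seq, msg_type, payload))
        self._seq += 1

    def pop(self):
        # Return the pending message with the smallest date t.
        t, _, msg_type, payload = heapq.heappop(self._heap)
        return msg_type, t, payload

    def __len__(self):
        return len(self._heap)
```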
Figure 4. Modeled knowledge in SPAs (4.1: SPA1, 4.2: SPA2, 4.3: SPA3).
Figure 5. Sequence diagram.
6. CONCLUSION
This paper presented a generic algorithm to carry out the prognosis of a complex system from the RULs of its devices. The approach requires functional and structural knowledge of the complex system, whose models were given, and the requirements for the functional modeling were detailed. As the proposed prognostic principle requires graph traversal, its distribution into generic agents in order to reduce its computation time was presented, together with the distribution of the functional and structural models into the prognostic agents. The principle of prognosis provides pessimistic results online but can be run off-line for more optimistic results. One can therefore consider that the online process is dedicated to control operators (TBOs of functions) and that the off-line process is dedicated to maintenance (TBOs of functions and RULs of devices).
The distributed simulation platform is under development. It uses a middleware to implement the communication between the monitoring layer agents and the SPAs. This platform will enable a comparison between the centralized approach (with one SPA) and the distributed approach with several SPAs.
Another perspective is the definition of functional and structural models to assess the TBOs of devices and functions even when the RULs of devices are increasing (when t+T is increasing). Eventually, the problem of the uncertainty of RULs could be addressed for prognosing complex systems.
REFERENCES
Byington, C., Roemer, M.J., Watson, M., Galie, T. (2003).
Prognostic enhancements to gas turbine diagnostic
systems, Proceedings of IEEE Aerospace Conference,
vol. 7, pp. 3247-3255.
Chittaro, L., Ranon, R. (2003). Hierarchical model-based
diagnosis based on structural abstraction, Artificial
Intelligence, vol. 155, pp. 147–182
Dragomir, O., Gouriveau, R., Zerhouni, N., Dragomir, F.
(2007). Framework for a distributed and hybrid
prognostic system, Proceedings of 4th IFAC
Conference on Management and Control of Production
and Logistics.
Dunjo, J., Fthenakis, V., Vilchez, J.A., Arnaldos, J. (2010).
Hazard and operability (HAZOP) analysis. A literature
review, Journal of Hazardous Materials, vol. 173, pp.
19–32.
Engel, S., Gilmartin, B., Bongort, K., Hess, A. (2000).
Prognostics, the real issues involved with predicting life
remaining, Proceedings of the IEEE Aerospace
Conference, vol. 6, pp. 457-469.
Iung, B., Monnin, M., Voisin, A., Cocheteux, P., Levrat, E.
(2008). Degradation state model-based prognosis for
proactively maintaining product performance, CIRP
Annals - Manufacturing Technology, vol. 57, pp.49–52.
Jardine, A., Lin, D. and Banjevic, D. (2006). A review on
machinery diagnostics and prognostics implementing
condition-based maintenance, Mechanical Systems and
Signal Processing, vol. 20, pp. 1483-1510.
Jennings, N.R., Wooldridge, M. (1995) Applying agent
technology, Applied Artificial Intelligence, vol. 9, pp.
357-369.
Lebold, M., Thurston, M. (2001) Open standards for
condition-based maintenance and prognostics systems,
Proceedings of the 5th annual maintenance and
reliability conference (MARCON 2001).
Muller, A., Crespo Marquez, A., Iung, B. (2008). On the
concept of e-maintenance: Review and current research,
Reliability Engineering and System Safety, vol. 93, pp.
1165–1187.
Reiter, R. (1992). A theory of diagnosis from first principles, Readings in model-based diagnosis, Morgan Kaufmann Publishers, pp. 29-48.
Roemer, M., Byington, C., Kacprzynski, G.J.,
Vachtsevanos, G. (2007). An overview of selected
prognostic technologies with reference to an integrated
PHM architecture. Technical Report, Impakt
Technologies.
Saha, B., Saha, S., Goebel, K. (2009). A distributed
prognostic health management architecture,
Proceedings of the Conference of the Society for Machinery Failure Prevention Technology.
Scarf, P. (2007). A Framework for Condition Monitoring
and Condition Based Maintenance, Quality Technology
& Quantitative Management, vol 4, pp. 301-312.
Takai, S., Kumar, R. (2011). Inference-Based Decentralized Prognosis in Discrete Event Systems, IEEE Transactions on Automatic Control, vol. 56, pp. 165-171.
Vachtsevanos, G., Wang, P. (2001). Fault prognosis using
dynamic wavelet neural networks. Proceedings of
AUTOTESTCON IEEE Systems Readiness
Technology Conference, pp. 857-870.
Vachtsevanos, G., Lewis, F. L., Roemer, M., Hess, A., Wu,
B. (2006). Intelligent fault diagnosis and prognosis for engineering systems. Hoboken, NJ: John Wiley & Sons, Inc.
Voisin, A., Levrat, E., Cocheteux, P., Iung, B. (2010).
Generic prognosis model for proactive maintenance
decision support: application to pre-industrial e-
maintenance test bed, Journal of Intelligent
Manufacturing, vol. 21, pp. 177–193.
Worn, H., Langle, T., Albert, M., Kazi, A., Brighenti, A.,
Revuelta Seijo, S., Senior, C., Sanz-Bobi, M.A., Villar
Collado, J. (2004). Diamond: distributed multi-agent
architecture for monitoring and diagnosis. Production
Planning and Control, vol. 15, pp. 189-200.
An Approach to the Health Monitoring of a Pumping Unit in an
Aircraft Engine Fuel System
Benjamin Lamoureux1,2, Jean-Rémi Massé3, and Nazih Mechbal4
1,3 Snecma (Safran Group), Systems Division, Villaroche, France
2,4 Arts et Métiers ParisTech, PIMM UMR CNRS, Paris, France
ABSTRACT
This paper presents an approach to health monitoring through the early detection of the premises of failure modes. It is a physics-based model approach that captures the knowledge of the system and its degradations. The component under study is a pumping unit such as those found in aircraft engine fuel systems. First, a complete component analysis is performed to determine its potential degradations, and a relevant physics-based component health indicator (CHI) is defined. Then, degradations are modelled and their impacts on the CHI are quantified using an AMESim® physics-based model. Assuming that in-flight measurements are available, model updating is performed and a healthy distribution of the CHI is computed. Eventually, a fault detection algorithm is developed and statistical validation is performed through the computation of key performance indicators (KPI). In parallel, a degradation severity indicator (DSI) is defined and prognostics is performed based on the monitoring of this DSI.
1. INTRODUCTION
In the modern aircraft engine industry, increasing product availability is of paramount importance. Delays and cancellations caused by unanticipated component failures generate prohibitive expenses, especially when failures occur at sites without proper maintenance staff and equipment. In order to minimize the occurrence of these unexpected, costly in-service failures and to extend system availability, Prognostic Health Monitoring (PHM) systems, which perform continuous diagnosis and capture the current health state of the component, have become a necessity.
A PHM system (Sheppard, Kaufman and Wilmer 2009) ideally performs fault detection, isolation, diagnostics (determining the specific fault mode and its severity) and prognostics (accurately predicting the remaining useful life). Whereas fault detection and diagnostics have been the subject of considerable emphasis (Isermann 1997, Basseville 1998, Balaban, et al. 2009), prognostics has gradually emerged as a research topic that can push the limits of health management systems, particularly in aeronautics (Byington, et al. 2004, Orsagh, et al. 2005, Massé, Lamoureux and Boulet 2011). With the recent development of smart materials, PHM also relates to the reliability of structures, or Structural Health Monitoring (SHM) (Mechbal, et al. 2006).
Although significant research efforts have been made in the field of PHM, there remains a huge gap between academic research and industrial expectations. On the one hand, the industrial approach to health monitoring rarely integrates physics-based models to simulate degradations and validate fault detection and diagnostics; on the other hand, the academic approach rarely integrates all the constraints of on-board exploitation, such as sampling frequency and storage limitations, imposed sensor number and location, or limited computation capabilities. The main purpose of this paper is to merge numerical model-based and statistical data-driven approaches to perform fault detection and prognostics on an actual system in its in-service operating environment.
It is evident that a lot of information could be extracted from the huge quantity of data recorded during a flight. One of the aircraft engine subsystems whose failure may result in significant maintenance costs to an airline is the fuel system. Nevertheless, despite its critical function, the aircraft engine fuel system and its components are almost never cited as potential candidates for health monitoring. In response to this lack, we have conducted a complete study on this subject. The aim of the present paper is to apply fault
_____________________
Benjamin Lamoureux et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
detection and prognostics to one of the main components of the fuel system: the pumping unit. The other novelty of this work is the use of a numerical model to quantify the CHI's sensitivity to degradations in order to create degraded data from operating measurements.
The remainder of the paper is organized in five sections following the five main parts of the proposed development method: health monitoring perimeter definition; data analysis; system and degradation modeling; simulation results; and statistical validation. An additional part deals with prognostics. We conclude and present future work in a final section.
2. HEALTH MONITORING PERIMETER DEFINITION
2.1. Aircraft Engine Fuel System Analysis
To perform health monitoring of the pumping unit, it is
essential to study the whole aircraft engine fuel system
because each component contributes to the pressurization of
the hydraulic circuit.
The system is composed of the following components, as presented in Figure 1:
- The bypass valve regulates the flow entering the fuel metering valve.
- The fuel metering valve doses the flow to the injectors.
- The pressurizing valve maintains a constant pressure drop between and .
- The switch valve switches between two configurations of an external system.
In the figure above, PA/C is the supply pressure provided by the aircraft fuel tank, Plp is the low pressure at the outlet port of the centrifugal pump, and Php is the high pressure at the outlet of the gear pump. Pinjection is the injection pressure in the combustion chamber.
2.2. Degradation Modes of the Gear Pump
Thanks to expertise, experience feedback and Failure Mode
and Effects Analysis (FMEA), two main degradation modes
were selected.
Definition 1:
For a gear pump, an internal leakage is a leakage between the inlet and the outlet of the pump. It is mainly due to contamination of the hydraulic fluid, which results in abrasion of the gear surfaces.
Definition 2:
For a gear pump, an external leakage is a leakage to the exterior of the pump. It is mainly due to vibrations and the aging of mechanical parts or joints.
2.3. Component Health Indicator
To monitor the state of the gear pump, a feature extracted from measurements, named the Component Health Indicator (CHI), is defined.
Definition 3:
The CHI is a physical measure (or a function of it) that makes it possible to assess the health of a component by monitoring its changes.
In the case of gear pump monitoring, the chosen CHI is the rotation speed of the pump at the opening of the switch valve. It corresponds to the rotation speed for which the hydraulic power is high enough to open the valve; thus, an increase of this CHI could indicate that the pump is less efficient. The valve opening is confirmed at fifty percent of the whole stroke. An example
of extraction is given in Figure 2.

Figure 1: Architecture of an aircraft engine fuel system (centrifugal pump, gear pump, bypass valve, fuel metering valve, pressurizing valve, switch valve; pressures PA/C, Plp, Php, Pinjection)
Figure 2: Extraction of the CHI
3. DATA ANALYSIS
In the case where measurements are available and assuming that, at the time of their recording, the system was faultless, a healthy distribution of the CHI can be computed. In this example, the healthy reference distribution comes from a statistical analysis of about 400 start sequences of test flights, as shown in Figure 3.
Figure 3: Results of CHI Extraction
Then, the histogram of the distribution is given in Figure 4 with its associated maximum-likelihood density. Without loss of efficiency, we assume that the distribution follows a normal law N(μ, σ), where μ is the mean of the healthy distribution and σ is its standard deviation.
Figure 4: Distribution of the Healthy State
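The healthy-reference fit described above reduces to estimating the mean and standard deviation of the extracted CHI sample under the normality assumption. A minimal sketch (function name ours, not from the paper):

```python
import statistics

def healthy_reference(chi_values):
    """Fit a normal law N(mu, sigma) to healthy CHI extractions.

    chi_values: CHI values extracted from start sequences assumed
    fault-free (about 400 test flights in the paper).
    """
    mu = statistics.mean(chi_values)
    sigma = statistics.stdev(chi_values)  # sample standard deviation
    return mu, sigma
```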
4. SYSTEM AND DEGRADATION MODELING
4.1. Gear Pump Behavior Modeling
Some related works have addressed the issue of modeling gear pumps and their degradations. For example, Casoli et al. (Casoli, Andrea and Franzoni 2005) have proposed a method to model a gear pump with AMESim®, and Frith and Scott have developed a wear model (Frith and Scott 1994).
In the developed AMESim® model, the pump outlet flow is expressed as in (1):

Q = ηv · D · N (1)

where Q is the pump outlet flow, D the pump displacement, ηv the volumetric efficiency and N the pump rotation speed. The volumetric efficiency is computed as an empirical function, i.e.,

ηv = f(ΔP, T; a1, …, an) (2)

where ΔP is the pressure drop between the pump inlet and outlet, T is the fluid temperature at the pump inlet, and a1, …, an are empirical constants.
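Eq. (1) can be sketched in code as follows. This is a minimal illustration with our own function names; the volumetric-efficiency function is only a stand-in for the empirical expression of Eq. (2), whose exact form and constants are not reproduced in this extraction.

```python
def pump_outlet_flow(displacement, rotation_speed, volumetric_efficiency):
    """Eq. (1): Q = eta_v * D * N."""
    return volumetric_efficiency * displacement * rotation_speed

def volumetric_efficiency(delta_p, coeffs):
    """Illustrative stand-in for Eq. (2): an empirical function of the
    inlet/outlet pressure drop delta_p with fitted constants `coeffs`.
    A simple affine decrease with delta_p is ASSUMED here for the sketch;
    it is not the paper's actual expression."""
    a0, a1 = coeffs
    return max(0.0, a0 - a1 * delta_p)
```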
4.2. Fuel System Modeling
The whole aircraft engine fuel system is modeled from classic AMESim® blocks. The model variables are given in Table 1.

Input      | Curve of rotation speed versus time
Parameters | Fuel temperature; aircraft supply pressure
Output     | The CHI

Table 1. Model Variables
4.3. Degradation Modeling
To simulate the influence of all the potential faults, we introduce them into the AMESim® model. Internal leakage is modeled by a diaphragm with a variable section between the pump inlet and the pump outlet (Figure 5a), and external leakage is modeled by a diaphragm between the pump outlet and an external tank at atmospheric pressure (Figure 5b).
Figure 5 : AMESim® modeling of a) internal leakage and
b) external leakage
5. SIMULATION RESULTS
The purpose of this part is to determine the sensitivity of the CHI to degradations. The behavior of the system is simulated for nominal fuel temperature and supply pressure. The rotation speed input is approximated by a linear curve to simulate the behavior of the pump during the start sequence.
5.1.1. Maximal Degradation Intensity
The degradation intensity is defined as the leakage flow crossing the diaphragm (Figure 5) at 10% of the maximal rotation speed. The Maximal Degradation Intensity is the intensity for which the system is in a non-functional state. In this case, it is reached when the pump is not able to deliver the flow needed for the start sequence, and it is calculated from the specification of the minimal pump outlet flow allowed at 10% of the maximal rotation speed. The maximal intensity is different for each degradation, so it is computed for both the internal and the external leakage.
5.1.2. Model Updating
The model updating is performed by comparing the measured and simulated CHI in the nominal, flawless state. To perform the updating, some parameters of the model, such as the displacement of the pump or the calibration of the switch valve sensor, are adjusted.
5.1.3. CHI Sensitivity Results
Degradations of increasing intensities, up to the maximal degradation intensity, are simulated to quantify the sensitivity of the CHI. Results are given in Table 2.
Type of Degradation | Intensity of Degradation | Value of the CHI
Internal Leakage    | 0                        | 750
Internal Leakage    | Low                      | 826
Internal Leakage    | Medium                   | 893
Internal Leakage    | High                     | 1027
External Leakage    | 0                        | 750
External Leakage    | Low                      | 770
External Leakage    | Medium                   | 834
External Leakage    | High                     | 981

Table 2. Simulation Results
6. STATISTICAL VALIDATION
Statistical validation is based on the comparison between
the measured distribution of the healthy state and the
estimated distribution of the faulty states given by specific
transformation laws applied to the CHI.
6.1. CHI Transformation Laws
Definition 4:
CHI transformation laws (TL) are functions calculating the variation of a CHI for a given degradation at a given intensity. The typical form of a TL is given in Eq. 3:

ΔCHI_d = TL_d(x) (3)

For example, considering the CHI and a degradation d, the transformation law TL_d gives the variation of the CHI, named ΔCHI_d, as a function of the degradation intensity x. For each degradation, a TL is defined and can be applied to a real distribution of the CHI as follows:

CHI_est = CHI_healthy + TL_d(x) (4)

where CHI_est is the estimated value of the CHI in the presence of d and CHI_healthy is its healthy value. The two TLs are computed by applying a linear regression between degradation intensities and CHI values. The results are given in Eq. 5 and Eq. 6:

TL_int(x) = S_int · x (5)
TL_ext(x) = S_ext · x (6)

with S_int and S_ext the coefficients of the linear regressions.
Figure 6 gives an example of how the coefficient S is calculated.
Figure 6: Example of coefficient S computation
6.2. Application of the CHI Transformation Laws
Once S_int and S_ext are computed, the TLs can be applied to the real healthy distribution to construct a degraded CHI defined by:

CHI_deg = CHI_meas + TL_d(x) (7)

with CHI_deg the constructed degraded value of the CHI for a degradation d of intensity x, and CHI_meas the measured value of the CHI. In the case of the internal leakage flaw, results are given in Figure 7.
Figure 7: Distribution of CHIs
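Assuming the linear transformation laws of Eqs. (5)-(7), the construction of degraded CHI values from measured healthy ones can be sketched as follows. The slope value used in the example is purely illustrative, not a result from the paper.

```python
def transformation_law(slope):
    """Linear TL (Eqs. 5-6): returns the CHI variation for intensity x."""
    return lambda x: slope * x

def degraded_chi(chi_measured, tl, intensity):
    """Eq. (7): constructed degraded CHI = measured CHI + TL(intensity)."""
    return chi_measured + tl(intensity)

# Example with a HYPOTHETICAL internal-leakage slope (rpm per unit intensity):
tl_int = transformation_law(slope=30.0)
degraded = [degraded_chi(v, tl_int, intensity=2.0) for v in (750, 760)]
```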
6.3. Key Performance Indicators
In aeronautics, because of the modular architecture, the main part of maintenance operations is troubleshooting, which consists in finding the faulty Line Replaceable Unit (LRU) and replacing it. This means that in the diagnostic process, isolation, rather than identification, is of paramount importance. As the two degradations considered in this study affect the same unit, only fault detection is addressed. A complete treatment of signal detection theory can be found in (Wickens 2002).
Definition 5:
A Key Performance Indicator (KPI) is an indicator of the efficiency of the monitoring system. Its required value is given by specifications.
For example, in aeronautics, specifications usually require less than 5% false negatives and less than 20% false positives (Table 3).
KPI                      | Definition
False Positive (FP) Rate | Proportion of false positives (false alarms) among the actually healthy states (see Figure 8)
False Negative (FN) Rate | Proportion of false negatives (undetected faults) among the actually faulty states (see Figure 8)

Table 3. Key Performance Indicators
6.4. Detection Threshold Selection
Thanks to the transformation laws, degraded data is computed from real healthy data. Once the degraded data is estimated, a detection threshold λ can be defined. Typically, the chosen value for λ is calculated from Eq. 8, where A is a positive real:

λ = μ + A·σ (8)

The graphical meaning of A is given in Figure 8, where false negatives and false positives are represented for thresholds calculated with different values of A.
Figure 8: Definition of the coefficient A
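A minimal sketch of the threshold rule of Eq. (8) and of an empirical FP/FN-rate estimate on labeled samples. Function names are ours, not the authors' code.

```python
def detection_threshold(mu, sigma, A):
    """Eq. (8): threshold = mean + A * standard deviation of healthy CHI."""
    return mu + A * sigma

def fp_fn_rates(healthy, faulty, threshold):
    """Empirical KPI estimates for a 'fault if CHI > threshold' rule.

    FP rate: healthy values wrongly flagged as faulty.
    FN rate: faulty values not detected.
    """
    fp = sum(v > threshold for v in healthy) / len(healthy)
    fn = sum(v <= threshold for v in faulty) / len(faulty)
    return fp, fn
```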
6.5. Statistical Validation
As explained previously, only fault detection is performed in this study. For example, the KPIs for the medium and high degradation levels of the internal leakage are presented in Table 4, and the associated ROC curves in Figure 9.
A   | FP (Medium) | FN (Medium) | FP (High) | FN (High)
0   | 50%         | 2.4%        | 50%       | 0%
1   | 16.4%       | 16.4%       | 16.4%     | 0.2%
1.5 | 7.1%        | 31.5%       | 7.1%      | 0.7%
2   | 2.4%        | 50.6%       | 2.4%      | 2.6%
2.5 | 0.7%        | 69.6%       | 0.7%      | 7.4%
3   | 0.1%        | 84.3%       | 0.1%      | 17.2%

Table 4. Performance of fault detection (internal leakage, medium and high intensities)
Figure 9: ROC curves for Low, Medium and High Intensities
In aeronautics, degradations are considered detectable if the false negative rate is under 5% and the false positive rate under 20%. In conclusion, internal leakages of medium intensity are not detectable, whereas those of high intensity are detectable with A equal to 1, 1.5 or 2. The results for the external leakage are not presented here, but they are very similar.
7. PROGNOSTICS
The purpose of prognostics is to prevent the CHI from reaching its maximal degradation value.
7.1. Degradation Severity Indicator
A Degradation Severity Indicator (DSI) is an index defined in order to quantify the potential impacts of the degradation on system operability. The higher the DSI, the more degraded the system. In this paper, the DSI for a start sequence N is defined as follows:

DSI(N) = d(N) / d_max (9)

In Eq. 9, d(N) is the degradation intensity at start sequence N and d_max refers to the maximal degradation intensity reachable before complete failure of the system.
7.2. Prognostics
The prognostics method defined in this paper is based on trend analysis. The purpose is to estimate the Remaining Useful Life (RUL) of the pump by anticipating the moment when the DSI will be greater than 1. The RUL is expressed in number of flights. Aeronautics specifications usually require that degradations be detected at least 20 flights before the occurrence of the failure.
The method consists in calculating, at each new flight, a linear regression on the past flights and predicting the RUL assuming that the slope remains the same. The value of the RUL is the estimated number of flights before the crossing of the maximum DSI threshold (Figure 10). When the RUL is estimated to be less than 20 flights, an alarm is sent to the maintenance operators.
For the time being, there are no defined KPIs for prognostics in aeronautics.
Figure 10: Remaining Useful Life Computation
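The trend-based RUL estimate described above can be sketched as follows, assuming a least-squares line fitted to a sliding window of past DSI values and extrapolated to the DSI = 1 crossing. The window size and names are illustrative, not the paper's settings.

```python
def estimate_rul(dsi_history, window=50):
    """Estimated number of flights before the DSI reaches 1.

    Fits a least-squares line to the last `window` DSI values and
    extrapolates with a constant slope. Returns None when no upward
    degradation trend is observed.
    """
    recent = dsi_history[-window:]
    n = len(recent)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(recent) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, recent)) / denom
    if slope <= 0:
        return None  # no degradation trend, no finite RUL
    current = recent[-1]
    return max(0.0, (1.0 - current) / slope)
```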
To validate the method, gradually degrading measurements over 1500 flights are constructed thanks to the transformation laws. Depending on the flight index n, the value of the degraded CHI is given by Eq. 10:

CHI_deg(n) = CHI_healthy(rand) + TL(f(n)) (10)

where CHI_healthy(rand) designates a random selection of a value in the healthy distribution and f is the function giving the intensity of the degradation versus the flight index. The function f is normally derived from the physics of failure, but in this application we suppose that it is a linear degradation growing from 0 to the maximal degradation intensity at flight 1000 (Figure 10).
In Figure 11, the results of the alarm computation for the RUL are given; it can be noticed that alarms occur around a hundred flights before the maximal degradation intensity is reached. This shows that the method is effective at anticipating failures by monitoring the DSI. However, work remains to be done to limit false alarms.
Figure 11: Remaining Useful Life Alarms
CONCLUSION
In conclusion, a method for the health monitoring of a pumping unit, from the definition of physics-based indicators to the statistical validation of Key Performance Indicators, has been proposed. The main novelty of this paper is that, after defining a physics-based Component Health Indicator, its sensitivity is tested on a physics-based model constructed in the AMESim® environment.
The statistical validation showed that the Component Health Indicator defined was relevant to detect both internal and external leakages of a gear pump. A prognostics method based on the computation of the Remaining Useful Life was also addressed, and proved effective in anticipating the crossing of the maximum degradation severity threshold.
For future prospects, the objective is to work on the improvement of the fault detection and prognostics algorithms and to extend the PHM system to the whole aircraft engine fuel system. Besides, KPIs must be defined for prognostics.
NOMENCLATURE
CHI for the gear pump
Outlet flow of the gear pump
Volumetric efficiency of the gear pump
Dummy variables
Temperature at the pump inlet
Rotation Speed of the Pump
Fuel Temperature
Aircraft Supply Pressure
Mean of Healthy Distribution
Standard Deviation of Healthy Distribution
Maximal Intensity of Internal Leakage
Maximal Intensity of External Leakage
REFERENCES
Balaban, E., A. Saxena, P. Bansal, K.F. Goebel, P.
Stoelting, and S. Curran. "A diagnostic approach for electro-
mechanical actuators in aerospace systems." IEEE
Aerospace Conference Proceedings. Big Sky, 2009.
Basseville, Michelle. "On-board component fault detection
and isolation using the statistical local approach."
Automatica vol. 34, 1998: 1391-1415.
Byington, C.S., M. Watson, D. Edwards, and P. Stoelting.
"A model-based approach to prognostics and health
management for flight control actuators." IEEE Aerospace
Conference Proceedings. 2004. 3551-3562.
Casoli, Paolo, Vacca Andrea, and Germano Franzoni. "A
Numerical Model for the Simulation of External Gear
Pumps." Proceedings of the 6th JFPS International
Symposium on Fluid Power. Tsukuba, 2005.
Frith, R.H., and W. Scott. "Wear in external gear pumps: a simplified model." Wear 172, 1994: 121-126.
Isermann, Rolf. "Supervision, fault-detection and fault-
diagnosis methods - An introduction." Control Engineering
Practice vol.5, 1997: 639-652.
Lamoureux, Benjamin, Jean-Rémi Massé, and Nazih
Mechbal. "A Diagnosis Methodology for the
Hydromechanical Actuation Loops in Aircraft Engines."
Proceedings of the 20th Mediterranean Conference on
Control and Automation (forthcoming). Barcelona, 2012.
Lamoureux, Benjamin, Jean-Rémi Massé, and Nazih
Mechbal. "An approach to the Health Monitoring of the
Fuel System of a Turbofan." Proceedings of IEEE PHM
2012 (forthcoming). Denver, 2012.
Massé, J.R., B. Lamoureux, and X. Boulet. "Prognosis and
Health Management in system design." Proceedings of
IEEE PHM 2011. Denver, 2011.
Mechbal, N., M. Vergé, G. Coffignal, and M. Ganapathi.
"Application of a combined active control and fault
detection scheme to an active composite flexible structure."
Mechatronics vol. 16, 2006: 193-208.
Orsagh, R., D. Brown, M. Roemer, T. Dabney, and A. Hess.
"Prognostic health management for avionics system power
supplies." IEEE Aerospace Conference Proceedings. 2005.
Sheppard, J.W., M.A. Kaufman, and T.J. Wilmer. "IEEE
Standards for Prognostics and Health Management."
Aerospace and Electronic Systems Magazine 24, no. 9
(2009): 34–41.
Wickens, Thomas D. Elementary Signal Detection Theory.
Oxford University Press, 2002.
Application of Microwave Sensing to Blade Health Monitoring
David Kwapisz1, Michaël Hafner2, Ravi Rajamani3
1,2Meggitt Sensing Systems, Fribourg, Switzerland [email protected] [email protected]
3Meggitt-USA, Inc.
ABSTRACT
This paper discusses the application of microwave sensing to turbine airfoil health monitoring. The proposed microwave system operates at 6 and 24 GHz and is applicable to both blade tip-clearance and blade tip-timing measurements. One of the main advantages of microwave systems, compared to other technologies such as capacitive or eddy-current sensing, is that they can be installed for long-term operation in the harsh environment of the first turbine stages. The monitoring of blade tip-timing and tip-clearance patterns is useful for detecting abnormal blade behavior due to structural damage. Such a sensing system can also be used to actively maintain optimal blade-to-casing clearance, thereby enhancing turbine efficiency. This paper presents blade tip-clearance pattern monitoring based on microwave measurements. First, a laboratory study shows the ability of the system to consistently measure the tip-clearance pattern. Then tip-clearance pattern measurements from a real engine test are presented. While this paper presents results from system testing on tip clearance, it is expected that this study will be carried forward in the next phase to demonstrate tip-timing measurement and, further, to show how such a system can form the basis for a more comprehensive health management system.
1. INTRODUCTION
Both aero gas turbines and stationary gas turbines are increasingly deploying blade health monitoring (BHM) systems that assess the health of airfoils by sensing the tip clearance and the tip timing of individual blades. BHM systems can estimate different parameters depending on the sophistication of the algorithms. Many academic papers have been written on this topic, but the real value of such technologies in solving the BHM problem is seen in the number of real-world systems that have started employing the technology (Flotow, Mercadal & Tappert, 2000; Zielinski & Ziller, 2005; Hess, Frith & Suarez, 2006; Hess, 2007; Martin, Forry, Maier & Hansen, 2011).
David Kwapisz et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Blades can fail because of structural flaws caused by manufacturing defects or by impacts from external objects. Tip-clearance and tip-timing measurements can be used to detect these flaws. One relatively straightforward approach is to establish a “baseline” pattern during the initial operation of the turbine and then assess the “deviation” from this baseline. Because these patterns change with turbine load, the algorithms need built-in compensation factors, but these can be readily developed. For example, the effect of temperature can be accounted for by a simple additive term, as is shown in this paper. More sophisticated techniques model the actual vibration modes of the airfoil using physics-based methods, detect deviations from expected behavior, and base the diagnostics on this. The former method is easier to implement but is not as powerful as the model-based method.
In either case, the key is to get a reliable and repeatable measurement system that can be depended on to deliver consistent measurement under noisy and harsh conditions.
Additionally, monitoring blade passage can be used to detect incipient damage to the rotor as well as aid in sophisticated clearance control. An SAE Aerospace Information Report, currently in preparation (2012), will provide a good overview of various uses of BHM. A specific sensor can only measure the instantaneous clearance between the airfoils and a specific location on the casing. With multiple sensors located around the circumference, a better estimate of the clearance can be obtained, which can be used for real-time clearance control. Structural failure, especially in the low pressure compressor, can occur due to foreign object damage (FOD). The BHM system can be used to detect FOD as well, possibly in concert with other diagnostic sensors such as accelerometers mounted close to the front of the turbine. Mounting two sensors in roughly the same radial location can help in detecting axial deflections that will allow blade twist to be estimated, again improving the ability to measure different failure mechanisms. Of course, these techniques
come at the price of added system complexity and cost, so they have to be weighed carefully against the benefit.
This paper describes a system based on microwave technology that delivers a highly accurate and consistent measurement. This is demonstrated for blade clearance via experimental results. In particular, the ability of the sensor to detect blade length variations of a few tenths of a millimeter is described. Detecting such variations can reveal blade fatigue cracks and thus improve maintenance scheduling. Combined with its harsh-environment survivability, this detection capability offers a real opportunity to design reliable BHM systems (Woike, Abdul-Aziz & Bencic, 2010).
Section 2 gives a general description of the microwave sensor, including its operating principle and its application to blade anomaly detection. BHM performance depends critically on the accuracy and consistency of the measurement system; this is described in Section 3. Finally, Section 4 describes experimental results from a test on an industrial gas turbine, which show that the microwave sensor is capable of accurate and consistent measurement of the tip-clearance pattern.
2. PRESENTATION OF THE MICROWAVE SENSOR
2.1. Microwave measurement principle
The microwave blade monitoring system presented here is based on a phase measurement principle. A continuous-wave microwave signal is generated in a microwave signal conditioning unit and transmitted through a coaxial cable to the probe (Figure 1). The probe is an antenna that transmits the continuous wave into the space between the casing and the bladed rotor. The probe also acts as a receiver and captures the portion of the emitted wave that is reflected back by the blade tip, which is then measured by the electronics.
Figure 1. System architecture.
The microwave electronics generates a continuous wave which is transmitted to the probe through a circulator. The wave reflected by the blade tips returns through the circulator to two RF mixers arranged in a vector architecture. The in-phase and quadrature components of the reflected wave are extracted and digitized before processing and phase calculation.
A vector mixer architecture is used to compare the received signal to an internal reference and to reconstruct a phase measurement. Basically, the phase between the transmitted and the reflected signals is proportional to the distance between the probe and the blade tips. The conversion between the measured phase φ and the associated clearance δ is given by Eq. (1) and depends on the wavelength λ of the microwave signal. Compared to other technologies, this relationship is linear, which makes the sensor much easier to calibrate via sensitivity and offset corrections.
δ = (λ / 4π) · φ (1)
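Eq. (1) maps the measured phase to a clearance. A minimal sketch of this conversion, starting from the in-phase/quadrature components produced by the vector mixer architecture (the function name and numeric values are illustrative, not taken from the actual sensor firmware):

```python
import math

def iq_to_clearance(i, q, wavelength):
    """Eq. (1): delta = (lambda / (4 * pi)) * phi, where phi is the
    phase of the reflected wave recovered from its I/Q components."""
    phi = math.atan2(q, i)  # phase in radians, from the vector mixer outputs
    return wavelength / (4 * math.pi) * phi

# Illustrative values: a 24 GHz carrier has a wavelength of about 12.5 mm.
delta = iq_to_clearance(0.0, 1.0, wavelength=12.5)  # phi = pi/2 -> 1.5625 mm
```

The factor 4π (rather than 2π) reflects the round trip of the wave from the probe to the blade tip and back.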
2.2. Probe and engine installation
The microwave probe is basically an antenna optimized to transmit at a defined frequency with a given bandwidth. This antenna is packaged in a hermetically sealed body with an integral mineral-insulated cable at its back. The probe is constructed from materials chosen for their high-temperature survivability coupled with reliable long-term operation in the harsh environment of gas turbines.
Two versions of the microwave system have been developed. The first uses a frequency in the 6 GHz band and has a measurement range of 25 mm and a probe diameter of 14 mm (Figure 2). It is suited for large-frame gas turbines. The second uses a frequency in the 24 GHz band for a measurement range of 6 mm and a probe diameter of 8.5 mm. It is preferably used with the small blades of aviation or aero-derivative gas turbines.
Figure 2. Picture of the 6 GHz microwave probe.
The probe installation requires an opening through the casing such that the probe tip has a direct view of the rotor and its blade tips. A ceramic window on the probe tip allows the microwave signal to transmit to the blade. A retaining ring ensures that this ceramic window does not fall into the gas path. Depending on the engine construction, the integration is more or less complex. Normally, the
installation in the turbine section has more constraints due to the high temperatures and the need to ensure proper sealing between several casing layers, which can move relative to one another. Figure 3 shows an example of probe mounting in the turbine section of an aero engine with two casing layers. The probe tip is usually installed flush with or recessed from the casing inner surface to ensure no contact with the blades, even during a rub event.
Figure 3. Probe mounting on the turbine.
The important parameter for the probe installation is the position of the probe relative to the blade tips, as the probe measures what is directly underneath it. It is not always possible to install the probe at a desired location due to piping or mechanical constraints. Typically, the rotor moves axially due to thermal expansion and aerodynamic forces, and therefore the blade tips move in the axial direction with respect to the fixed casing. This is normally known and taken into account during probe mounting.
2.3. Microwave tip clearance measurement output
The microwave blade tip-clearance system presented in this paper does not provide a continuous blade-profile waveform output proportional to the measured distance, as laser, eddy-current, or capacitance probes do. In the case of continuous blade-profile measurement, the amount of data quickly becomes large and has to be reduced to be exploitable for blade health monitoring. Therefore, data reduction is performed by an algorithm that detects each individual blade within the microwave measurement and extracts only one tip-clearance value per blade. These calculated tip-clearance values correspond to the minimum distance between the individual blade tips and the probe. The sensor thus provides a digital array with one tip-clearance measurement δi per blade. The array of tip-clearance measurements over a full rotor revolution is called the blade clearance pattern.
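The data-reduction step described above — one minimum-distance value per blade passage — can be sketched as follows (a simplified illustration with hypothetical sample windows, not the sensor's actual blade-detection algorithm):

```python
def clearance_pattern(samples, blade_windows):
    """Reduce a continuous distance signal to one tip-clearance value
    per blade: the minimum distance within each blade's passage window."""
    return [min(samples[start:stop]) for start, stop in blade_windows]

# Hypothetical signal covering three blade passages (distances in mm).
samples = [2.0, 1.2, 1.0, 1.3, 2.0, 1.1, 0.9, 1.2, 2.0, 1.4, 1.1, 1.5]
pattern = clearance_pattern(samples, [(0, 4), (4, 8), (8, 12)])  # [1.0, 0.9, 1.1]
```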
2.4. The centered blade clearance pattern and its application to health monitoring
Monitoring the individual blade tip clearances δi provides useful information on abnormal blade elongation due to cracks and can thus be used for health monitoring purposes. Nevertheless, abnormal blade elongation must be differentiated from the normal elongations due to temperature and centrifugal forces, as in Eq. (2).
δi = δic + δtemperature + δcentrifugal (2)
The main assumption that can be made about abnormal elongation is that it affects only one particular blade, while global elongations affect all the blades of the rotor. Given a relatively high number of blades, the mean clearance should not be affected by the abnormal elongation of one particular blade. In this case, the centered blade clearance pattern δ1c, …, δic, …, δNc defined by Eq. (3) can be used as a baseline. Any deviation from this baseline can be used as a metric for blade crack detection (Figure 4).
δic = δi − (1/N) Σj=1…N δj (3)
Figure 4. Principle of crack detection based on clearance pattern monitoring. If the centered clearance of one
particular blade passes above a given threshold, an alarm can be generated.
Such a detection strategy requires that an abnormal blade elongation can be detected without any ambiguity due to measurement errors. Therefore, it is necessary to validate that the clearance pattern can be consistently measured by the sensor system. This point is discussed in the next section.
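Eq. (3) and the threshold test of Figure 4 amount to only a few lines of code. A sketch, under the assumption that a crack shows up as a deviation of one blade's centered clearance beyond a fixed threshold (function names and numeric values are illustrative):

```python
def centered_pattern(deltas):
    """Eq. (3): subtract the mean clearance so that global elongations
    (temperature, centrifugal) cancel out across the rotor."""
    mean = sum(deltas) / len(deltas)
    return [d - mean for d in deltas]

def crack_suspects(deltas, threshold):
    """Indices of blades whose centered clearance deviates beyond the threshold."""
    return [i for i, c in enumerate(centered_pattern(deltas)) if abs(c) > threshold]

# Hypothetical pattern (mm): blade 2 is ~0.25 mm longer, i.e. a smaller clearance.
suspects = crack_suspects([1.0, 1.0, 0.75, 1.0, 1.0], threshold=0.1)  # [2]
```

By construction the centered pattern sums to zero, so a uniform elongation of all blades leaves it unchanged — which is exactly why it can serve as a baseline.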
3. LABORATORY EVALUATION OF CLEARANCE PATTERN MEASUREMENT
3.1. Problem Description
Blade elongations due to mechanical cracks are about a few tenths of a millimeter (Dyke, 2011). To be able to detect these abnormal elongations, the measurement uncertainty on tip clearance has to be lower than the elongation itself, and the elongation has to be consistently differentiable over the different engine conditions. The purpose of this laboratory evaluation is to validate that the microwave system can detect blade elongations of a few tenths of a millimeter. For that, two types of test campaign
are carried out. The first is done on a precision test setup which can accurately position the blades relative to the probe. For this test, five blade mockups are mounted and the clearance pattern is characterized at different nominal clearances. The goal is to validate the consistency of the tip-clearance pattern measurements over a given range of nominal clearances. The second type of validation is carried out on a spinning setup with forty blade mockups mounted on a rotor. The goal is to characterize the precision and accuracy of clearance pattern measurement with a test bench representative of an engine. Both test campaigns use the 24 GHz version of the microwave system and a laser sensor for reference measurements.
3.2. Reference measurements with a laser sensor
In order to correctly assess the clearance pattern measurement made with the microwave system, a laser sensor is used for reference measurements. The five blade mockups are mounted on the precision test setup – as described in Kwapisz, Hafner, and Queloz (2010) – and then scanned by a laser sensor (Figure 5).
The blade clearance profile measured by the laser sensor is given in Figure 6.
Figure 6. Actual tip-clearance profile and associated pattern measured with the laser sensor. The longest blade is about 250 µm ahead of the others.
For comparison with the microwave measurements, the clearance pattern is extracted directly from the profile by taking the median value of each blade tip profile. The individual blade clearances are within 60 µm of each other, except for the longest blade, which is 250 µm ahead.
3.3. Measurement with the microwave sensor
The microwave sensor is installed on the precision test setup with the probe oriented toward the blade tips (Figure 7). The nominal clearance between the probe tip and the blade tips is set to 1 mm and the blade tip-clearance pattern is measured by the microwave system.
Figure 7. The 24GHz probe installed in front of the blades.
In order to compare the measurement made by the microwave system with the reference measurement made with the laser system, both measurements are made without any dismounting. Therefore, a direct comparison between the two systems is possible.
This first measurement shows that the microwave system correctly measures the clearance pattern, with small errors of about 50 µm maximum (Figure 8). The longest blade is clearly differentiable. This result has been obtained for a nominal clearance of 1 mm and has to be confirmed over the entire clearance range of variation, which is the purpose of the next section.
Figure 8. Direct comparison between laser measurement
and microwave measurement.
3.4. Consistency of pattern measurement over the clearance range of variation
Blade tip-clearance measurement is difficult for all the competing technologies: capacitive, inductive, or
Figure 5. The exact blade profile is measured by a laser sensor with an accuracy of 6µm.
microwave sensors. The main reason is that the sensor behavior greatly depends on the nominal sensing distance because of the non-linearity of the physical laws involved. In the case of microwave sensing, clearance measurement is based on phase measurement, with a linear relationship between phase and clearance (Eq. 1). Therefore, the calibration of such a sensor is relatively easy and consists only of sensitivity and offset corrections. Nevertheless, the beam width is relatively large and spatial filtering effects can generate measurement errors (Holst, 2005). This is why the correctness of clearance pattern measurement has to be validated over the full clearance range. This validation is the purpose of this section.
For the blade geometry considered here, which corresponds to an aero-derivative turbine, the clearance is unlikely to exceed 3 mm. For safety reasons, the minimum clearance that can be set on the test bench is 1 mm. Therefore, a set of clearance pattern measurements is performed with a nominal clearance that varies from 1 mm to 3 mm in steps of 0.05 mm. Figure 9 shows the clearance response of the five individual blades. The longest blade consistently gives a shorter clearance, with a consistent offset of about 250 µm over the full nominal range. This result shows that the measurement uncertainties are small enough to enable the differentiation of blade elongations of a few tenths of a millimeter.
Figure 9. The clearances measured by the microwave system with respect to the nominal one.
Typically, a blade health monitoring strategy based on blade elongation measurement requires that the measurement uncertainties be lower than the elongation to detect. To characterize the measurement uncertainties, the blade clearance pattern is computed for each set of measurements between 1 mm and 3 mm, and the results are compared to each other. Thereby, the uncertainty range of the pattern measurement can be estimated, as described by Figure 10. It shows that the measurement variability (between the first and last deciles) of the pattern measurement is about ±40 µm. In Figure 10, the first and last deciles are represented by the boxes, the median by the line inside each box, and the minimal and maximal values by the whiskers. These results have been obtained without any averaging and thus take into account both precision and accuracy aspects. They are also consistent with the reference measurements and enable the differentiation of the longest blade over the entire clearance range.
Figure 10. Variability range of blade clearance pattern
measured between 1 and 3mm.
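The decile-based variability estimate of Figure 10 can be reproduced with standard statistics. A sketch, assuming the repeated pattern measurements are stored as one list per measurement (the data below are hypothetical):

```python
import statistics

def blade_variability(patterns):
    """For each blade position, compute (first decile, median, last decile)
    across repeated pattern measurements, as in Figure 10."""
    result = []
    for blade_values in zip(*patterns):
        deciles = statistics.quantiles(blade_values, n=10)
        result.append((deciles[0], statistics.median(blade_values), deciles[-1]))
    return result

# Hypothetical: 11 repeated measurements of a 2-blade pattern (mm).
patterns = [[0.1 * k, 0.5] for k in range(11)]
stats = blade_variability(patterns)  # blade 1 is perfectly repeatable
```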
3.5. Measurement on a spinning setup
The validation of clearance pattern measurement is difficult to perform on a real engine in operation because of the lack of reference sensors that can survive the associated harsh conditions. In that case, the obtained measurements are difficult to interpret in terms of accuracy and precision because the actual clearance pattern variations due to vibration and thermal expansion are unknown. That is why a set of laboratory measurements has been performed on a spinning test bench with a reference laser sensor (Figure 11). This enables the direct comparison of the microwave measurement with a reference sensor and helps characterize the measurement performance.
The spinning test bench is based on a 500 mm diameter rotor. Forty blades are mounted on the rotor, and the tip clearance of each individual blade can be tuned using a dedicated sliding mechanical fixture. In order to evaluate the performance of the microwave sensor, the blades are set so as to obtain a rich clearance pattern. This pattern has been measured by a laser sensor for later comparison with the microwave system (Figure 12). This measurement was very stable, with a standard deviation lower than 10 µm, which indicates the absence of undesirable vibrations.
Figure 11. Joint measurement of the clearance pattern with the microwave and laser sensors on a spinning test bench.
The blade tip-clearance pattern has been measured by the microwave system over five hundred revolutions. Figure 12 shows the obtained results in terms of median, extrema, and first and last deciles. The clearance pattern measured by the microwave system accurately fits the reference laser measurements. The precision of the microwave system obtained during this test corresponds to an uncertainty range of ±50 µm (between the first and last deciles). It is consistent with the precision of ±40 µm obtained with the precision test setup (Section 3.4). This precision can be greatly improved by filtering the output of the sensor, taking into account the tradeoff between precision and measurement bandwidth.
Figure 12: Measurement of the blade clearance pattern by
the microwave system.
The correlation graph of Figure 13 is obtained by plotting the blade pattern measured by the laser sensor versus the averaged pattern found by the microwave system. The obtained correlation coefficient is higher than 0.99, which validates the linearity of the microwave measurement. The residual deviation is about 17 µm. This result demonstrates that microwave measurement can be used to reliably detect clearance variations lower than 100 µm.
Figure 13: Correlation graph between microwave measurement and laser measurement. Each data point
corresponds to one particular blade.
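The linearity check of Figure 13 boils down to a correlation coefficient and a residual spread. A self-contained sketch with hypothetical patterns; the residual is taken here as an RMS deviation, which is one plausible reading of the 17 µm figure:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two clearance patterns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residual_rms(xs, ys):
    """Root-mean-square blade-wise deviation between the two sensors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical laser vs. microwave patterns (mm).
laser = [0.10, 0.25, -0.05, 0.40]
microwave = [0.11, 0.24, -0.04, 0.41]
r = pearson(laser, microwave)
```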
3.6. Conclusion on laboratory study
The ability of the microwave sensor to adequately measure tip-clearance patterns has been evaluated through different laboratory tests. The first, performed on a precision test setup, showed that the tip-clearance pattern can be consistently measured over the 1-3 mm clearance range with a measurement variability of ±40 µm. A second test was performed on a spinning test bench, and the obtained results are consistent, with a measurement variability of ±50 µm. This measurement uncertainty mainly comes from electronics noise and depends on the system configuration. For example, it could be improved by using higher-performance cables or by applying filtering strategies to the sensor output. On the other hand, the accuracy of the pattern measurement is very good, with residual errors of about 17 µm on the measurements from the spinning test bench.
4. BLADE PATTERN MEASUREMENT REALIZED ON A REAL ENGINE
4.1. Presentation of data
The microwave sensor has been evaluated through an engine test performed in 2011 on a 25 MW turbine (Kwapisz, Hafner, Spitsyn, Mykhaylov & Berezhnoy, 2011). The purpose of this test was the validation of tip-clearance measurement, but an additional objective was to evaluate the ability of the system to measure the tip-clearance pattern. For that purpose, ten blades of the rotor had been shortened by a few tenths of a millimeter. The measurement of the blade clearance pattern was performed during different engine operating states. This section presents an analysis of the measurement variability of the clearance pattern with real engine data.
4.2. Raw measurement analysis
In order to compare the measurement performance obtained during the engine test with the laboratory measurements, the data are analyzed without any filtering (Figure 14). The ten shortened blades can easily be differentiated during the different engine operating states. In terms of precision and variability, the measurements are consistent with the laboratory study and show a variability of ±60 µm. Nevertheless, the precision can be improved by filtering the clearance measurement outputs, as described in the next section.
Figure 14: Variability range of the unfiltered blade clearance pattern computed over the whole engine test, represented by extrema, first and last deciles, and median.
4.3. Filtered measurement analysis
Detection of abnormal blade elongation requires a clearance pattern measurement with uncertainties lower than the elongation to detect. The measurement noise can be reduced by adequate filtering, but there is a tradeoff between noise reduction and detection bandwidth. During the engine test, the measurement rate was 0.5 Hz. In order to reduce noise, a median filter with a window size of 20 samples is applied to the raw clearance measurements. In this configuration, the filtering leads to a minimal detection delay of 40 s. During this engine test, however, the measurement rate was not optimized, and it can be greatly improved for blade health monitoring applications.
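The 20-sample median filter described above can be sketched as a causal sliding window (a simplified stand-in, not necessarily the exact filter implemented in the system):

```python
import statistics

def median_filter(values, window=20):
    """Causal sliding median over the raw clearance outputs.  At the
    0.5 Hz measurement rate of the test, a 20-sample window implies a
    minimal detection delay of 20 / 0.5 = 40 s."""
    return [statistics.median(values[max(0, k - window + 1):k + 1])
            for k in range(len(values))]

# A single outlier sample is rejected by the median:
filtered = median_filter([1.0, 1.0, 5.0, 1.0, 1.0], window=3)  # all 1.0
```

A median (rather than a mean) is a natural choice here because it suppresses isolated spikes without smearing a genuine, persistent clearance shift.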
The blade pattern has been computed for the whole engine test after filtering the clearance outputs. The variation ranges are computed and shown in Figure 15. It is interesting to note that the variability of the blade tip-clearance pattern was not uniformly improved by the filtering. Some blades present a very small variability, lower than ±20 µm, while other blades still have a variability of ±60 µm. Indeed, the pattern variation comprises both measurement uncertainties and actual blade length variations. During this particular engine test, rotor speed and output power were not constant, and the blades were subject to different temperature and load constraints. Because of small structural or mounting differences between the blades, the blade clearance pattern is not necessarily constant over all engine conditions. Typically, the BHM system has to detect abnormal variations, due to cracks, among these normal variations.
Figure 15: Variability range of the filtered blade clearance pattern computed over the whole engine test, represented by extrema, first and last deciles, and median.
4.4. Conclusion on engine test data
The purpose of the engine test was not the validation of blade crack detection based on microwave sensing; the primary goal was the validation of the absolute clearance measurement. Nevertheless, a side result was the measurement of the blade clearance pattern over different engine conditions. It shows that the variation of the blade clearance pattern is about ±60 µm without any filtering. This range comprises both measurement uncertainties and actual blade elongation. It was improved by filtering, and the obtained results show a residual variability that is likely due to actual clearance variations. In order to assess the feasibility of blade health monitoring based on microwave measurement, three metrics are important. The first is the blade elongation threshold that indicates a crack long enough that turbine control action needs to be taken. The second is the normal blade elongation discrepancy from which a crack effect has to be differentiated. The last is the measurement performance of the sensing system that monitors the blades. This last metric has been evaluated in this paper, but the feasibility analysis of such a BHM system requires additional knowledge of blade elongation.
5. CONCLUSION
Blade health monitoring offers real opportunities to improve gas turbine operation and to reduce maintenance costs. Different strategies and system architectures can be envisaged, but one of the key points is to obtain a reliable and accurate sensor package. Due to the harsh environment in the hot section, only a few sensing technologies are capable of blade monitoring in this area. In this domain, the microwave sensor has real advantages, as it is capable of accurate clearance measurements while withstanding the temperatures near the turbine inlet. This paper has described the tracking of the blade clearance pattern as one way of using this technology for blade health monitoring. This
paper shows how to deal with variability that comes from measurement errors but also from real blade elongation discrepancies. This last point is very important and leads toward physics-based diagnostic techniques. In addition to blade clearance measurement, the microwave system is capable of time-of-arrival measurements. This type of measurement is currently under evaluation and will certainly provide rich information for blade health monitoring. In conclusion, the microwave sensor provides a sound basis for future diagnostic systems in terms of measurement performance and sensor operability.
REFERENCES
Dyke, J. (2011). Modeling behaviour of damaged turbine blades for engine health diagnostics and prognostics. Master thesis, University of Ottawa, Ottawa, Canada.
Flotow, A., Mercadal, M., & Tappert, P. (2000). Health monitoring and prognostics of blades and disks with blade tip sensors. Aerospace Conference Proceedings, IEEE, Mar 18-25, Big Sky, MT, USA.
Hess, A., Frith, P., & Suarez E. (2006). Challenges, issues, and lessons learned implementing prognostics for propulsion systems. Proceedings of ASME Turbo Expo 2006, May 8-11, Barcelona, Spain.
Hess, A. (2007). Prognostics and health management: The cornerstone of autonomic logistics. (Downloaded from http://www.acq.osd.mil/log/mpp/senior_steering/condition/Hess%20PHM%20Brief.ppt)
Holst, T. A. (2005). Analysis of spatial filtering in phase-based microwave measurements of turbine blade tips. Master's thesis, Georgia Institute of Technology, Atlanta, GA, USA.
Kwapisz, D., Hafner, M., & Queloz, S. (2010). Calibration and characterization of a CW radar for blade tip clearance measurement. Proceedings of the 7th European Radar Conference, September 30 - October 1, Paris, France.
Kwapisz, D., Hafner, M., Spitsyn, V., Mykhaylov, A., Berezhnoy, V. (2011). Test and validation of a microwave tip clearance sensor on a 25MW gas turbine engine. Proceedings of the XVI International Congress of Propulsion Engineering, September 14-19, Rybache, Ukraine.
Martin R., Forry, D., Maier, S., & Hansen, C. (2011). GE's Next 7FA Gas Turbine “Test and Validation” (Downloaded from http://www.ge-energy.com/content/multimedia/_files/downloads/GEA18457A_7FA_GI_7-27-11_r1.pdf)
SAE (2012). Airfoil diagnostics with blade tip sensors for operating turbomachinery, SAE Aerospace Information Report, AIR5136, Sep 2012.
Woike, M. R., Abdul-Aziz, A., Bencic, T. J. (2010). A microwave blade tip clearance sensor for propulsion health monitoring, AIAA-2010-3308. (Downloaded from http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20100025863_2010028113.pdf)
Zielinski, M., Ziller, G. (2005). Noncontact blade vibration measurement system for aero engine application; International Symposium of Air Breathing Engines, September 4-9, Munich, Germany.
BIOGRAPHIES
David Kwapisz has been a research engineer at Meggitt Sensing Systems since 2008. He is responsible for the technology and testing aspects of microwave sensing. He received an M.Sc. degree in 2005 from the Ecole Supérieure des Techniques Aéronautiques et de Construction Automobile (Paris) and a Ph.D. degree in Automatic Control in 2008 from the Université de Limoges.
Michaël Hafner has been a product manager at Meggitt Sensing Systems since 2010. He is in charge of the microwave tip-clearance and tip-timing products for the energy market. He received a Mechatronics M.Sc. degree in 2006 from the Swiss Federal Institute of Technology.
Ravi Rajamani joined Meggitt PLC in 2011 as an Engineering Director, responsible, in part, for Integrated Vehicle Health Management (IVHM) strategy. Ravi has a BTech from IIT Delhi, an MS from IISc, Bangalore, and a PhD (EE) from the University of Minnesota. Before his current position, Ravi worked at General Electric and at United Technologies primarily in the area of gas turbine controls and diagnostics. He is active within SAE’s Engine Health Management (E-32) and Integrated Vehicle Health Management (HM-1) committees.
Assessment of Remaining Useful Life of Power Plant Steam Generators – a Standardized Industrial Application
Ulrich Kunze1 and Stefan Raab2
1,2Siemens AG – Energy Sector, Erlangen, 91050, Germany [email protected] [email protected]
ABSTRACT
The Web-based condition monitoring and diagnostic system “Boiler Fatigue Monitoring” enables on-line assessment of cumulative boiler creep and low cycle fatigue according to the European standard EN 12952-3/4 issued in 2001.
The application is employed as an autonomous module as well as fully integrated into the Siemens process control system SPPA-T3000, and is increasingly becoming a standard part of the power plant instrumentation and control (I&C).
The Fatigue Monitoring System (FMS) is a standard industrial application for both newly built power plants and retrofits of existing units of any kind. The system is not limited to Siemens I&C systems; FMS can also be integrated into power plants with I&C systems from other suppliers. FMS is also capable of calculating the remaining lifetime for boilers designed according to the American standard ASME VIII-2.
1. INTRODUCTION
Steam generator lifetime monitoring and assessment of the remaining useful life (RUL) is a standard application in power plants. Since market requirements have shifted towards increased flexibility, power plants are operated more cyclically than in the past. This makes fatigue monitoring systems even more important, as they immediately reveal the impact of start-ups and shut-downs on the remaining lifetime of the boiler components.
Therefore, fatigue monitoring systems for boilers are today increasingly standard installations in new power plants and are often the subject of upgrade activities.
The basis for the assessment of the RUL is the European standard EN 12952-3/4 issued in 2001, which contains simplified rules to calculate creep and low cycle fatigue.
These simplified rules are conservative but have the advantage of being easy to use.
Only 3 years after the release of EN 12952-3/4 the first boiler fatigue monitoring system (FMS) was installed for continuous operation in a new combined cycle power plant in Germany.
FMS is a module of web4diagnostics, the Web-based diagnostics system for power plants. FMS is used both in power plants as an on-line diagnostics system – it has since been installed around the world in more than 50 power plant units, in part with integration into the power plant's office network – and in the Siemens Intranet as a Web-based data archive and as a tool for data analysis and evaluation.
Previously, the system was a supplement to the operational instrumentation and control (I&C): data acquisition used a link to the I&C, but there were otherwise no further connections. This has changed dramatically.
Today, the "boiler fatigue monitoring" module is fully integrated into the SPPA-T3000 process I&C system and is a standard industrial application. However, it can be combined with any other I&C system via OPC data connection.
2. FUNDAMENTAL PRINCIPLES OF BOILER FATIGUE MONITORING
Power plant boilers contain many highly loaded components of the water and steam piping systems with limited service life. In particular, these are the feedwater heater, superheaters, attemperators, headers, piping and internal boiler lines.
The theoretical service life of a component is precalculated for a specific design loading. Operating conditions outside of the design conditions can result in premature failure of the component.
_____________________ Ulrich Kunze and Stefan Raab: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The actual anticipated time until failure of the component at the current operating time is known as the remaining useful life (RUL). The sum of prior operating time and RUL may be greater or less than the theoretical service life due to past operating conditions outside of the design conditions. The residual life is calculated as the difference between theoretical service life and (material) fatigue.
Fatigue results from
• Creep fatigue and/or
• Low-cycle fatigue.
2.1. Creep Fatigue
Creep fatigue designates the fatigue of a component as a consequence of creep damage. Creep damage always occurs when the component is operated above the grain recovery temperature characterizing the material. Creep fatigue results at the most heavily loaded area of the component, generally the area of a cutout. Peak stresses occur here which can result in plastic deformation of the material. The allowable service life is dependent on the component temperature such that the service life is limited at a constant load and decreases with increasing temperature.
2.2. Low-Cycle Fatigue
Low-cycle fatigue is the fatigue of a component as a result of cyclic strain loading. Cyclic strain loading occurs when the part is subjected to pressure changes and/or fluctuating fluid temperature distributions. Thermal stresses resulting from locally transient temperature distributions are superimposed on the compressive loads. Each cycle in the resulting stress (load cycle) leads to utilization of the low-cycle fatigue resistance (low-cycle fatigue) and thus finally to stress cracking at the most highly-loaded point.
3. CODES AND REGULATIONS
Since 2001, EN 12952 has applied for the design and monitoring of boilers in Germany and many other (European) countries (for design: Part 3 and for continuous monitoring: Part 4; cf. Fig. 1).
EN 12952 supersedes the Technical Rules for Steam Boilers (TRD), which served for many years as the basis for design and monitoring and to which it is closely related.
Fig. 1 Fatigue calculation in accordance with EN 12952 (Water-tube boilers and auxiliary installations): Part 3 covers design and calculation for pressure parts; Part 4 covers in-service boiler life expectancy calculations, with creep fatigue in Appendix A and low-cycle fatigue in Appendix B
4. DESIGN LOADING FOR STEAM BOILERS
During the design of a steam boiler, it is checked whether the selected design including the intended materials will withstand the loading warranted by the manufacturer.
This verification is performed by the boiler manufacturer.
The manufacturer assumes a service loading combination for subsequent operation for this purpose, comprising, for example, the following typical parameters:
• Service life: 25 years or 200,000 h
• Cold starts (120-h outage): 50
• Warm starts (weekend outage): 1250
• Hot starts (overnight outage): 5000
as well as further possible operating cases.
The anticipated fatigue for the critical areas (components) of the steam boiler is calculated on the basis of EN 12952-3, accounting for the design service loading combination. This fatigue must always be less than 100%. The boiler manufacturer will generally design the boiler so that there is some reserve with regard to the design service loading combination.
However, these design conditions will be deviated from during operation. It is frequently the case that the power plant is initially in base load operation due to its favorable efficiency compared with the other available power plants. With increasing age, it will be deployed more and more in cycling duty or as a peaking plant.
This different operating mode compared with the design of course results in a different anticipated service life of the boiler – for which reason the boiler must be continuously monitored.
Following the service loading combination, which includes service time as well as starts and load changes, the actual fatigue is expressed in percent rather than in hours. The same applies to the RUL.
5. CALCULATION METHOD
5.1. Creep Fatigue
Calculation of creep fatigue $D_C$ is based on a comparison of the exposure time $T_{op}$ of a component at specific levels of pressure and temperature with the theoretical service life $T_{al}$ of the component at these conditions:

$$D_C = \sum_i \sum_k \frac{T_{op,i,k}}{T_{al,i,k}}$$

where $i$ and $k$ index the temperature and pressure classes.
The theoretical service life is calculated from the creep resistance (material property), the operating temperature and the membrane stress (or pressure).
The procedure is as follows: From the inside pressure, the circumferential stress at the inner surface of the most heavily loaded nozzle bore is calculated. This stress is compared with the temperature-dependent stress-rupture strength given for 10,000, 100,000 and 200,000 h, taking into account a 20% safety margin. The result is the theoretical service life for the given inside pressure and temperature (see EN 12952-4 for calculation details).
Fig. 2 Determination of creep fatigue from exposure time, for the example class 560...570 °C / 115...120 bar (theoretical service life in class: 4,060,000 h)
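The double sum for $D_C$ can be sketched in code. The class labels, exposure hours and theoretical lives below are invented for illustration, except for the 4,060,000 h value taken from the Fig. 2 example; they are not values prescribed by EN 12952-4.

```python
# Sketch of the creep-fatigue double sum D_C = sum_i sum_k T_op[i,k] / T_al[i,k],
# where i indexes temperature classes and k pressure classes.

def creep_fatigue(exposure_hours, service_life_hours):
    """Accumulate D_C over all occupied (temperature, pressure) classes.

    exposure_hours[(i, k)]     -- hours T_op spent in class (i, k)
    service_life_hours[(i, k)] -- theoretical service life T_al for class (i, k)
    """
    return sum(
        t_op / service_life_hours[cls]
        for cls, t_op in exposure_hours.items()
    )

# Example with two occupied classes (illustrative numbers)
exposure = {("560-570C", "115-120bar"): 12000.0,
            ("570-580C", "115-120bar"): 500.0}
t_al = {("560-570C", "115-120bar"): 4_060_000.0,
        ("570-580C", "115-120bar"): 1_500_000.0}

d_c = creep_fatigue(exposure, t_al)  # fraction of the creep life consumed
```

In an FMS-style system the exposure table would grow by one acquisition interval per pass, so $D_C$ stays current without re-scanning history.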
For a later quick overview of the operating mode of the power plant, it is expedient to categorize pressure and temperature into classes and to perform the fatigue calculation per class. The classification is defined by experts; the idea is to use small intervals for normal-operation values and wider intervals for low temperature and pressure values.
It can then be easily seen during the analysis how long the component has been operated within specific temperature/pressure ranges (see Fig. 2).
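Such a class lookup can be sketched with bisection. The class edges below are invented for this sketch; in practice they would be the expert-defined intervals described above, narrow around normal operation and wide elsewhere.

```python
import bisect

# Illustrative temperature class edges (°C): wide classes at low values,
# narrow 10 K classes around normal operation. These edges are invented,
# not the expert-defined classes of an actual FMS configuration.
TEMP_EDGES = [200, 400, 500, 540, 550, 560, 570, 580]

def temp_class(t_celsius):
    """Return the index of the temperature class containing t_celsius."""
    return bisect.bisect_right(TEMP_EDGES, t_celsius)

idx = temp_class(565)  # 565 °C falls into the narrow 560...570 °C class
```

The same binning would be applied to pressure, giving the two-dimensional class grid of Fig. 2.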
5.2. Low-Cycle Fatigue
Low-cycle fatigue $D_F$ is determined by counting the number of load cycles $n$ and comparing these with the number of cycles to crack initiation $N$ of the component for the specific values of the stress range $2f$ and temperature $t$ on which the load cycle is based:

$$D_F = \sum_i \sum_k \frac{n_{i,k}}{N_{i,k}}$$

where $i$ and $k$ index the stress-range and temperature classes.
A load cycle is defined by EN 12952-4 as a closed hysteresis loop in the stress/strain diagram. The stress in the material is calculated from the pressure and temperature gradient, while the numbers of cycles to crack initiation are material properties.
To simplify future analysis, it is expedient to categorize the stress range and temperature in classes. This makes it easy to assign load cycles to specific operating modes.
Stresses (including extremes) which cannot yet be combined in load cycles are maintained on the "list of residual extremes" until a "partner" is found for them.
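The closed-hysteresis-loop counting behaves much like rainflow counting. The following simplified three-point sketch is not the EN 12952-4 procedure itself (the stress values are invented); it only illustrates how closed cycles are extracted while unmatched extremes remain on a residue list awaiting a "partner".

```python
def count_cycles(extremes):
    """Simplified three-point rainflow count of closed stress cycles.

    extremes -- sequence of alternating stress extremes (e.g. MPa).
    Returns (cycles, residue): the stress ranges of closed load cycles,
    and the list of extremes still awaiting a partner.
    """
    residue, cycles = [], []
    for s in extremes:
        residue.append(s)
        # A loop closes when the inner range is enclosed by the
        # range formed with the newest extreme.
        while len(residue) >= 3:
            r_inner = abs(residue[-2] - residue[-3])
            r_outer = abs(residue[-1] - residue[-2])
            if r_inner > r_outer:
                break
            cycles.append(r_inner)  # one full closed cycle counted
            del residue[-3:-1]      # remove the matched pair of extremes
    return cycles, residue
```

The counted ranges would then be assigned to stress-range/temperature classes and accumulated against the material's cycles-to-crack-initiation curve.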
5.3. Total Fatigue
The total fatigue of a component is determined as the sum of:
• Creep fatigue,
• Low-cycle fatigue,
• Fatigue from the current list of remaining extremes,
• Fatigue from prior history of the component and
• Correction of fatigue.
Fatigue from the current list of remaining extremes is an estimate of the fatigue component of the stress values which do not yet represent load cycles.
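As a minimal sketch of this bookkeeping (the field names are invented; the breakdown follows the five contributions listed above):

```python
from dataclasses import dataclass

@dataclass
class ComponentFatigue:
    """Illustrative container for the fatigue contributions of one
    component; all fractions are dimensionless shares of 100 %."""
    creep: float              # D_C from exposure times
    low_cycle: float          # D_F from closed load cycles
    residual_extremes: float  # estimate for not-yet-closed cycles
    prior_history: float      # fatigue carried over from earlier records
    correction: float         # manual correction term

    def total(self):
        return (self.creep + self.low_cycle + self.residual_extremes
                + self.prior_history + self.correction)

f = ComponentFatigue(0.12, 0.08, 0.01, 0.05, -0.02)
total = f.total()  # summed fatigue fraction of the component
```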
6. CONTINUOUS FATIGUE MONITORING
Continuous fatigue monitoring yields information on the actual service life utilization based on the actual (measured) design of the boiler components and the current operating mode. In practical terms, this constitutes verification of the design analysis.
Continuous fatigue monitoring is the responsibility of the power plant operator (not the manufacturer) and shall be performed for the most highly loaded components.
The manufacturer, the subsequent power plant operator and the licensing authority jointly define which components of the boiler are to be continuously monitored, usually during the construction phase of the power plant.
6.1. Monitored Components
Heavily loaded components which are continuously monitored with regard to creep fatigue and low-cycle fatigue are as follows:
• Headers
• Drums
• Separators
• Spray attemperators
• Piping (pipe bends)
Drums and separators are generally only monitored for low-cycle fatigue (not creep fatigue), as these components are operated in temperature ranges for which no creep of the material occurs (below the grain recovery temperature).
6.2. Requisite Measuring Points
Operating parameters (measured values) are required for each component to be monitored for calculation of the service life:
• Creep fatigue:
  - Mean wall temperature tmw
  - Internal pressure p
• Low-cycle fatigue:
  - Inner wall temperature tim
  - Mean wall temperature tmw
  - Internal pressure p
As a general rule, the drums and headers to be monitored are already equipped with temperature measurements in the component wall by the manufacturer.
If measurement of the inner wall temperature and the mean wall temperature is not possible, these temperatures can be calculated from the time behavior of the medium temperature.
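One common way to derive such a temperature from the time behavior of the medium temperature is a first-order lag of the measured fluid temperature. The sketch below is illustrative only; EN 12952-4 defines the actual procedure, and the time constant is invented.

```python
def lagged_wall_temp(t_fluid_series, dt_s, tau_s=600.0):
    """Approximate a wall temperature as a first-order lag of the fluid
    temperature. Illustrative only: tau_s = 600 s is an invented time
    constant, not a value from EN 12952-4.

    t_fluid_series -- sampled fluid temperatures (°C)
    dt_s           -- sampling interval (s)
    """
    t_wall = t_fluid_series[0]     # assume thermal equilibrium at start
    out = [t_wall]
    alpha = dt_s / (tau_s + dt_s)  # discrete first-order filter gain
    for t_fluid in t_fluid_series[1:]:
        t_wall += alpha * (t_fluid - t_wall)
        out.append(t_wall)
    return out
```

With a 30 s acquisition interval, the derived wall temperature lags a step in the fluid temperature, which is exactly the behavior that drives the thermal stress term.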
6.3. Preparatory Calculations
Before the start of the on-line calculation, all of the parameters to be determined once are specified or calculated. These are as follows:
• Specification of the classification for the creep and low-cycle fatigue calculations
• Calculation of the theoretical service life for each pressure/temperature class
• Calculation of the numbers of cycles to crack initiation for each stress range/temperature class
6.4. On-Line Calculations
Continuous fatigue calculation is performed online. The following steps are processed sequentially:
• Acquisition of the requisite values (a typical acquisition interval is 30 s)
• Calculation of the inner wall temperature and mean wall temperature from the fluid temperature if these are not directly measurable
• Determination of exposure times and calculation of the current creep fatigue
• Calculation of the component stress, check whether new load cycles have taken place, assignment of the load cycles to the defined classes, and determination of the current low-cycle fatigue and of the fatigue component of the list of residual extremes
• Calculation of total fatigue
The creep, low-cycle and resulting total fatigue are recalculated for each data acquisition interval, so that the residual service life of a component is always up to date.
7. INTERNAL STRUCTURE OF CONTINUOUS FATIGUE MONITORING
The boiler fatigue monitoring module FMS (Fatigue Monitoring System) was developed for calculation of the creep fatigue and low-cycle fatigue. FMS requires a standard PC.
The FMS software is implemented as a Web application. This system concept enables operation and calling up of information both directly in the system and from any PC in the office network (provided that a connection to the office network is implemented).
All data (measurement, configuration and results data) are stored in a database. Background processes activated by time control ensure that data acquisition and the on-line fatigue calculation are performed continuously and independently of the user interface.
Display of the information in the form of logs, tables or trend plots is updated and compiled based on the latest database values each time the user interface is called up (cf. Fig. 3).
Fig. 3 Structure of FMS
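The sequential on-line steps can be sketched as a single monitoring pass. Every name here is hypothetical and stands in for the plant I/O and database layers of an actual FMS installation.

```python
def online_fatigue_step(acquire, wall_temps, creep, low_cycle, store):
    """One pass of the sequential on-line calculation for one component.

    Each stage is injected as a callable so the sketch stays independent
    of any real plant I/O; none of these names come from FMS itself.
    """
    p, t_fluid = acquire()                 # 1. acquisition of requisite values
    t_inner, t_mean = wall_temps(t_fluid)  # 2. wall temperatures (if unmeasured)
    d_c = creep(p, t_mean)                 # 3. exposure times -> creep fatigue
    d_f = low_cycle(p, t_inner, t_mean)    # 4. load cycles -> low-cycle fatigue
    total = d_c + d_f                      # 5. total fatigue
    store(total)                           # persist results in the database layer
    return total
```

A background scheduler would invoke such a step for every monitored component once per acquisition interval (typically 30 s).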
8. INTEGRATION OF FMS IN THE SPPA-T3000 PROCESS I&C SYSTEM
As an independent module, FMS obtains data from the process I&C, but the results obtained are not written back.
The configuration of the module is more complex – for FMS this includes the entry of material and component data. Previously, this had to be performed completely separately from configuration of the process I&C system with separate tools. Measuring points and their designations in the process I&C had to be coordinated through parameter lists and entered in the FMS.
Many diagnostics modules which do not yet provide integrated functions in the I&C are in a similar situation.
It is therefore often desirable – as was also the case for the FMS – to be able to perform the configuration with the tools of the process I&C and to display the results from the diagnostic module in the I&C and to be able to use the infrastructure available there (display in process displays, trend plots, automatic report generation).
The solution lies in embedding the modules in a runtime container (see Fig. 4).
A runtime container is a component of the SPPA-T3000 process I&C system with strictly defined interfaces and functions. In addition to simple measurement and calculation results, further, more complex structured data can be exchanged with other SPPA-T3000 components through these interfaces. In addition, SPPA-T3000 can control and diagnose the module through these interfaces.
The program code of FMS is not changed by embedding. It is the original code of the independent module, which ensures that errors within the module can be ruled out on integration and that existing certifications remain effective.
Embedding enables reuse of the results from the module in the process I&C. For example, they can be
• Displayed together with measured values in process and curve displays
• Stored in the process data archive and used together with stored measured values for later evaluations
• Input in controls or other automatic control functions
• Used for generating alarms which are annunciated together with process alarms in the alarm display.
However, some results (the classified fatigue data) must be presented in very specific displays of the boiler fatigue monitoring module which exceed the possibilities of the standard tools in SPPA-T3000.
It proved to be an advantage here that both the SPPA-T3000 process I&C system and the FMS module use Web technology for the graphical user interface. It was thus possible to insert special logs and displays from FMS as a separate (browser) window in the user interface of the process I&C. The authorizations, and thus also the access restrictions, of the T3000 operator are inherited from SPPA-T3000 to FMS.
Fig. 4 Embedding the FMS module (code and data) in an SPPA-T3000 runtime container
9. EXAMPLES
The relevant information for continuous fatigue monitoring is provided in the form of logs on the user interface (and also in parallel as a PDF file for downloading):
• Overview (summarizing tabular presentation of fatigue values for all monitored components, cf. Fig. 5)
• Theoretical service life (component-specific) for each defined pressure/temperature class
• Exposure time log (component-specific) – operating time for each defined pressure/temperature class including the resulting creep fatigue
• Numbers of cycles to crack initiation (component-specific) for each defined stress range/temperature class
• Load cycles (component-specific) for each defined stress range/temperature class including the resulting low-cycle fatigue (cf. Fig. 6)
• Configuration data for the components to be monitored
• Configuration data for materials (material database)
In addition to the output of logs, the results can also be displayed graphically – fatigue values together with operating parameters – enabling direct comparison of the operating mode of the plant with the resulting utilization values (cf. Fig. 7).
Fig. 5 FMS overview display (all monitored components and the current results of the fatigue calculation)
Fig. 6 FMS display output (detail protocol for HP drum 10HAD10BB001W – matrix of completed cycles as a function of temperature and stress)
Fig. 7 Trend of active power (blue) and low-cycle fatigue (red) over time from Oct. 10 to Nov. 03, with a noticeable start-up on Oct. 15
Low-cycle fatigue (DWE) depicts the influence of load cycles, in particular start-ups and shut-downs. Fig. 6 shows the detail protocol of low-cycle fatigue, with cycles as a function of temperature and stress. Start-up and shut-down operations that contribute particularly to lifetime consumption can easily be detected from a simple trend representation; Fig. 7 shows an example, plotting the active power of the power plant and the low-cycle fatigue over time.
In the period under review, several start-up and shut-down operations took place. While the low-cycle fatigue increased only slightly for most of them, it rose significantly during the start-up on 15 October.
The analysis of the start-up showed that cold fluid was injected into the boiler, which resulted in high stresses in the boiler component wall. It was recommended to prevent such operation in the future.
10. SUMMARY
Assessment of fatigue and remaining useful life for boilers according to EN 12952, issued in 2001, uses simplified methods for the evaluation of creep and low-cycle fatigue. The consequence of this simplification is some conservatism in the estimated damage fraction.
The remaining useful life is expressed as a fraction of 100%, taking into account the different approaches for creep fatigue and low-cycle fatigue; in particular, cyclic operation of power plants produces low-cycle fatigue, which is not related to operating hours.
The main advantage of the assessment procedures from the standard is that they are easy to apply. In particular, they are qualified for temperatures above 600 °C and for service times of boiler components of 200,000 h and more.
The boiler fatigue monitoring module FMS is based on EN 12952. FMS has been in use in power plants since mid-2004. It has been included in the scope of supply for many new combined-cycle power plants from Siemens Energy or backfitted in existing plants.
The FMS module is certified by the German technical inspectorate TÜV Süd.
For the power plant operator, the implementation of FMS provides a continuous overview of the service life utilization of the boiler, so that
• The time for a necessary inspection can be selected optimally and thus the operating time between two inspections maximized
• Power plant safety can be increased
• Operating modes causing heavy wear can be detected and, if possible, prevented
• Components can be operated close to the material limits, so that the operating time of the plant can be maximized and operating costs minimized
The assessment of fatigue and remaining useful life for boilers according to EN 12952 is accepted as a standardized industrial application.
The Fatigue Monitoring System (FMS) is a standard industrial application for both newly built power plants and retrofits of existing units of any kind. The system is not limited to Siemens I&C systems; FMS can also be integrated into power plants with I&C systems from other suppliers. FMS is also capable of calculating the remaining lifetime for boilers designed according to the American standard ASME VIII-2. Since 2004 it has been successfully implemented for more than 50 boilers.
REFERENCES
European Committee for Standardization (CEN) (2001). EN 12952-3, Water-tube boilers and auxiliary installations - Part 3: Design and calculation for pressure parts, Brussels, Belgium
European Committee for Standardization (CEN) (2001). EN 12952-4, Water-tube boilers and auxiliary installations - Part 4: In-service boiler life expectancy calculations, Brussels, Belgium
The American Society of Mechanical Engineers (ASME) (2001), ASME Boiler and Pressure Vessel Code Section VIII Division 2 (ASME VIII-2), Rules of Construction of Pressure Vessels – Alternative Rules, New York
American Boiler Manufacturers Association (ABMA), Task Group On Cyclic Service (2003), Comparison of fatigue assessment techniques for heat recovery steam generators, Version 1-1
Kunze, U., Walz, H. (2007), Integration of Web based Diagnostic Systems into Power Plant I&C with Boiler Fatigue Monitoring as an Example (in German), Proceedings of International ETG Congress, October 23/24, Karlsruhe, Germany
Kunze, U., Pels Leusden, C., Spinner, R., Hackstein, H., Walz, H. (2008), Integration der Lebensdauerüberwachung von Dampferzeugern in die Kraftwerksleittechnik (in German), 40. Kraftwerkstechnisches Kolloquium 2008, October 14/15, Dresden, Germany
BIOGRAPHIES
Ulrich Kunze is a physicist and received a doctorate (Dr.-Ing. habil.) in mechanical engineering. He works as a senior expert for diagnostics of fossil-fired power plants at Siemens AG in Erlangen (Germany). Previously he was head of the diagnostics department in a German nuclear power plant and then a project manager at Siemens responsible for the installation of diagnostic systems in nuclear power plants worldwide. Currently he is a member of the DIN and ISO specialist groups for Condition Monitoring and Diagnostics of Machines (TC 108 SC 5).
Stefan Raab received a doctorate (Dr.-Ing.) in mechanical engineering. He currently leads the diagnostics group for plant performance at Siemens AG in Erlangen (Germany). He is a member of a VGB PowerTech specialist group for power plant performance diagnostics.
Autonomous Prognostics and Health Management (APHM)
Jacek Stecki1, Joshua Cross2, Chris Stecki3, and Andrew Lucas4
1,3PHM Technology Pty Ltd, 9/16 Queens Pde, VIC 3068, Australia
2,4Agent Oriented Software Pty. Ltd., 580 Elizabeth Street, Melbourne, VIC 3000, Australia
ABSTRACT
The objective of this paper is to show how PHM concepts can be included in the design of an autonomous Unmanned Air Vehicle (UAV) and, in doing so, provide effective diagnostic/prognostic capabilities during system operation. The authors propose a PHM Cycle that is divided into two parts, covering the design of the Autonomous PHM system and the operation of the PHM system in real-time application. The paper presents the steps in the design of Autonomous Prognostics and Health Management (APHM) developed using this approach, to provide contingency management integrated with autonomous decision-making for power management on a UAV. APHM was developed using commercial software tools such as the JACK® autonomous software platform, to provide real-time intelligent decision making, and MADe®, the Maintenance Aware Design environment, to identify risks due to equipment failures and to select appropriate sensor coverage. The PHM Cycle methodology is demonstrated in an application to autonomous, real-time engine health and power management on an Unmanned Air Vehicle (UAV).
1. INTRODUCTION
Prognostics and Health Management (PHM) is a new approach to enhancing system sustainability which redefines and extends Condition-Based Maintenance (CBM) on the basis of current advances in failure analysis, sensor technology and AI-based prognostics (Scheuren, Caldwell, Goodman, & Wegman, 1998). The two basic tenets of Prognostics and Health Management are:
• Prognostics - predictive diagnostics, which includes determining the remaining life or time span for the proper operation of a component
• Health Management - the capability to make appropriate decisions about maintenance actions based on diagnostics/prognostics information, available resources and operational demand.
The paper discusses the methodology for integrating PHM concepts into system design to provide autonomous diagnostic and prognostic capabilities during system operation. The Autonomous PHM system proposed in this paper is designed on the basis of correct risk assessment and a reasoning capability that is able to assess the sensor readings and determine the state of the system and the appropriate action. The effectiveness of a PHM system depends on comprehensive and correct identification of the risks due to system failures and of the system responses to those failures. Knowing the failures, the optimum combination of sensors must be identified and any ambiguities in the detection of failure modes resolved. Sensor coverage can be augmented by BITs (built-in tests) and component-specific sensors to increase the reliability of diagnostics and to eliminate ambiguities in the detection of failure modes. The resulting sensor set provides sensing patterns which are syndromes of particular failures of the system, and can be expressed as diagnostic rules. Diagnostic coverage may be further enhanced by the application of probabilistic methods. Having identified the functional failure modes and determined their criticality, reasoning techniques based on artificial agent technology can be applied to determine the set of actions that is most appropriate for the given situation.
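Such syndrome-based rules can be expressed as a simple lookup from sensing pattern to failure mode. The sensor patterns and failure labels below are invented for illustration and are not drawn from the UAV case study.

```python
# Hypothetical diagnostic rules: each syndrome is a tuple of binary
# sensor states (e.g. threshold exceedances) identifying one failure.
RULES = {
    (1, 0, 1): "fuel pump cavitation",
    (0, 1, 1): "injector blockage",
}

def diagnose(sensor_pattern):
    """Return the failure mode matching the observed syndrome,
    or None if the pattern is unknown or ambiguous."""
    return RULES.get(tuple(sensor_pattern))
```

Probabilistic methods, as mentioned above, would replace this exact-match lookup with a likelihood ranking over candidate failure modes.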
A reasoning system improves the diagnosis by maximizing the likelihood of determining the failure mode correctly. It is also able to determine the most appropriate course of corrective action, taking into account current circumstances such as the flight mode, the power requirement and the state of both engines.
_____________________ Jacek Stecki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This provides a greater level of awareness than a warning light. Normally, a human would have to determine the appropriate action on their own, based on the information available (warning lights, error codes, vibrations, etc.). However, if the human (or a decision-making system) receives incorrect or incomplete information, they may take an unnecessarily cautious approach and, for example, shut the engine down, or they may continue the current operation, failing to take any remedial action. Both of these circumstances can lead to catastrophic consequences.
Often an overly sensitive failure detection system can cause "false positive" warnings, i.e., generate an alert for a non-existent fault. This problem is highlighted in a recent Flight International magazine article on the introduction of a new-generation airliner with a sophisticated fault detection and alert system (Anon, 2010). One airline experienced a plethora of nuisance system warnings, which "are driving down technical dispatch (reliability)". Another operator reported: "What we are grappling with are algorithms for failure detection, which not only detect a failure but also act upon it. Unfortunately this can lead to a perfectly healthy system being shut down or [a no-go fault warning] for a problem that was minor enough to have been deferred."
The Autonomous PHM system discussed in this paper aims to apply reasoning equivalent to that of a human crew and thus act like an artificial assistant. Such a system could greatly reduce the crew or operator workload in high-stress situations, leading to improved levels of safety. This paper uses results of work on the development of PHM and contingency management integrated with autonomous decision-making, carried out as a core part of the UK National ASTRAEA Unmanned Air System (UAS) program. This program is paving the way for commercial UAS to operate autonomously in non-segregated airspace within the next decade.
It proposes the integration of PHM into a system at the design stage, based on a PHM Cycle that combines the Design and Operational perspectives. Combining the capabilities of current commercial software tools, such as JACK, the autonomous software platform, and MADe, the Maintenance Aware Design environment, a PHM system is designed offering greater accuracy in the detection of faults and providing selection of the best response actions (Glover, Cross, Lucas, Stecki, & Stecki, 2010).
2. THE PHM CYCLE
The proposed PHM Cycle is divided into two parts, covering the design and operation of the system, as shown in Fig. 1.
Figure 1. PHM Cycle
The Design Cycle applies multiple iterations of risk analysis techniques, failure mode prediction, and identification of responding actions to achieve an appropriate level of functional failure coverage. The outcome of this is a knowledge base which can then be applied to a system in operation.
The Operational Cycle describes the PHM process when the
system is put into operation. It describes how information
about faults is gathered, assessed and presented to the end
user, or addressed by the autonomous system.
By structuring the PHM design process appropriately, data
from the Operational Cycle can be fed back and incorpo-
rated into the Design Cycle, yielding continuous improve-
ment in future upgrades or revisions.
2.1 The PHM Design Cycle
The objective of the PHM Design Cycle is to develop an advisory system which will assess, in real time, the health of the system and recommend corrective actions to a higher-level decision maker that has to deal with a number of potentially conflicting goals, hostile situations and opportunities apart from the input from the PHM. The decision maker, either a human or a fully autonomous decision system, will have the situational awareness to apply the recommended actions appropriately.
The Design Cycle begins with the specification of the sys-
tem to be built, which is modeled as a functional block dia-
gram.
Risk Analysis and Determination of Functional Failure
Modes. The first requirement of the risk analysis is to iden-
tify the possible Functional Failure Modes (FFMs) for the
system and to understand their failure dependencies
throughout the system. FFMs are the result of specific un-
derlying physical failures triggered by design, manufactur-
ing, environmental, operational and maintenance causes.
Such causes (e.g. vibration) can initiate failure mechanisms
(e.g. high cycle fatigue) that lead to a fault (e.g. fracture).
First European Conference of the Prognostics and Health Management Society, 2012
The second requirement is to determine how the failures
propagate through the system (known as the propagation
path) and how this impacts the system functionality. The
availability of such information is a key requirement for designing, developing, verifying and validating the PHM system.
Figure 2. Design of APHM (MADe: causes of failures, identification of faults and failure modes, criticality of each failure, interaction between failures, expected functional/hardware reliability, diagnostic coverage, sensor coverage, predictive failure model; JACK: set of beliefs, set of events, set of goals, set of plans)
The outputs of the risk analysis process are usually captured
in a Failure Modes and Effects Analysis (FMEA). Once the FMEA is available, the criticality of each FFM is established, taking into consideration each specific failure and its propagation paths. The output of this process is the Failure Modes, Effects and Criticality Analysis (FMECA) report.
Further assessment of the risk is obtained by carrying out
reliability analysis using Reliability Block Diagrams and
Fault Trees. Extensive evaluation of system sustainability is
conducted using a Reliability Centered Maintenance (RCM)
methodology. Reliability analysis is usually performed on
the basis of the expected Mean Time Between Failure
(MTBF) of hardware components as provided by manufac-
turers or on the basis of published MTBF standards. In addi-
tion to this information PHM requires an assessment of the
reliability of specific functional outputs in the system –
‘functional reliability’.
At the conclusion of the risk assessment process, the user
can expect to know:
• how the system elements can fail (failure modes)
• the criticality of each failure
• the likely causes of functional failures
• the interactions between functional failures
• which physical failures are linked to functional failures
• the expected functional and hardware reliability of the system.
The information obtained during the development of the FMECA and reliability studies is the basis for selecting sensor sets able to detect the identified failures and for formulating diagnostic rules. This process is discussed in the following section.
There are two types of approaches to failure risk analysis. The first is a “committee approach”, where a team of subject matter experts determines failures and their dependencies and subsequently lists them in spreadsheet-type software. The quality of the analysis depends on the knowledge and experience of the team members. Reliability studies are usually carried out by a different team of people using specialized reliability software. Sensor selection and the development of diagnostic rules cannot directly use the results of the FMECA analysis.
The second, model-based risk assessment approach uses
existing failure databases and expert knowledge captured in
the form of Failure Diagrams and Functional Block diagrams of a system. A standardized functional and failure taxonomy ensures consistency in the interpretation of failure analysis results (Rudov-Clark & Stecki, 2009).
Reliability models are automatically generated from the
functional model of the system. Sensor selection and diag-
nostic rules are also determined based on automated analy-
sis of the functional model.
Risk assessment as briefly described above forms the basis for any further work on the development of a PHM system. Some common problems causing sub-optimal operation of PHM systems can be traced to the following risk assessment deficiencies:
• dependencies of failures are not identified
• inadequate identification of risks
• incomplete database of failures
• inconsistent language used to define functions and failure concepts
• confusing hardware reliability with functional reliability
• different models for Criticality and Reliability Assessments.
To overcome these deficiencies, MADe (the Maintenance Aware Design environment) was used as a risk assessment tool facilitating failure modes analysis and reliability assessment.
Sensor and Diagnostic coverage. Detection of a failure
mode is the first and most important step in the PHM process. After all, if we cannot identify a failure mode, we cannot propose a corrective action. When a failure mode is isolated, the reasoning system will attempt to identify the causes of the failure mode.
Sensors are usually selected to detect specific identified failures (e.g. a temperature sensor detects a temperature change indicating a failure of the heater); thus they are selected to detect symptoms of failures. The sensors are usually selected by personnel responsible for individual components or subsystems, who may have only limited knowledge of the impact of their failures on system failures. The final composition of the sensor set is decided upon by the system integrator using criteria such as cost, weight, reliability and computing requirements. The overall coverage of system failures is determined using testability analysis software. The diagnostic rules are developed on the basis of symptoms.
This methodology has the following weaknesses:
• sensor fusion is not based on failure dependencies (fallback – testability)
• diagnostic rules are not based on failure dependencies
• failure coverage is often incomplete and cannot be assessed
• sensor selection does not consider the criticality of failures, or the functional and hardware reliability
• sensor fusion is difficult to implement without failure dependency information.
A model-based approach to sensor selection disposes of some of these weaknesses. The MADe/PHM module uses the model of the system and the failure dependency data obtained in the risk analysis phase and provides the user with an automated ‘sensor set design’ function (Rudov-Clark, Ryan, Stecki, & Stecki, 2009). Each potential sensor set provides a logical cover of the identified failures. In contrast to the above-mentioned ‘symptom of failure’ methodology, the sensor set fuses sensor readings to provide a syndrome of failure. The selection of component/subsystem sensors solely on the basis of failure symptoms can also be carried out and fused with sensor sets based on identification of the syndrome of failure.
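The symptom/syndrome distinction can be made concrete with a small sketch. This is our illustration, not code from the paper or from MADe, and all sensor names, thresholds and failure modes are hypothetical:

```python
# Symptom-based detection: each sensor is checked in isolation, so every
# out-of-range reading is flagged as an independent problem.
def symptom_diagnosis(readings, limits):
    """Return the set of sensors whose reading is outside its limits."""
    return {s for s, v in readings.items() if not limits[s][0] <= v <= limits[s][1]}

# Syndrome-based detection: a failure mode is identified by the joint
# signature it produces across the fused sensor set.
SYNDROMES = {
    "pump_bearing_wear": {"oil_pressure": "low", "vibration": "high", "oil_temp": "high"},
    "oil_leak":          {"oil_pressure": "low", "vibration": "ok",   "oil_temp": "ok"},
}

def qualify(readings, limits):
    """Map raw readings to qualitative states (low / ok / high)."""
    out = {}
    for s, v in readings.items():
        lo, hi = limits[s]
        out[s] = "low" if v < lo else "high" if v > hi else "ok"
    return out

def syndrome_diagnosis(readings, limits):
    """Return the failure modes whose full signature matches the readings."""
    states = qualify(readings, limits)
    return [fm for fm, sig in SYNDROMES.items()
            if all(states.get(s) == st for s, st in sig.items())]
```

With these toy rules, low oil pressure alone points to a leak, while low pressure combined with high vibration and high temperature identifies bearing wear; the symptom-based check would simply flag three separate anomalies.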
By applying this automated approach, with associated ca-
pability to conduct trade-off studies of sensor properties
such as cost, weight, coverage and reliability, the engineer
can select the best possible arrangement of sensors for the
given constraints, providing the highest practical level of
fault coverage achievable.
Although full coverage of faults is always preferable, it is
not necessarily achievable due to system constraints. Also,
some failure modes may have degrees of criticality that are
below the level of concern and thus they can be excluded
from further analysis.
If full failure coverage is not achieved by the set of diagnos-
tic sensors then ambiguity groups exist, i.e. a number of
different failure modes have the same system functional
responses. These ambiguity groups can be resolved by iden-
tifying the most likely fault based on the probability of fail-
ure and information about the physical processes and symp-
toms for each failure provided in the failures database.
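The probability-based resolution of an ambiguity group can be sketched as follows. This is an illustrative reduction of the idea, with hypothetical failure modes and probabilities; a real failures database would also carry physical-process and symptom information:

```python
# Sketch: rank the members of an ambiguity group (failure modes sharing
# the same functional response) by their prior probability of failure.
def resolve_ambiguity(ambiguity_group, failure_db):
    """Return the group's failure modes ordered most probable first."""
    return sorted(ambiguity_group, key=lambda fm: failure_db[fm]["prob"], reverse=True)

# Hypothetical entries from a failures database.
failure_db = {
    "pump_bearing_wear": {"prob": 0.010},
    "pump_seal_leak":    {"prob": 0.004},
    "sensor_drift":      {"prob": 0.002},
}
group = ["pump_seal_leak", "pump_bearing_wear", "sensor_drift"]
```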
The system designer must be aware of the potential implica-
tions of any unresolved ambiguities. These ambiguities will
directly impact upon the ability of the PHM function to take
the best remedial action – if it is unable to identify the cor-
rect failure mode then it is unlikely to respond correctly. As
such, the designer should, possibly during subsequent de-
sign iterations, attempt to remove these ambiguities wherev-
er possible or have contingencies built into the responses to
handle their occurrence, for example by integrating BITs or
other sensors associated with components.
It is important to remember that the above sensor requirements analyses are based on a functional model that is qualitative in nature. Thus further quantitative analysis of the sensor set should be considered to validate the results. The selected sensor set and the results of the failure modes and effects analysis provide the basis for the design of the diagnostic rules needed to identify each failure mode.
Detection and Diagnosis. In on-line, real-time operation
inaccurate sensor readings may introduce response patterns
which do not correspond to any of the diagnostic rules. One
potential solution is the use of multiple redundant sensors
that provide a means for resolving differences (e.g. by “vot-
ing”). Another solution is the application of reasoning tech-
niques that look for the probable cause of any undefined
sensor readings.
Figure 3. Operational cycle
Theoretically, a sensor set which provides the required diagnostic coverage of failure modes will identify all the failure modes. In practice this is not always so. In practical terms, a diagnostic sensor set has a certain Probability of Detection
(POD) which is a function of reliability of detection, pro-
cessing, and interpretation of information provided by indi-
vidual sensors in a diagnostic set. Each failure mode may
have different POD. Thus to diagnose a failure mode the
reasoning system must not only identify the appropriate
sensor responses but also consider Probability of Detection
of this sensor and that of a whole sensor set. For example, a
pressure sensor may have much higher POD than a vibra-
tion sensor, mainly due to the low reliability of vibration
signal interpretation.
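One simple way to combine per-sensor PODs into a set-level POD is sketched below. It assumes the diagnostic rule needs every sensor in the set to respond and that detection failures are independent; both are our simplifications, not assumptions stated in the paper, and the POD values are illustrative:

```python
from math import prod

def set_pod(sensor_pods):
    """POD of a sensor set whose diagnostic rule requires all member sensors,
    assuming independent detection (product of the individual PODs)."""
    return prod(sensor_pods)

# The pressure sensor is credited with a higher POD than the vibration
# sensor, reflecting the lower reliability of vibration interpretation.
pods = {"pressure": 0.99, "vibration": 0.70}
```

Under these assumptions a rule that fuses both sensors detects the failure mode with probability 0.99 × 0.70 ≈ 0.69, i.e. the weakest sensor dominates the set.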
As the PHM system should provide predictive capability,
the failure models should be extended to include infor-
mation such as historic data of previous failures, results of
tests, physics of failure, and length of time for a failure to
develop. The length of time it takes for a failure mode to
develop, from the initiation of a failure mechanism to the
development of a fault and propagation of the subsequent
functional failure, is important information for choosing the
best actions to mitigate the failure. If a failure is instantane-
ous, for example, fan blade failure due to catastrophic For-
eign Object Damage (FOD), then immediate action will be
required. If a failure is gradual there could be some time to
perform other actions to slow down the progression of the
fault or mitigate its consequences.
For example, compressor blade damage from a bird strike
that leads to high-cycle fatigue failure can be addressed by
reducing the engine speed thus reducing the rate of crack
propagation.
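The dependence of the response on failure-development time can be sketched as a simple decision rule. The categories and thresholds here are hypothetical, chosen only to mirror the FOD and bird-strike examples above:

```python
# Illustrative mapping from failure-development time to mitigation.
def select_response(time_to_failure_hours):
    """Choose a response class based on how fast the failure develops."""
    if time_to_failure_hours == 0:                 # instantaneous, e.g. catastrophic FOD
        return "shut_down_immediately"
    if time_to_failure_hours < 1:                  # fast-developing fault
        return "land_as_soon_as_possible"
    return "reduce_speed_to_slow_progression"      # gradual, e.g. crack propagation
```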
Different faults and failure modes may occur in rapid suc-
cession, leading to multiple simultaneous responses being
detected.
Developing the Knowledge Base. The knowledge base
developed during the PHM Design Cycle includes:
1. a rule base for performing diagnostics and identifying
each FFM along with its underlying causes
2. a predicted failure model
3. a set of actions corresponding to each failure.
The knowledge base is designed in such a way that a deci-
sion-making system such as an artificial agent can reason
about it. If possible, the actions should provide complete
coverage of all identifiable failures, and give all possible
responses (or actions to be taken) for the identified failure.
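The three-part knowledge base described above can be sketched as a simple data structure. Field names and the single entry are ours, not the paper's, and the scalar time-to-failure is only a stand-in for a full failure model:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FailureEntry:
    diagnostic_rule: Dict[str, str]   # 1. sensor -> expected qualitative state
    time_to_failure_hours: float      # 2. stand-in for the predicted failure model
    actions: List[str]                # 3. candidate responses, best first

# Hypothetical entry for one FFM.
knowledge_base: Dict[str, FailureEntry] = {
    "oil_pump_bearing_wear": FailureEntry(
        diagnostic_rule={"oil_pressure": "low", "vibration": "high"},
        time_to_failure_hours=2.0,
        actions=["reduce_thrust_60pct", "reduce_thrust_30pct", "shutdown_engine"],
    ),
}
```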
With the possible FFMs identified, the sensors chosen and the rules for identifying these failures deduced, the actions required for each failure are determined. On-line PHM
systems reason about actions in often rapidly changing envi-
ronments, and operate autonomously. Architectures such as
the Beliefs, Desires, Intentions (BDI) model have been de-
veloped to deal with these kinds of situations and are im-
plemented in the JACK autonomous software platform.
A JACK agent is a software component that can exhibit
reasoning behaviour under both pro-active (goal directed)
and reactive (event driven) stimuli. Each agent has:
• a set of beliefs about the world (its data set)
• a set of events that it will respond to
• a set of goals that it may desire to achieve (either at the request of an external agent, as a consequence of an event, or when one or more of its beliefs change), and
• a set of plans that describe how it can handle the goals or events that may arise.
In particular, each agent is able to exhibit the following
properties associated with rational behaviour:
• Goal-directed focus – the agent focuses on the objective and not the method chosen to achieve it
• Real-time context sensitivity – the agent will keep track of which options are applicable at each given moment, and make decisions about what to try and retry based on present conditions
• Real-time validation of approach – the agent will ensure that a chosen course of action is pursued only for as long as certain maintenance conditions continue to be true
• Concurrency – the agent system is multi-threaded. If new goals and events arise, the agent will be able to prioritise between them, resolve potential conflicts (e.g. by deliberately rejecting or ignoring certain goals or delaying their resolution to a later time), and multi-task as required.
When an agent is instantiated in a system, it will wait until it
is given a goal to achieve or experiences an event that it
must respond to. When such a goal or event arises, it deter-
mines what course of action it will take. If the agent already
believes that the goal or event has been handled (as may
happen when it is asked to do something that it believes has
already been achieved), it does nothing. Otherwise, it looks
through its plans to find those that are relevant to the request
and applicable to the situation. If it has any problems exe-
cuting this plan, it looks for others that might apply and
keeps cycling through its alternatives until it succeeds or all
alternatives are exhausted. The BDI agent can be programmed to execute these plans just as a rational person would.
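The plan-selection behaviour described in this paragraph can be mirrored in a short sketch. JACK itself is a Java-based platform, so this Python loop only illustrates the control flow; all names are ours:

```python
# Hedged sketch of BDI-style plan selection: skip goals already believed
# handled, then cycle through relevant, applicable plans until one succeeds.
def handle_goal(goal, beliefs, plans):
    """Try relevant, applicable plans in turn until one succeeds."""
    if beliefs.get(goal) == "achieved":            # already handled: do nothing
        return True
    for plan in plans:
        if plan["handles"] != goal or not plan["applicable"](beliefs):
            continue                                # not relevant / not applicable
        if plan["body"](beliefs):                   # execute the plan; it may fail
            beliefs[goal] = "achieved"
            return True                             # success: stop cycling
    return False                                    # all alternatives exhausted

# Two alternative plans for the same goal; the first fails at execution,
# so the agent cycles on to the second.
plans = [
    {"handles": "restore_oil_pressure", "applicable": lambda b: True,
     "body": lambda b: False},                      # e.g. reduce thrust: no effect
    {"handles": "restore_oil_pressure", "applicable": lambda b: True,
     "body": lambda b: True},                       # e.g. shut engine down: succeeds
]
```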
2.2 The PHM Operational Cycle
Once the Design Cycle has been completed and the Autonomous PHM system provides a sufficient level of coverage, the system, along with the knowledge base developed, can be put into use on board the host system.
The Operational Cycle consists of the following activities (Fig. 3):
Real-time Monitoring. In operation, the PHM function will
receive signals from each of the sensors located in the sys-
tem or its sub-components. These signals will be constantly
monitored, as in conventional systems, so that signal levels
that are outside the normal range are detected as anomalies.
This differs from conventional approaches in that, instead of giving a simple warning, the anomalies are passed to an on-board diagnostic unit that can provide a response appropriate to the current circumstances, and also show how to reduce or mitigate the identified fault’s effects.
On-board Diagnostics. The on-board diagnostic unit will
make use of the knowledge base developed in the Design
Cycle to associate the anomaly or anomalies with a particu-
lar FFM. The knowledge base can also provide enough in-
formation to identify or predict which physical parts or fail-
ure mechanisms are responsible for the failure. If the sensor
readings are not sufficient, the diagnostic unit should once
again examine reliability data, criticality, and dependencies
to determine the FFM. Context-specific confirmation rules can also be applied to help resolve ambiguities or probe further.
Failure Prediction. Once the particular FFM has been iden-
tified, the PHM system must predict the remaining life asso-
ciated with that failure. The failure models (contained in the
knowledge base) for the sub-components or parts identified to have failed will be analyzed in order to determine what time constraints are involved and how the failure will develop.
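A minimal remaining-life estimate can be sketched as a linear extrapolation of a degradation index to a failure threshold. This is our simplification; as the text notes, a real failure model would also draw on historic failures, test results and physics of failure:

```python
# Illustrative remaining-useful-life estimate from a linear degradation model.
def remaining_life(current_level, rate_per_hour, failure_threshold):
    """Hours until the degradation index reaches the failure threshold."""
    if rate_per_hour <= 0:
        return float("inf")                 # not degrading: no predicted failure
    return max(0.0, (failure_threshold - current_level) / rate_per_hour)
```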
Action Determination. The PHM system now has all the
information it needs to make an informed decision about
which actions it should take (in the case of an autonomous
system), or recommend. It now has at its disposal:
the sensor readings perceived to be anomalous
the functional fault this corresponds to
the physical defect or failure likely to have caused this
fault
a model of how the system will continue to fail, includ-
ing the estimated time before further failures occur. Us-
ing the above information the PHM system will select
the actions that it perceives to be the best for the given
situation.
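The selection step can be sketched as a rule that combines the diagnosed fault with the predicted time margin. The thresholds and action names below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Illustrative action determination: no fault means normal operation; an
# imminent failure forces shutdown; a gradual failure yields graded
# mitigations, best first, for the decision maker to choose from.
def determine_actions(fault, time_to_further_failure_hours):
    if fault is None:
        return ["continue_normal_operation"]
    if time_to_further_failure_hours < 0.1:
        return ["shutdown_engine"]          # no time left for mitigation
    return ["reduce_thrust", "monitor_and_reassess", "shutdown_engine"]
```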
Depending on the application, PHM capability can be designed into autonomous or semi-autonomous systems to diagnose faults, predict remaining functional life and suggest reasonable actions to deal with these events, if (or when) they occur. When deployed, depending on the application, the action determined by the Autonomous PHM would not necessarily be the final action to be performed. This is due to the Autonomous PHM not necessarily having complete knowledge of the situational context surrounding the system’s operation. In such an application it would pass the appropriate action alternatives to a higher-level decision-making system or human user who, in turn, would make this selection and initiate the associated action.
3. EXAMPLE: POWER MANAGEMENT ON A UAV
A typical example is an autonomous, real-time engine health and power management function on an Unmanned Air Vehicle (UAV), where it might manage the specific subsystems (i.e., the engine, drive trains, etc.) of the overall vehicle.
The PHM and Power Management function (Fig. 4) forms part of a delegated autonomy architecture in an autonomous system, with the human overseer always remaining in the position of ultimate management responsibility. The PHM function will not know how critical these requirements are with respect to the overall task being performed by the vehicle it is attached to. It is the responsibility of the high-level decision maker to evaluate the mission or task, as it is in the best position to make such a decision. It can then feed new requirements to the Autonomous PHM.
Consider a UAV in flight: the autonomous software must be
able to handle faults when they occur with equivalent or
better levels of competence than a human pilot if the UAV
is to achieve civil certification. The faults identified may
require actions to be taken to avert danger and could cause
the mission to be altered or abandoned.
Design. The example being used is the lubrication system
on the Rolls-Royce 250 engine, and how failures can occur,
e.g. of bearings. The FMECA analysis was completed in
MADe. The autonomous PHM capability is being imple-
mented in AOS’s C-BDI, and the operational scenario is
based upon a twin-engine UAS operating at high power in a
hot and high altitude environment. It is expected that this
demonstrator will be completed in 2012 and the results pub-
lished at that time.
Figure 4. Delegated Autonomy Architecture
The development of the PHM system followed the above Design Cycle methodology:
1. a functional model was created of the engines, including the interactions between the critical internal components (over 12000 functional connections);
2. a risk analysis was performed determining the various ways the engine can fail;
3. the sensor types and locations are chosen and rules identified that associate the various sensor readings to FFMs; data would be included from previous applications of that engine type or similar engines, such as maintenance logs, failure rates, and results of examinations performed on previously failed engines;
4. the reliability data, when available, will be used to aid in the creation of the failure models;
5. the agent actions are under construction, taking into account all of the possible actions that can be done to the engine; these may include increasing or decreasing the thrust or shutting down the engine completely;
6. the knowledge base is being created to be inserted into the PHM function on the UAV.
Operation. Consider the scenario of the UAV performing a
search and rescue mission. During the operation a bearing
within an oil pump on one of the two engines begins to suf-
fer from wear.
The PHM system would monitor the engine sensors, detect
any anomalies, and determine if these are significant (e.g.,
not just a spike due to a power on/off transition). The FFM
would be detected by sensors as a loss of oil pressure within
that engine which, when compared to the diagnostic rules contained within the knowledge base, would indicate a pump failure. By examining the failure probability of each com-
ponent within the pump, the level of functionality lost, and
the rate at which functionality is decreasing, the power
management system would recognize that the cause is likely
to be bearing failure.
Analysis of the failure model for bearing wear failure will
give the probable lead-on effects of this failure mechanism.
The system would then examine the possible actions to
overcome this failure, which may include:
• shutting the engine down immediately;
• reducing thrust to 60% before continuing operation for up to 2 hours;
• reducing thrust to 30% for 4 hours; and
• other combinations.
The PHM system capability would then assess these actions, based upon the following situational information:
• the current power requirement is that both engines need to operate at 30% thrust for 2 hours;
• due to a fault that occurred earlier, the second engine has already been shut down; and
• the remaining engine is currently running at 80% thrust to compensate.
For the given situation the PHM would recommend the following actions:
• turn on the second engine, and operate both engines at 30%, possibly damaging the second engine further;
• leave the second engine shut down and reduce thrust as much as possible; however, it must be at least 60% to meet the power requirements;
• abort or alter the mission since the power requirements cannot be met; or
• reduce thrust to 70% and see if the oil pressure returns to its nominal level. If it does, continue with the engine power at that level, otherwise reduce further.
An example of a JACK graphical plan that implements this
is shown in Fig. 5. This shows how after reducing thrust the
oil pressure will be monitored for some time to see if the
problem is mitigated (the wait_for block). If it is not then
the thrust is reduced further. If the problem gets worse, then
the engine is shut down. If the problem is mitigated, the
maintain block will keep monitoring the problem to make
sure it doesn’t get worse in the future.
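The wait_for/maintain control flow described for this plan can be rendered as a short loop. The actual plan is a JACK graphical plan; this Python sketch only mirrors the described behaviour, and the sensor interface, nominal pressure and thrust steps are our assumptions:

```python
# Illustrative loop: reduce thrust, wait and observe oil pressure
# (wait_for); if mitigated, keep monitoring (maintain); otherwise reduce
# further, and shut the engine down once thrust cannot be reduced more.
def handle_low_oil_pressure(read_oil_pressure, set_thrust,
                            nominal=40.0, thrust=0.7, min_thrust=0.3):
    """Reduce thrust stepwise; shut down if pressure keeps falling."""
    while thrust >= min_thrust:
        set_thrust(thrust)
        pressure = read_oil_pressure()        # wait_for: observe after the change
        if pressure >= nominal:
            return "maintain_monitoring"      # mitigated: keep watching the problem
        thrust = round(thrust - 0.1, 2)       # not mitigated: reduce further
    return "shutdown_engine"                  # problem got worse: shut down
```

For example, if pressure recovers after the second reduction the plan settles into monitoring; if it never recovers, the plan ends in an engine shutdown.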
Upon receiving these possible actions, the higher-level deci-
sion-making software can determine if the mission is im-
portant enough to continue (at the risk of further failure) or
if it can be altered. Instead of being overloaded with multi-
ple options, or receiving insufficient information from mul-
tiple simple warnings, the autonomous system will receive a
set of possible actions that are succinct and meaningful.
From this set it can choose the best action for the given situ-
ation.
Figure 5. JACK Plan to Handle an Engine Fault.
4. DISCUSSION AND CONCLUSIONS
This functional failure mode approach, based on using reasoning to improve the diagnosis, will maximize the likelihood of determining the failure mode correctly and determine the most appropriate course of action, taking into
account current circumstances (e.g., flight mode, power
requirement and the state of both engines). Autonomous
systems must have this capability to operate successfully.
Manned systems will also benefit by improving the accura-
cy of failure mode identification and recommending the best
action to take. By acting like an artificial assistant, such a
system could greatly reduce the crew or operator workload
in high stress situations, leading to improved levels of safe-
ty.
By structuring the PHM design process appropriately, data
from the Operational Cycle can be fed back and incorpo-
rated into the Design Cycle, yielding continuous improve-
ment in future upgrades or revisions of the UAV.
The novelty of the system presented here derives from the
combination of a risk assessment tool with the high-level
representation and flexibility offered by a decision-support
tool, making the resulting system appropriate for integration
into a complex architecture for autonomous vehicles where
multiple levels of delegation and decisions (possibly includ-
ing the human) interact to determine and adapt the course of
actions during a mission.
ACKNOWLEDGEMENT
The authors would like to thank the Technology Strategy
Board (TSB) for funding the ASTRAEA and ASTRAEA II
programs which make work like this possible.
REFERENCES
Anon (2010). A380 In-service report. http://www.flightglobal.com/page/A380-In-Service-Report/Airbus-A380-In-Service-Technical-issues.
Glover, W., Cross, J., Lucas, A., Stecki, C., & Stecki, J. (2010). The Use of Prognostic Health Management for Autonomous Unmanned Air Systems. Proceedings of the International Conference on Prognostics and Health Management, October 10-16, Portland, Oregon, USA.
Kurtoglu, T., Johnson, S. B., Barszcz, E., Johnson, J. R., & Robinson, P. I. (2008). Integrating System Health Management into the Early Design of Aerospace Systems Using Functional Fault Analysis. Proceedings of the International Conference on Prognostics and Health Management, Oct 6-9, Denver, Colorado, USA.
Rudov-Clark, S. J., & Stecki, J. (2009). The language of
FMEA: on the effective use and reuse of FMEA data.
Sixth DSTO International Conference on Health & Us-
age Monitoring. March 9-12, Melbourne, Australia.
Rudov-Clark, S. D., Ryan, A. J., Stecki, C. M., & Stecki, J. S. (2009). Automated design and optimisation of sensor sets for Condition-Based Monitoring. Sixth DSTO International Conference on Health & Usage Monitoring. March 9-12, Melbourne, Australia.
Scheuren, W. J., Caldwell, K. A., Goodman, G. A. and
Wegman, A. K. (1998). Joint Strike Fighter Prognostics
and Health Management. Proceedings of the 34th
AIAA/ASME/SAE/ASEE Joint Propulsion Conference
and Exhibit. July 13-15 1998, Arlington
Characterization of prognosis methods: an industrial approach
Jayant Sen Gupta1, Christian Trinquier2, Ariane Lorton3, and Vincent Feuillard4
1,3 EADS Innovation Works, Toulouse, [email protected]@eads.net
2,4 EADS Innovation Works, Suresnes, [email protected]
ABSTRACT
This article presents prognosis implementation from an industrial perspective. From the description of a use-case (available information, data, expertise, objective, expected performance indicators, etc.), an engineer should be able to select easily, among the large variety of prognosis methods, the ones that are compatible with his objectives and means. Many classifications of prognosis methods have already been published, but they focus more on the techniques that are involved (physical model, statistical model, data-based model, ...) than on the necessary inputs to build/learn the model and/or run it and the expected outputs.
This paper presents the different strategies of maintenance and the place of prognostics in these strategies. The life cycle of a prognosis function is described, which helps to define relevant, yet certainly not complete, characteristics of prognosis problems and methods. Depending on the maintenance strategy, the prognosis function will be used at different steps and with different objectives. Two different steps of use are defined when using the prognosis function: evaluation of the current state and prediction of the prognosis output.
This paper also gives some elements of classification that will help an engineer choose the appropriate class of methods to use to solve a prognosis problem.
The paper also illustrates with one example the fact that, depending on the information at hand, the prognosis method chosen is different.
1. INTRODUCTION
Condition-Based Maintenance (CBM) and Predictive Maintenance seem attractive for the civil aeronautical industry, which bases its maintenance strategies mainly on Predetermined Maintenance (see (ISO 13306, 2010) for definitions). The possible outcomes of CBM in comparison to the existing maintenance strategies are:
1. increasing of the availability:
• avoid Operational Interruptions (OI) thanks to early detection capabilities;
• reduce maintenance times by better scheduling, with less (or no) unscheduled maintenance;
2. reduction of Direct Maintenance Costs (DMC):
• optimization of the use of each component, replacing it when it has reached almost its full potential;
• better control of the maintenance scheduling: aircraft (A/C) at the right place, at the right moment, with the associated resources to conduct the maintenance actions.
Jayant Sen Gupta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Of course, all these potential benefits must come with the same level of safety, or a better level if possible.
In this context, the implementation of a prognosis function on a component or system becomes a subject of high interest for an engineer, as an important brick to build a better maintenance strategy. The implementation process of the maintenance strategy is composed of two main phases: the set-up of the maintenance strategy (choice of the maintenance strategy and associated parameters) and the application of this maintenance strategy on the component or system of an A/C. The question of when the implementation of the prognosis function is done is not as simple as it seems, and we will show in a first part the link between maintenance strategy and prognosis implementation.
The main question, from the engineer point of view, remainsthe choice of an approach to implement the prognosis func-tion of a component or system.
First European Conference of the Prognostics and Health Management Society, 2012
The literature offers a very large variety of methods, using very different techniques from knowledge-based to physical degradation models ((Vachtsevanos, Lewis, Roemer, Hess, & Wu, 2006), (Jardine, Lin, & Banjevic, 2006), (Schwabacher & Goebel, 2007) or (Sikorska, Hodkiewicz, & Ma, 2011)). But practice does not show that a single method will be the optimal solution for all systems/components in an A/C. The question becomes, for the industry: how to choose the "best" approach to solve one prognosis problem?

To answer this question, the engineer designing the prognosis function has mainly two elements:
1. the available information (knowledge, expertise, data, ...); among this information, the knowledge of the failure modes and associated degradation modes is essential, yet not always available;
2. the expected performance of the prognosis (prognosis horizon, precision, maintenance cost reduction, ...); it is related to the use of the prognosis output (dispatch, maintenance optimization, spare management, ...).

On the method side, the type, quantity and quality of the information required to build/identify/learn the model is not always clearly defined, and most of the time it is assumed to be available both in quantity and quality. It is quite the same with the observations, the data measured on the component or system, required for the on-line stage. Depending on the inputs, a certain level of performance (prognosis horizon, precision, access to a confidence in the results, ...) can be defined for each method.
Our aim, which goes far beyond this article, is thus to describe the classes of methods proposed in the literature from the point of view of the design engineer, in order to help him understand which methods are usable with the available information and performance objectives, and when to use them in the prognosis life cycle. As most classification attempts were made with another goal, we expect to get a slightly different result. Sikorska et al. (2011) and Vachtsevanos et al. (2006) are the sources closest to what is expected, but the main driver of their classification remains the mathematical techniques used by the methods.

This paper is divided into four parts. First, the different maintenance strategies are briefly presented in relation with the modelling assumptions that are hidden behind them. The place of the prognosis function in the life cycle of the maintenance is also discussed. Then, a first draft of a classification of prognosis methods is proposed, whose aim is to ease the choice of the design engineer depending on the available information. Then, a simple functional description of the prognosis implementation is given. Each method is to be described in that context, stating how it is built, used and updated with in-service data. Finally, on the same component, a valve, three configurations are described:
• one with only reliability-type information;
• one with access to a physical model and measures of different stresses;
• one with access to measures of a performance indicator of the valve.

The aim of these three examples is to roughly show how the available information and performance objectives drive the choice of possible methods. Needless to say, this paper is only a first step towards a more general approach.
2. PROGNOSIS USED IN DIFFERENT MAINTENANCE PHASES
This section explains when prognosis is used for differentmaintenance strategies. Moreover, it highlights the modelingassumptions of the system for these strategies.
2.1. Prognosis usage depends on maintenance types
The different maintenance types are defined in the ISO norm13306 on maintenance terminology (ISO 13306, 2010).
However, in the aeronautical context, the Maintenance Review Board (MRB) process, supported by the Maintenance Steering Group-3 (MSG-3) methodology, provides the reference maintenance overview.

Two maintenance types are mainly used in the aeronautical industry. The first one is corrective maintenance: maintenance is done or scheduled once an item failure has been detected. The second one is predetermined (or planned) maintenance: maintenance tasks are planned during design (possibly adapted during operations). The maintenance tasks and intervals are defined using the MSG-3 methodology.

Predetermined maintenance is non-specific, i.e. it is adapted to a population of items: decisions are based on statistical considerations and do not take into account the specific use of each item.

When relevant, a more specific maintenance type, called Condition-Based Maintenance (CBM), is introduced in the norm ISO 13306 (2010). Although this distinction is not considered there, we propose to consider two kinds of CBM:
• one based on the current state of the item, called current-state CBM,
• one based on some specific forecast on the item, called predictive maintenance.
This addition to the norm is described in figure 1.
Figure 1. Different strategies of maintenance in industry

The maintenance decision in current-state CBM is based on the estimation of the current state of the item (a degradation indicator for instance), the current state being assessed to be in a maintenance region (a scalar threshold, or something more complex for a state vector). This threshold is defined during design, taking into account characteristics of the maintenance (time to detect, plan and operate maintenance), future conditions and the prognosis function. On the other hand, a predictive maintenance decision requires the computation of a future characteristic of the item at a certain time horizon, using future conditions that could be specific to the item under study.
All preventive maintenance strategies require the prediction of a remaining time before failure and thus require a prognosis function.
2.2. Prognostics for maintenance
The main concept used in prognosis is the Remaining Useful Life (RUL), which is the remaining time before a failure occurs, also denoted estimated time to failure (see (Vachtsevanos et al., 2006) or (ISO 13381, 2004)). Prognosis is often defined as the estimation of the RUL (see (Sikorska et al., 2011) for an overview of prognosis definitions in the literature), or more generally of a quantity of interest based on the RUL. Because of the multiple uncertainty sources (unknown degradation process, future conditions, etc.), the RUL is fundamentally a random variable. As this concept is not easily usable to make decisions, the output of the prognosis should be a quantity based on this random variable:
• the estimation of the mean of the RUL with confidence bounds;
• the estimation of the operational reliability at a given time horizon;
• a quantile of the RUL for a given risk (the RUL value for which the probability to exceed this value is equal to the risk);
• the probability density function of the RUL, etc.
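As a concrete sketch, the decision quantities listed above can be derived from a sample of RUL values, for instance Monte Carlo draws from a prognosis model; the exponential sample below is purely illustrative and all numbers are invented:

```python
import random
import statistics

def rul_quantities(rul_samples, horizon, risk):
    """Decision quantities derived from a sample of RUL values."""
    n = len(rul_samples)
    mean = statistics.mean(rul_samples)
    # 95% confidence bounds on the mean (normal approximation)
    half = 1.96 * statistics.stdev(rul_samples) / n ** 0.5
    # operational reliability at the horizon: P(RUL > horizon)
    reliability = sum(1 for r in rul_samples if r > horizon) / n
    # RUL at risk: the value exceeded with probability equal to the risk
    q = sorted(rul_samples)[min(n - 1, int((1 - risk) * n))]
    return {"mean": mean, "bounds": (mean - half, mean + half),
            "reliability": reliability, "rul_at_risk": q}

random.seed(0)
# purely illustrative: exponential RUL draws with a 1000-hour mean
samples = [random.expovariate(1 / 1000.0) for _ in range(10000)]
out = rul_quantities(samples, horizon=500.0, risk=0.05)
```

The same sample thus yields all the outputs in the list above; which one is reported depends on the downstream use (dispatch, maintenance optimization, spare management, ...).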
A maintenance decision in current-state CBM is made by comparing an output with some thresholds, defined taking into account maintenance constraints, knowledge on the degradation, risk analysis and/or cost criteria.

In predictive maintenance, the output is the prognosis output (the quantity of interest based on the RUL), computed using future assumptions on the item. A prognosis function is thus required for the on-line phase.

In current-state CBM, the degradation indicator is the estimation of the current state of the item. The maintenance thresholds on the state of the item are determined during the off-line stage by aggregating all the possible futures and consequences of such a state. A prognosis result is needed in the design of the maintenance strategy to set the maintenance thresholds.

This argument is also true for predetermined maintenance. The maintenance tasks are scheduled according to risk and cost criteria, which requires a prognosis function during the design of the maintenance strategy. The prognosis function is not required for the on-line stage.

Eventually, a prognosis is required for every preventive maintenance strategy. However, the prognosis is not used in the same phase: it is done on-line only for predictive maintenance, as it uses specific future assumptions that cannot be pre-processed. This difference can also be explained by the different levels of modeling behind each maintenance type.
2.3. Associated modeling assumptions
This section focuses on preventive maintenance, the associated information used to build the different preventive strategies, and the modeling assumptions that are made.

The modeling assumptions concern the evaluation of the present state (pres. in table 1) and the prediction of the future (fut. in table 1). For each of these steps, the item can be considered as unique (spec. in table 1) or as part of a population of similar items (glob. in table 1).
One can distinguish:
• predetermined maintenance: the associated models are built using only information, knowledge and/or data of similar items, called historical information. It can be previous run-to-failures, expert or engineering knowledge, historical data, etc. For this maintenance strategy, no specific evaluation of the current state is done and no specific prediction is made on the item. The item is considered as one item among a population of similar items.

• current-state CBM: compared to the previous one, this maintenance also requires a modeling of the specific present condition of the item. This is done using specific data: on-line monitoring, inspections, built-in tests directly made on the item. For this strategy, the present state of the item is estimated individually. The same component in another A/C would not have endured the same conditions and its present state would be different. However, the future of the item is not studied specifically: a treatment has been done during design to select thresholds that account for all the possible futures, missions, that the item might endure.

• predictive maintenance: this last maintenance type implies a modeling of the specific future of the item, using specific future conditions. For this strategy, both the present state and the future of the item are specific. The same item would have different RULs if different future conditions were met.

This comparison is summarized in table 1.

Maintenance type           Data used                Modeling
                                                    Pres.   Fut.
Predetermined Maintenance  Historical information   glob.   glob.
Current-State CBM          Historical information,  spec.   glob.
                           specific data
Predictive Maintenance     Historical information,  spec.   spec.
                           specific data,
                           future conditions

Table 1. Different levels of modeling associated to maintenance types
3. FIRST ELEMENTS OF A CLASSIFICATION OF PROGNOSIS METHODS FOR A DESIGN ENGINEER
The choice of a prognosis method is not an easy task. Each method has its advantages and drawbacks, and its performance depends strongly on the quality of the inputs used. The available information being different for each case, the best methods will potentially differ from one case to another. How can a design engineer find his way through the large diversity of methods proposed in the literature?

The approach presented in this section is still in development and will continue to be refined in the future. The starting point is the available information. Different situations are described depending on the level of insight into the degradation process. A class of methods that can be used is associated with each situation.
Figure 2 describes the different situations.
The different cases are detailed in the following. No individual methods are detailed here, but families of methods are given for each case.
Figure 2. First elements of classification
Case 1: no specific data. In this situation, the design engineer has no access to specific data and works only with historical data, when available, and reliability studies. This constraint makes it impossible to implement CBM. In this case, the methods that can be used are reliability-based methods, with constant or variable failure rates.
Case 2: for a system, access to the fault state of the components. The failure information of a component is useful only for a system. When available, it makes it possible to update the failure rate of the system (through the reliability diagram) and thus update the RUL of the system. In this case, the methods that can be used are conditional reliability-based methods, with constant or variable failure rates.
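As a toy sketch of this conditional update, consider a hypothetical redundant block of two components (the system works as long as one of them does) with constant, memoryless failure rates; knowing which components are currently faulty changes the expected system RUL:

```python
# Hypothetical constant failure rates (per hour) for the two components
LAM1, LAM2 = 1e-3, 1e-3

def mean_system_rul(fault1, fault2):
    """Mean remaining time to system failure, conditional on fault status.
    Exponential lifetimes are memoryless, so only the current fault states
    matter, not the elapsed operating time."""
    if fault1 and fault2:
        return 0.0              # system already failed
    if fault1:
        return 1 / LAM2         # only component 2 keeps the system up
    if fault2:
        return 1 / LAM1
    # both up: E[max(T1, T2)] for independent exponential lifetimes
    return 1 / LAM1 + 1 / LAM2 - 1 / (LAM1 + LAM2)
```

Learning that one component has failed shrinks the expected system RUL, which is exactly the update the reliability diagram permits.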
Case 3: no degradation indicator. In this situation, specific data is collected on the item but no degradation indicator has been identified. Prognosis requires learning a model that links the observables to the time of failure, for instance using a database of histories of observables and the associated times of failure. In this case, the methods that can be used are data-based techniques, to identify the features and learn the link between the features and the time of failure.

Case 4: direct access to the indicator. Building a degradation indicator requires a lot of knowledge of the degradation process or, at least, of its consequences in terms of performance. The simplest situation is when the degradation indicator is directly observable. In this case, the methods that can be used are methods to predict the evolution of the indicator under future mission assumptions.

Case 5: indirect access to the indicator. In this case, the degradation indicator is not directly observable but has to be computed from other specific data. Two models are to be built and validated. The first model links the specific data with the degradation indicator. This model can be built using, for instance:

• stress models based on the physics of degradation (environmental and operational conditions are monitored and a physics-based model computes the damage increment);
• a deviation from a nominal behaviour (both inputs and outputs of the item are monitored, and the deviation between the monitored output and the nominal output computed from the monitored inputs is evaluated); etc.

For the prediction of the future of the indicator, two choices are possible. The first is to use the previously computed values of the degradation indicator, as can be done in case 4 with the monitored degradation indicator. The second is to build a model of the monitored parameters (with ARMA models for instance), simulate them in the future, and use the first model to compute the future values of the degradation indicator.
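The second choice, modelling the monitored parameters and simulating them forward, can be sketched with a simple AR(1) model standing in for the ARMA model mentioned above; all signals and coefficients here are synthetic:

```python
import random

def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    xs, ys = x[:-1], x[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return my - phi * mx, phi

def simulate_future(c, phi, x0, steps, rng):
    """Simulate the monitored parameter forward; these simulated values can
    then be fed to the first model to compute future degradation values."""
    x, path = x0, []
    for _ in range(steps):
        x = c + phi * x + rng.gauss(0.0, 0.1)
        path.append(x)
    return path

rng = random.Random(1)
# Synthetic history of a monitored stress parameter (AR(1) around a mean of 5)
hist = [5.0]
for _ in range(200):
    hist.append(2.0 + 0.6 * hist[-1] + rng.gauss(0.0, 0.1))
c, phi = fit_ar1(hist)
future = simulate_future(c, phi, hist[-1], 50, rng)
```

The simulated trajectory `future` would then drive the stress or deviation model to obtain future values of the degradation indicator.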
Each case needs to be described in much more detail. The next section gives a way to describe the implementation of the prognosis function that could be used to refine the description of each of the previous cases.

4. PROCESS OF A PROGNOSIS FUNCTION IMPLEMENTATION

This section focuses on the description of the life cycle of a prognosis function implementation. As already mentioned, this implementation will be used during different phases (design or in service). We will highlight in particular the type of information used at each step. This description is dedicated to a basic prognosis function, where there is no fusion between different prognosis function implementations. This is the case for components, or for systems where the prognosis function is not modeled as a logical aggregation of the prognosis functions at component level.
We assume that the analysis of the component or system hasalready been done. Thus, we are in the situation where:
• the item is selected based on economical and risk criteria;
• its failure modes are selected using safety analysis and MSG-3 analysis (occurrence, criticality and cost criteria);
• the associated degradation processes of the item are identified;
• the parameters to monitor in order to define the health status of the item have been defined (called observables in the following).
4.1. Phase 1: Design of the prognosis function implementation

This phase corresponds to the design of the prognosis implementation. In this step, the aim is to build models that represent both the current state of the component or system and its evolution. It means choosing, developing and tuning the models from the available information.

The only information that can be used at that stage is historical knowledge. This consists of domain expertise, historical data (A/C, fleet), run-to-failures on test benches, feedback from previous programs, etc.

The evaluation of the current state of the component or system can be direct or indirect. It is called direct if the current state is computed from the observables only by a data treatment, like filtering for instance. It is called indirect if it is computed through a model with the observables as inputs. The characterization of the current state could be as different as a scalar health indicator, the performance of a function, or the complete history of the observables since the last maintenance action.
The evolution of the component/system can be modeled either by:

• a state model: the evolution of the state of the item results from an evolution of the observables, characterizing the future conditions undergone by the item;
• an incremental model: at each time step or cycle, an increment is computed and added to the current state.
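As a toy illustration of the incremental model (the quadratic stress-to-damage law and all numbers are invented): each cycle's observed stress yields a damage increment that is added to the current state until a failure threshold is crossed:

```python
def damage_increment(stress):
    """Hypothetical stress model: the increment grows with the observed stress."""
    return 1e-4 * stress ** 2

def propagate(state, future_stresses, threshold=1.0):
    """Add one increment per cycle; return (final state, cycles to threshold)."""
    for cycle, s in enumerate(future_stresses, start=1):
        state += damage_increment(s)
        if state >= threshold:
            return state, cycle  # threshold reached: the cycle count is the RUL
    return state, None           # threshold not reached within the horizon

final_state, rul_cycles = propagate(0.0, [50.0] * 10)
```

A state model would instead propagate the full state vector through a dynamic model driven by the observables; the incremental form above only tracks the accumulated damage.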
During this phase, the different models are trained, selectedor identified. A way to validate them during the operations ofthe A/C has to be defined.
Another element that has to be defined during this step is amodel for the different mission conditions.
Finally, the Verification and Validation (V&V) process hasto be done. A first validation with historical data has to be
performed. The performance of the prognosis (see (Saxena, Celaya, Saha, Saha, & Goebel, 2010) for examples of performance indicators) has to be compatible with the usage of the prognosis outputs.
4.2. Phase 2: on-line execution
The on-line execution is the execution of the previous modelsduring the A/C usage.
4.2.1. Step A: Evaluation of the current state
The current state of the item can be defined as the minimal information that characterizes the state of the item. It can take many different forms: from a simple scalar health indicator, through a state vector that characterizes the state of the component (including internal variables of a physical model for instance), to the complete history of the observables since the last maintenance action (if no other knowledge is available).

The evaluation of the current state of the item is direct if it is monitored, and indirect if a model is used to compute it from the monitored parameters.
4.2.2. Step B: Prediction of the prognosis result
This step consists in the computation of the quantity of interest based on the RUL (quantile of the RUL, reliability over a time interval, etc.). As already stated, the modeling of the future missions has to be introduced. Different cases are possible; the following gives some examples:

• if the state of the item is computed by a model, building a model of the future inputs is a way to define the future conditions;
• if the conditions in the future are the same as they were in the past and the evolution of the current state of the item is regular, the previous evaluations of the current state can be used to build a trend that can be post-processed to compute a prognosis result.
4.3. Phase 3: Update and V&V
4.3.1. Update of historical data
The first element of this step is the update of the historical data, done by collecting the run-to-maintenance of each item and adding it to the historical data.

This update of the historical data might lead to an update of the different models used in the prognosis implementation.
4.3.2. Validation all along the life cycle of the A/C
The different models used in the prognosis implementation have been validated using test-bench results, historical data, and scenarios of use that are a model of the reality the item will have to face after Entry Into Service (EIS).

Right after EIS, the priority is to validate the implementation with in-service data, to measure the effect of the error made when modeling the real conditions in the first validation of section 4.1.

All along the life cycle of the A/C, which could last forty to fifty years, the validation has to continue, maybe on a different time scale, to detect potential drift due to an evolution of the use of the A/C.

This simple description of the implementation process gives an idea of how the methods can be used and how they can collaborate. Moreover, the same methods can be used at different steps with different objectives.
5. EXAMPLE OF DIFFERENT PROGNOSIS FUNCTIONS ON THE BLEED SYSTEM

This section aims at describing an industrial prognostics case and at illustrating the process described in section 4. Three prognostics cases are considered. In each case, the component under study and the expected prognosis output remain identical, but the available inputs differ, and so does the prognosis performance. Thus, different prognosis methods must be implemented, and the expected prognosis performance may not be reachable. As this paper focuses on the definition of the prognosis process and its characteristics, the prognosis results are not provided here. Moreover, the validation phase is not described in the following.
5.1. Description of the initial example
The component under study here is a pneumatic valve within the bleed air system. This system is part of ATA-36. It provides air to the cabin at an admissible pressure. Basically, it takes air at high pressure from the engines or the auxiliary power unit (APU), then regulates its pressure and provides this regulated air to the rest of the bleed system. Figure 3 illustrates a bleed system on a CFM56-5B.

The component under study participates in the pressure regulation. During this process, the air pressure needs to be reduced, which is done by the Pressure Regulating Valve (PRV). There are different kinds of PRV; we consider here a pneumatic valve (see figure 4). This particular example was previously studied in (Daigle & Goebel, 2010).

Due to different kinds of constraints, a performance objective is set. For instance, the prognosis horizon must be at least five hundred flight-hours.
5.2. At system level: reliability type information
In this example, the bleed system is represented in a very simplified way as a set of valves, one per engine, and a component representing the pipes. In this case, the available information is the constant failure rate of each component of the system.

Figure 3. Scheme of the bleed system on a CFM56-5B

Figure 4. Scheme of the pneumatic valve, from (Daigle & Goebel, 2010)
The only online information is the fault status of each com-ponent. This case corresponds to Case 2 in figure 2.
The best use of the available information to compute a quantity of interest based on the RUL is to use the same model as in classic reliability, where the bleed system is considered in its logical view, as shown in Figure 5.

Figure 5. (Very) simplified view of a bleed system

The improvement here is to take the current state of each component into account, here the fault status. The difference between the failure rates when one PRV is in fault is due to the change of operational conditions, the remaining valves being overstressed to maintain the bleed performance.

Using a pure jump Markovian process, the RUL conditional on the current state of the system can be computed, as well as all quantities based on the RUL. Although the variance of this conditional RUL will be smaller than that of a RUL computed without any information, the added information is rather poor and the added value may not be sufficient to meet the objectives of the prognosis function.
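A minimal sketch of such a pure jump Markov computation, for a hypothetical three-state chain matching the layout above (state 0: both PRVs healthy; state 1: one PRV faulty, the remaining valve overstressed; state 2: bleed failure, absorbing); all rates are invented for illustration:

```python
import math

LAM = 1e-4       # hypothetical failure rate of a healthy valve (per flight-hour)
LAM_OVER = 5e-4  # hypothetical, higher rate of the overstressed remaining valve

def mean_rul(state):
    """Mean time to absorption, conditional on the current fault status."""
    if state == 2:
        return 0.0
    if state == 1:
        return 1 / LAM_OVER
    # from state 0: exponential sojourn at rate 2*LAM, then pass through state 1
    return 1 / (2 * LAM) + 1 / LAM_OVER

def survival(state, t):
    """P(RUL > t) conditional on the current state (hypoexponential from 0)."""
    if state == 2:
        return 0.0
    if state == 1:
        return math.exp(-LAM_OVER * t)
    a, b = 2 * LAM, LAM_OVER
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)
```

With these invented rates, observing that one valve is already faulty shrinks the mean RUL from 7000 to 2000 flight-hours, which is exactly the kind of conditional update the fault status makes possible.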
Concerning the update phase, the constant failure rate of each component could be updated using the real failure rates rebuilt from the in-service data.
5.3. At component level: using physics based model
For this case, physical knowledge of the degradation behavior of the valve is available, along with experiments to identify and validate the parameters of the physical model. The scheme of the valve on which the model is built is shown in Figure 4 and is taken from (Daigle & Goebel, 2010).

The only monitored parameter is the pneumatic pressure command.

The current-state evaluation is done by incrementing the physical degradation caused by the variation of the pneumatic pressure command. This corresponds to Case 5 in figure 2.

The computation of the quantity of interest based on the RUL is done by computing the future state of the valve and post-processing it to compute the RUL. This can be done in at least two different ways:
• model the future conditions that the valve will undergo by modeling the future pneumatic pressure command; use this command as input to the physical model, initialized with the current state, to compute future states of the valve;
• assume that future conditions will be the same as previous conditions and build a statistical model of the degradation indicator from its past values, for instance a linear regression over use time or cycles; use this model to compute future states of the valve.
The future degradation state is then post-processed to compute the quantity of interest based on the RUL.

For the second prediction approach, the update phase could be done by capitalizing the models built by the linear regression and studying whether they are always the same, or vary strongly from one component to another or from one mission to another. The history of the degradation indicator for one component could also be added to the historical knowledge as a run-to-maintenance test.
5.4. At component level: using a performance indicator
For this case, the available information is that the degradation of the valve can be characterized by its opening and closing times. Historical knowledge also shows that this degradation is relatively smooth and progressive. The valve is considered useful as long as the opening and closing times are smaller than a threshold value.

The available on-line information consists in the measure of the position of the valve, from which one can derive the opening and closing times.

The current state of the valve is characterized by the history of the closing and opening times monitored since the last replacement of the valve.

For the prediction step, the opening and closing time data is used to build a data model, a regression model for instance, which is used to predict the future performance of the valve. The prediction of performance is then post-processed to compute the quantity of interest based on the RUL.
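This prediction step can be sketched as a least-squares trend on the monitored closing times, extrapolated to the threshold beyond which the valve is no longer considered useful (all numbers are hypothetical):

```python
def fit_line(cycles, times):
    """Ordinary least squares: closing time = a + b * cycle."""
    n = len(cycles)
    mx, my = sum(cycles) / n, sum(times) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(cycles, times))
         / sum((x - mx) ** 2 for x in cycles))
    return my - b * mx, b

def predicted_rul(cycles, times, threshold):
    """Cycles remaining until the fitted trend crosses the threshold."""
    a, b = fit_line(cycles, times)
    if b <= 0:
        return None              # no degradation trend observed
    return (threshold - a) / b - cycles[-1]

cycles = list(range(10))
times = [2.0 + 0.1 * c for c in cycles]  # slowly increasing closing time (s)
rul = predicted_rul(cycles, times, threshold=4.0)
```

The extrapolated crossing time minus the current cycle gives the RUL in cycles; on real data the regression residuals would additionally provide confidence bounds.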
As in the previous case, the update phase consists in the capitalization of runs-to-maintenance once the component is replaced, and in the capitalization of the different models built from the monitored data.
6. CONCLUSION
In this paper, the implementation of prognostics has been presented from a design engineer's point of view. The questions to be addressed are:

• What information is available?
• What method or set of methods can be used to compute the prognosis output?
• If the prognosis built does not reach the expected performance, what information should be added to reach it, with the same method or with a different one?

In the literature, the classifications of prognosis methods are mostly driven by the mathematical techniques used. In this paper, a simple classification is presented. This classification is based on the available knowledge (historical knowledge, expertise, run-to-failures, already existing on-line monitoring, future mission profiles, etc.) and defines different situations. It has been illustrated by describing different ways to build a prognosis on a bleed valve, relating each example to one of the situations previously described.

Methods have been associated to each of these situations, but this work will continue in the future. The proposed process of prognosis implementation is the way to describe in more detail the use of the methods in the different cases. It should highlight:
• the type of information and data needed to build the different models used by each method, both for the evaluation of the current state and for the prediction of the RUL;
• the verification and validation process, both during design and after the EIS.
A lot of work is still to be done.
NOMENCLATURE
RUL  Remaining Useful Life
CBM  Condition-Based Maintenance
PRV  Pressure Regulating Valve
DMC  Direct Maintenance Cost
V&V  Verification and Validation
EIS  Entry Into Service
APU  Auxiliary Power Unit
REFERENCES
Daigle, M., & Goebel, K. (2010). Model-based prognostics under limited sensing. In Aerospace Conference, 2010 IEEE (pp. 1-12).
ISO 13306. (2010). Maintenance Terminology (Tech. Rep. No. EN 13306:2010). International Organization for Standardization.
ISO 13381. (2004). Condition Monitoring and Diagnostics of Machines, Prognostics part 1: General Guidelines (Tech. Rep. No. ISO 13381-1). International Organization for Standardization.
Jardine, A., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 1483-1510.
Saxena, A., Celaya, J., Saha, B., Saha, S., & Goebel, K. (2010). Metrics for Offline Evaluation of Prognostics Performance. International Journal of Prognostics and Health Management (IJPHM), 1(1).
Schwabacher, M., & Goebel, K. (2007). A Survey of Artificial Intelligence for Prognostics. In Proceedings of AAAI Fall Symposium.
Sikorska, J., Hodkiewicz, M., & Ma, L. (2011, July). Prognostic modelling options for remaining useful life estimation by industry. Mechanical Systems and Signal Processing, 25(5), 1803-1836.
Vachtsevanos, G., Lewis, F. L., Roemer, M., Hess, A., & Wu, B. (2006). Intelligent Fault Diagnosis and Prognosis for Engineering Systems (1st ed.). Hoboken.
Damage identification and external effects removal for roller bearing diagnostics
Pirra M.1, Fasana A.2, Garibaldi L.3, and Marchesiello S.4
1,2,3,4 Dynamics & Identification Research Group, DIMEAS, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
[email protected]@polito.it
ABSTRACT
In this paper we introduce a method to identify whether a bearing is damaged, by removing the effects of speed and load. Such conditions influence vibration data acquired in rotating machinery and may lead to biased results when diagnostic techniques are applied. The method combines Empirical Mode Decomposition (EMD) and the Support Vector Machine classification method. The acquired vibration signal is decomposed into a finite number of Intrinsic Mode Functions (IMFs) and their energy is evaluated. These features are then used to train a particular type of SVM, namely the One-Class Support Vector Machine (OCSVM), where only one class of data is known. Data acquisition is done both for a healthy bearing and for one whose rolling element presents a 450 µm damage. We consider three speeds and three different radial loads for both bearings, so nine conditions overall are acquired for each type of bearing. Feature evaluation is done using EMD, and healthy data belonging to the various conditions are then used to train the OCSVM. The remaining data are analysed by the classifier as test objects. The real class each element belongs to is known, so the efficiency of the method can be measured by counting the errors made by the labelling procedure. These evaluations are performed with different kinds of SVM kernel.
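The energy-feature step described in the abstract can be sketched as follows; the IMFs below are toy vectors standing in for real EMD output, and in practice an OCSVM from a machine-learning library would then be trained on these feature vectors:

```python
def imf_energies(imfs):
    """Energy of each IMF: the sum of its squared samples."""
    return [sum(s * s for s in imf) for imf in imfs]

def energy_features(imfs):
    """Normalised energy feature vector, one entry per IMF."""
    e = imf_energies(imfs)
    total = sum(e)
    return [x / total for x in e]

# Toy IMFs standing in for the output of EMD on a vibration signal
imfs = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]
feats = energy_features(imfs)
```

Normalising by the total energy makes the feature vector less sensitive to overall signal amplitude, which is in the spirit of removing operating-condition effects.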
1. INTRODUCTION
Rolling bearings are among the most widely used components in machinery. Their condition monitoring and fault diagnosis are therefore very important in order to prevent the occurrence of breakdowns. A wide range of methods has been proposed since the Seventies to obtain proper fault diagnosis techniques. Signal analysis is an important topic in mechanical fault diagnosis research and applications thanks to its ability to extract fault features and identify fault
Pirra M. et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
patterns. Methods such as Fourier analysis and time-domain analysis take the acquired signal into account and are based on the assumption that the process generating the signal is stationary and linear. Unfortunately, faults are time-localised transient events, so these techniques may provide misleading information.
Some possible ways to overcome these limitations are presented in Randall and Antoni (2011). They develop an interesting review of diagnostic analysis of acceleration signals from rolling element bearings, especially when strong masking noise is present due to other machine components such as gears, and they show industrial applications that confirm the reliability of their methods. Another interesting method that can be efficiently used in the vibration-based condition monitoring of rotating machines is presented in Antoni (2006). He shows how the Spectral Kurtosis (SK), in contrast to classical kurtosis analysis, provides a robust way of detecting incipient faults even in the presence of strong masking noise. The other appealing aspect is that it allows optimal filters to be designed efficiently to filter out the mechanical signature of faults.
A useful tool to analyse non-stationary signals such as those related to bearing vibrations is the wavelet transform. Its strength comes from the simultaneous interpretation of the signal in both the time and frequency domains, which allows local, transient or intermittent components to be exposed. Its drawback is the dependence on the choice of the wavelet basis function. An example of a wavelet-based analysis technique for the diagnosis of faults in rotating machinery from its vibration signature is Chebil, Noel, Mesbah, and Deriche (2009).
An innovative technique in the time-frequency domain is the Empirical Mode Decomposition (EMD) (Huang et al., 1998). It allows any complicated signal to be decomposed into a collection of Intrinsic Mode Functions (IMFs) based on the local characteristic time scale of the signal.
It is self-adaptive because the IMFs, working as the basis functions, are determined by the signal itself rather than being pre-determined. Hence, EMD is highly efficient in non-stationary data analysis. It has been applied to a wide variety of problems, from geophysics to structural health monitoring (Huang & Shen, 2005). Many authors apply EMD to rotating machines and bearings with diagnostic intent, usually in association with other techniques. Some examples are Gao, Duan, Fan, and Meng (2008), where combined mode functions are introduced, Junsheng, Dejie, and Yu (2006), who use EMD jointly with an AutoRegressive model, and Yu, Dejie, and Junsheng (2006), who train an Artificial Neural Network (ANN) classifier with the EMD energy entropies.
Another aspect of interest is the search for methods able to remove the effects produced in vibrations by external factors, such as environmental temperature or test rig assemblies. Some examples are presented in Pirra, Gandino, Torri, Garibaldi, and Machorro-Lopez (2011) and in Machorro-Lopez, Bellino, Garibaldi, and Adams (2011), where the multivariate statistical technique named Principal Component Analysis (PCA) is used successfully for fault detection in bearings and rotating shafts. Other factors influencing vibrations related to rotating elements are varying load and speed; a variation in these factors makes it harder to recognise the presence of a fault in a signal. Bartelmus and Zimroz (2009) show how important it is, in condition monitoring of planetary gearboxes, to identify the external varying load condition. In particular, they analyse in detail how many factors influence the vibration signals generated by a system including a planetary gearbox and show that the load has a consistent contribution. As far as bearings are concerned, some works are presented in Cocconcelli, Rubini, Zimroz, and Bartelmus (2011) and Cocconcelli and Rubini (2011). They inspect the continuous change of the rotational speed of the motor, which represents a substantial drawback in terms of diagnostics of the ball bearing.
In fact, most of the algorithms proposed in the literature need a constant rotation frequency of the motor to identify fault frequencies in the spectrum. They tackle the problem with encouraging results aided by ANN and Support Vector Machine (SVM) methods.
These last two techniques can be grouped under the term of soft or natural computing. They are well developed in Worden, Staszewski, and Hensman (2011), an exhaustive tutorial overview of their basic theory and their applications in the context of mechanical systems research. SVM, in particular, is widely used for condition monitoring and damage classification (Widodo & Yang, 2006), (Rojas & Nandi, 2006). It is based on the concept of separating data objects into different classes through a hyperplane. However, this method assumes that all types of instances are known before applying it. A particular case of SVM is the One-Class SVM (OCSVM), which is well suited for diagnostic purposes: it allows the creation of the separating hyperplane starting from the knowledge of only one class, which is what usually happens in damage detection. Shin, Eom, and Kim (2005) adopt this method for machine fault detection and classification in electro-mechanical machinery from vibration measurements.

Figure 1. Acceleration signal and its decomposition for a healthy bearing (left) and for a faulty rolling element one (right).

The intent of our work is to find a parameter able to remove the influence of various external conditions in order to properly detect damage in a roller bearing. This paper is organised as follows. In the next two sections the EMD method and the OCSVM are presented with some theoretical background. Our algorithm is explained in Section 4 and its application on a test rig is developed in the following section.
2. EMPIRICAL MODE DECOMPOSITION
Empirical Mode Decomposition is a method presented by Huang et al. (1998) and based on the local characteristic time scales of a signal. This approach can be seen as a self-adaptive signal processing method applicable to non-linear and non-stationary processes. In particular, it allows a complex signal to be decomposed into a number of intrinsic mode functions (IMFs). Each of these components contains frequencies changing with the signal itself, and it has to satisfy the following definition:
• In the entire data set, the number of extrema and the number of zero crossings must either be equal or differ at most by one.
• At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
Thanks to this definition, each IMF represents a simple oscillation mode involved in the signal. According to Huang et al. (1998), a sifting process is used in order to extract the IMFs from a given signal x(t). It consists of the following steps:
1. Identify all the extrema of the signal, and connect all the local maxima by a cubic spline line as the upper envelope. Repeat the same procedure on the local minima to produce the lower envelope.
2. Designate the mean of the two envelopes as m1, and the difference between the signal x(t) and m1 as the first component, h1, i.e.
x(t)−m1 = h1. (1)
Ideally, if h1 is an IMF, then take it as the first IMF component of x(t). Otherwise, consider h1 as the original signal and repeat the first two steps, obtaining
h1 −m11 = h11. (2)
Repeat the sifting process up to k times, until h1k becomes an IMF, that is
h1(k−1) −m1k = h1k. (3)
The first IMF component is then designated as
c1 = h1k. (4)
3. Separate c1 from the original signal x(t) to obtain the residue r1:
r1 = x(t)− c1. (5)
4. Consider r1 as the original signal and repeat the above process n times, obtaining the other IMFs c2, c3, . . . , cn satisfying
r1 − c2 = r2
. . .
rn−1 − cn = rn  (6)
5. Stop the decomposition process when rn becomes a monotonic function from which no more IMFs can be extracted. The sum of Eq. (5) and Eq. (6) gives
x(t) = ∑_{i=1}^{n} ci + rn. (7)
From Eq. (7) we can see how the signal x(t) can be decomposed into n empirical modes and a residue rn, which can be interpreted as the mean trend of the signal. Each IMF ci includes different frequency bands, ranging from high to low, and is stationary.
Figure 1 shows two signals, a healthy and a damaged one. The latter refers to a 450 µm fault on a rolling element. In both cases, the original signal and a 3-IMF decomposition are presented.
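The sifting procedure of steps 1-5 can be sketched in Python. This code is not part of the paper; it is a simplified illustration built on NumPy/SciPy, and the envelope-mean stopping rule used below is a stand-in for the standard-deviation criterion of Huang et al. (1998):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, t, max_iter=50):
    """Extract one IMF from x via the sifting process (steps 1-2)."""
    h = x.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            return None  # too few extrema: no further IMF can be extracted
        upper = CubicSpline(t[maxima], h[maxima])(t)  # upper envelope
        lower = CubicSpline(t[minima], h[minima])(t)  # lower envelope
        m = (upper + lower) / 2.0                     # mean of the envelopes
        h_new = h - m                                 # Eq. (1)/(2): h - m
        # simplified stop rule: envelope mean is negligible w.r.t. the signal
        if np.sum(m**2) / np.sum(h**2) < 1e-3:
            return h_new
        h = h_new
    return h

def emd(x, t, n_imfs=8):
    """Decompose x into IMFs plus a residue, as in Eq. (7)."""
    imfs, r = [], x.copy()
    for _ in range(n_imfs):
        c = sift(r, t)
        if c is None:
            break
        imfs.append(c)
        r = r - c  # Eq. (5): residue after removing the extracted IMF
    return imfs, r

# toy example: a two-tone signal, decomposed and then reconstructed
t = np.linspace(0.0, 1.0, 2048)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(x, t)
# Eq. (7): the IMFs plus the residue reconstruct the original signal
print(len(imfs) >= 1, np.allclose(x, sum(imfs) + residue))
```

By construction the residue plus the extracted IMFs sum back to the original signal, which is exactly the content of Eq. (7).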
3. ONE-CLASS SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) is a computational learning method based on statistical learning theory (Vapnik, 1982). It is well suited for classification because, given some data points which belong to certain classes, it is able to state the class a new data point would be in.

Figure 2. One-Class SVM classifier where the origin is the only member of one class.

If we consider n-dimensional input data made up of a number of samples belonging to one of two classes, namely positive or negative, SVM constructs a hyperplane that separates the two classes. Moreover, this boundary satisfies the condition that the distance from the nearest data points in each class is maximal. In this way an optimal separating hyperplane is created, namely the maximum margin. The points in both classes nearest to this margin are called support vectors and, once selected, they contain all the information necessary to define the classifier. Every time a new element appears, it can be classified according to where it lies with respect to the separating hyperplane.
SVM can also be applied to non-linear classification using a function φ(x) that maps the data onto a high-dimensional feature space, where linear classification is then possible. Furthermore, if a kernel function K(xi, xj) = φT(xi) · φ(xj) is applied, it is not necessary to evaluate φ(xi) explicitly in the feature space. Various kernel functions can be used, such as linear, polynomial or Gaussian RBF. This property enables SVM to be used with very large feature spaces, because the dimension of the classified vectors does not directly influence the SVM performance.
When more than two classes are present, a multi-class SVM can be adopted. Two different approaches are taken into account: one-against-all (OAA) and one-against-one (OAO). In the first, the i-th SVM is trained with all the examples in the i-th class with positive labels and all the other examples with negative labels, while in the latter each classifier is trained on data from two classes.
It is clear that in the previous cases, two or more classes of data are given from the beginning of the analysis. In more general diagnostic applications, instead, only one type of data object is usually acquired: the healthy one. This can be seen as the detection of patterns in data that do not conform to
a well defined notion of normal behaviour, so we may refer to anomaly detection. One-Class SVM is the application of the SVM approach to the general concept of anomaly detection, as presented by Schölkopf et al. in Schölkopf, Williamson, Smola, Taylor, and Platt (2000). In their method they construct a hyperplane around the data, such that it is maximally distant from the origin and separates the data from the region that contains no data. They propose a binary function that returns +1 in the region containing the data and -1 elsewhere. For a hyperplane w which separates the data xi from the origin with maximal margin ρ, the following quadratic program has to be solved:
min_{w∈F, ξ∈Rn, ρ∈R}  (1/2)||w||² + (1/(νn)) ∑i ξi − ρ  (8)
subject to (w · Φ(xi)) ≥ ρ− ξi, ξi ≥ 0 (9)
where ξi are the slack variables and ν is a parameter taking values between 0 and 1 that controls the effect of outliers (the hardness or softness of the boundary around the data). If w and ρ solve the minimisation problem presented in Eqs. (8)-(9), the decision function
f(x) = sign((w · Φ(x)) − ρ) (10)
is positive for most instances, representing the majority of the data.
Figure 2 shows graphically the idea presented here, with only a few points around the origin that are negatively labelled.
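As an illustration (not part of the original paper, and not necessarily the implementation the authors used), the ν-parameterised one-class machine of Eqs. (8)-(10) is available as `OneClassSVM` in scikit-learn; the data, cluster locations and parameter values below are invented for the sketch:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# "healthy" training data: the single known class, clustered around (2, 2)
X_train = rng.normal(loc=2.0, scale=0.3, size=(200, 2))

# test data: some healthy points plus clear outliers playing the "damaged" role
X_healthy = rng.normal(loc=2.0, scale=0.3, size=(20, 2))
X_faulty = rng.normal(loc=5.0, scale=0.3, size=(20, 2))

# nu corresponds to the ν of Eq. (8), controlling the softness of the boundary
clf = OneClassSVM(kernel="rbf", gamma=1.0, nu=0.05)
clf.fit(X_train)

# decision of Eq. (10): +1 inside the learned boundary, -1 outside
print(np.mean(clf.predict(X_healthy) == 1))   # most healthy points accepted
print(np.mean(clf.predict(X_faulty) == -1))   # most faulty points rejected
```

With these toy clusters the classifier, trained on healthy data only, accepts most healthy test points and rejects the outliers, which is exactly the behaviour exploited for damage detection.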
4. METHODOLOGY
The previous sections introduced the background and theoretical aspects of the two methods that we now want to use jointly. The goal of this study is a method able to identify damage in a rolling element of a roller bearing by removing the effect of the external conditions influencing vibrations. The diagnosis method consists of the following steps:
1. Collect vibration signals under various conditions of speed and applied radial load, both for a healthy and for a damaged bearing.
2. Apply EMD and decompose the original signal into IMFs; then choose the first n to extract the features used during the analysis.
3. Evaluate the total energy of each of the n selected IMFs:

Ej = ∫_{−∞}^{+∞} |cj(t)|² dt,  j = 1, . . . , n. (11)
4. Create a feature vector with the energies of the n selected IMFs:

F = [E1, . . . , En]. (12)

5. Normalise the feature vector, dividing F by the value
EN = √( ∑_{j=1}^{n} |Ej|² ). (13)
Figure 3. DIRG test rig (a) and roller bearing used during the tests, with the damaged roller in the white circle (b).
6. Obtain the n-dimensional normalised feature vector:
F′ = [E1/EN, . . . , En/EN]. (14)
7. Consider 75% of the healthy data as the training set and the remaining 25%, together with the damaged data, as the test set. All loads and speeds are analysed together.
8. Train the one-class SVM classifier on the training data and evaluate the label assigned by the classifier to the test data. The real class is known, so labelling mistakes can be counted.
9. Repeat steps 7 and 8 thirty times, permuting the order of the healthy data to give statistical significance to the analysis, and evaluate the error percentage in label assignment.
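Steps 3-6 can be sketched as follows (illustrative code, not the authors' implementation; the IMFs are assumed to be already available as rows of an array, and the integral of Eq. (11) is discretised as a sum):

```python
import numpy as np

def energy_features(imfs, dt):
    """Steps 3-6: IMF energies (Eq. 11) normalised to unit length (Eqs. 12-14).

    imfs: array of shape (n, N) holding the first n IMFs c1..cn sampled at
          N points; dt: sampling interval, turning Eq. (11) into a sum.
    """
    E = np.sum(np.abs(imfs) ** 2, axis=1) * dt      # Eq. (11), discretised
    F = E                                           # Eq. (12)
    EN = np.sqrt(np.sum(np.abs(E) ** 2))            # Eq. (13)
    return F / EN                                   # Eq. (14)

# toy check: the normalised feature vector always has unit Euclidean norm,
# which removes the overall energy contribution driven by speed and load
imfs = np.random.default_rng(1).normal(size=(8, 1024))
Fp = energy_features(imfs, dt=1.0 / 102_400)        # 102.4 kHz sampling
print(Fp.shape, round(float(np.linalg.norm(Fp)), 6))  # → (8,) 1.0
```

The unit-norm property is the reason the normalisation of steps 5-6 mitigates the influence of the operating conditions on the features.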
5. APPLICATION TO BEARING DATA
Several conditions can influence the data acquired on our test rig: speed, external load and temperature variations. Detecting and removing the effects of these factors is important to avoid any bias during the application of diagnostic techniques. In fact, a small variation in speed or in the temperature of the oil circulating in the system produces deviations that a diagnostic algorithm may erroneously detect as damage, thus providing a false alarm. In this paper we introduce a method able to identify damage in a rolling element of a roller bearing by removing the effect of speed and external load.
Accelerations are acquired on a test rig assembled by the Dynamics & Identification Research Group (DIRG) at the Department of Mechanical and Aerospace Engineering (Figure 3 a).
Figure 4. RMS value.
This bearing test rig is designed to perform accurate testing of bearings with different levels of damage under controlled laboratory conditions, especially regarding the minimisation of spurious signals coming from the mechanical sounds of other bearings, rotating shafts, meshing gear wheels and other vibrating elements. Hence, we are sure that the only variations in the accelerations are caused by speed and load, which can be properly changed and monitored.
We consider three different speed values (9000, 10500 and 12000 RPM) and three radial loads (1.4, 1.6 and 1.8 × 10³ N) and we acquire data for each combination. In particular, 10 acquisitions of 1 second of vibration signal at a sampling frequency of 102.4 kHz are collected for each of the nine cases. This is done both for a healthy bearing and for a damaged one; in the latter case, we analyse a bearing with a fault greater than 450 µm on a rolling element (Figure 3 b). Notice that the temperature of the circulating oil is almost constant between the different acquisitions, so we are certain that the only variations detected through vibrations are caused by load and speed changes.
Figure 4 shows the Root Mean Square (RMS) values evaluated for the 10 acquisitions in each condition. This plot shows how this parameter is influenced by the speed, in both the healthy and the damaged case, and increases at higher speeds. Moreover, it can be noticed that at low speed this parameter for the damaged bearing is close to that of the healthy bearing at the highest speed. For example, the RMS value for the damaged bearing at 9000 RPM for the three loads is around 30, while for the healthy case at the highest speed (12000 RPM) it is around 34: the undamaged bearing at higher speed has a parameter value greater than the faulty one at lower speed. This means that if we consider the RMS parameter taking all nine conditions together, the distinction between healthy and faulty
Figure 5. Error percentage for linear kernel.
Figure 6. Error percentage for polynomial kernel.
Figure 7. Error percentage for Gaussian kernel.
Figure 8. 2-dimensional feature vector F′ representation.
bearings may be strongly biased. This observation leads to the need for a parameter that avoids such problems.
According to the methodology presented in Section 4, we obtain a normalised feature vector F′. We decide to take into account the first 8 IMFs, which include the most dominant fault information, so this vector lies in an 8-dimensional space. The analysis through OCSVM starts from the first two dimensions of the feature vector; we then add a new dimension each time until the whole feature vector F′ is used. We choose to include the features from the beginning of the vector because EMD operates as a collection of filters organised in a filter bank structure: the first mode can be considered similar to a highpass filter, while the other modes are characterised by a set of overlapping bandpass filters (Flandrin & Rilling, 2004). In this way, taking the features from the beginning of the vector, we move from higher frequency content to lower.
As stated in Section 4, 75% of the healthy data are used to train the classifier, while the remaining 25% are added to the damaged data as testing instances. Since the exact class is known, it is interesting to evaluate the labelling errors made by the OCSVM classifier. In this way, the relation between the number of dimensions and a proper identification procedure can be evaluated. Moreover, three different SVM kernels are compared on the acquired data:
• linear: K(xi, xj) = (xi^T xj)^d
• polynomial: K(xi, xj) = (xi^T xj + 1)^d
• Gaussian: K(xi, xj) = exp(−γ||xi − xj||²)
For each kernel, the parameters d and γ take values from 1 to 4 and labelling mistakes are evaluated as percentages. Figures 5, 6 and 7 present the different behaviours of the three
Figure 9. 2-dimensional feature vector F′ representation after OCSVM.
Figure 10. 2-dimensional feature vector F′ representation under different conditions: the first number is the speed expressed in RPM, the second is the load expressed in kN.
kernels as the number of features and the parameter values increase. The error percentage for the linear kernel tends to decrease as the dimensions go from 1 to 8; hence, in order to provide good detection ability a larger number of features should be considered. The same behaviour is observed for the polynomial kernel when d = 1, while for the other values of the parameter fewer errors are present for 2, 6, 7 and 8 dimensions. The error trend in the case of the Gaussian kernel does not seem to be conditioned by the parameter γ, while the minimum number of labelling errors is found when the feature vector has 2 and 7 dimensions. On the whole, a Gaussian kernel or a polynomial one with parameter d > 1 gives successful results in detecting the damage regardless of speed
Figure 11. Normalised feature vector F′ values at three speeds for both the undamaged and the damaged case.
and load influence.
To emphasise this fact, we concentrate on the 2-dimensional feature vector F′, since it gives interesting results and is easier to visualise. If we consider all 180 values computed using our methodology for both the healthy and the damaged bearing, we obtain the plot in Figure 8. In this picture it is clear how the data divide into two groups according to their state rather than depending on their condition of load and speed. This explains the great efficiency of the classifier in damage identification, due to the clear distinction between the two classes of data. Figure 9 shows how the OCSVM with Gaussian kernel and γ = 1 works: the test data are well classified (green triangles) and only one point belonging to the faulty class is labelled as healthy, producing an error (red cross).
Furthermore, any dependence on the different loads and speeds seems to be removed, as pointed out in Figure 10. The nine symbols represent the various conditions for the undamaged and damaged bearing and, on the whole, no particular division based on the rotational speed or on the applied load is noticed.
Figure 11 helps to explain the ability of the method to remove the influence of speed and load. The values of the feature vector F′ for one acquisition are plotted for each of the speeds considered, both for the healthy and for the damaged bearing. Firstly, the vector normalisation presented at steps 5 and 6 of the Methodology section helps to remove the contribution of the highest energies and thus to mitigate the influence of the various conditions on the features. Moreover, as can be noticed in the figure, this aspect is particularly observable for the 'frequency content' represented by c2: the normalised values of the energies tend to be very similar independently of the speed considered, contributing greatly to the removal of this parameter's influence.
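The kernel comparison of steps 7-9 can be sketched as follows. This is again illustrative code, not the authors' implementation: synthetic Gaussian clusters stand in for the measured energy feature vectors, and the ν value is an arbitrary choice.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# synthetic stand-ins for the normalised 8-dimensional energy vectors F'
# (the real features come from the EMD energies; these values are invented)
healthy = rng.normal(loc=0.35, scale=0.05, size=(90, 8))
faulty = rng.normal(loc=0.60, scale=0.05, size=(90, 8))

def error_rate(kernel, **params):
    """One pass of steps 7-8: 75/25 split, train OCSVM, count label errors."""
    idx = rng.permutation(len(healthy))          # step 9: permute healthy data
    n_train = int(0.75 * len(healthy))
    train = healthy[idx[:n_train]]
    test = np.vstack([healthy[idx[n_train:]], faulty])
    truth = np.r_[np.ones(len(healthy) - n_train), -np.ones(len(faulty))]
    clf = OneClassSVM(kernel=kernel, nu=0.05, **params).fit(train)
    return 100.0 * np.mean(clf.predict(test) != truth)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2}),
                       ("rbf", {"gamma": 4.0})]:
    print(kernel, round(error_rate(kernel, **params), 1), "% errors")
```

With these synthetic clusters the RBF boundary encloses only the healthy class, while a linear hyperplane separating the data from the origin cannot reject faulty points lying farther out along the same direction; this loosely mirrors the observation above that the Gaussian kernel outperforms the linear one.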
6. CONCLUSION
In this paper we proposed a method for the detection of damage in roller bearings that removes the dependence on speed and load. The methodology combines Empirical Mode Decomposition, used to produce a proper feature vector, with the One-Class Support Vector Machine technique, exploited to classify the data. Since the original class membership was known, different SVM kernels have been tested in order to find those with the lowest error rate. Encouraging results have been obtained regarding the ability of this feature to remove the speed and load dependence, avoiding bias in data interpretation and identification. Further applications could deal with comparisons of various damage severities and with other damage types, such as a sandblasted inner ring. Moreover, the removal of other influencing factors, such as temperature, and the comparison of this method with other techniques used to obtain the feature vector, such as wavelet decomposition, could be developed.
REFERENCES
Antoni, J. (2006). The spectral kurtosis: a useful tool for characterising non-stationary signals. Mechanical Systems and Signal Processing, 20, 282-307.
Bartelmus, W., & Zimroz, R. (2009). Vibration condition monitoring of planetary gearbox under varying external load. Mechanical Systems and Signal Processing, 23, 246-257.
Chebil, J., Noel, G., Mesbah, M., & Deriche, M. (2009). Wavelet decomposition for the detection and diagnosis of faults in rolling element bearings. Jordan Journal of Mechanical and Industrial Engineering, 3, 260-267.
Cocconcelli, M., & Rubini, R. (2011). Support Vector Machines for condition monitoring of bearings in a varying-speed machinery. In Proceedings of the International Conference on Condition Monitoring, Cardiff, UK.
Cocconcelli, M., Rubini, R., Zimroz, R., & Bartelmus, W. (2011). Diagnostics of ball bearings in varying-speed motors by means of Artificial Neural Networks. In Proceedings of the International Conference on Condition Monitoring, Cardiff, UK.
Flandrin, P., & Rilling, G. (2004). Empirical Mode Decomposition as a filter bank. IEEE Signal Processing Letters, 11(2), 112-114.
Gao, Q., Duan, C., Fan, H., & Meng, Q. (2008). Rotating machine fault diagnosis using empirical mode decomposition. Mechanical Systems and Signal Processing, 22, 1072-1081.
Huang, N. E., & Shen, S. (Eds.). (2005). Hilbert-Huang Transform and Its Applications. World Scientific, Singapore.
Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., et al. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London, Series A, 454, 903-995.
Junsheng, C., Dejie, Y., & Yu, Y. (2006). A fault diagnosis approach for roller bearings based on EMD method and AR model. Mechanical Systems and Signal Processing, 20, 350-362.
Machorro-Lopez, J., Bellino, A., Garibaldi, L., & Adams, D. (2011). PCA-based techniques for detecting cracked rotating shafts including the effects of temperature variations. In Proceedings of the 6th International Conference on Acoustical and Vibratory Surveillance Methods and Diagnostic Techniques, Compiègne, France.
Pirra, M., Gandino, E., Torri, A., Garibaldi, L., & Machorro-Lopez, J. M. (2011). PCA algorithm for detection, localisation and evolution of damages in gearbox bearings. Journal of Physics: Conference Series, 305(1).
Randall, R. B., & Antoni, J. (2011). Rolling element bearing diagnostics - A tutorial. Mechanical Systems and Signal Processing, 25, 485-520.
Rojas, A., & Nandi, A. B. (2006). Practical scheme for fast detection and classification of rolling-element bearing faults using support vector machines. Mechanical Systems and Signal Processing, 20, 1523-1536.
Schölkopf, B., Williamson, R. C., Smola, A. J., Taylor, J. S., & Platt, J. C. (2000). Support vector method for novelty detection. Advances in Neural Information Processing Systems, 12, 582-586.
Shin, H. J., Eom, D.-H., & Kim, S.-S. (2005). One-class support vector machines - an application in machine fault detection and classification. Computers & Industrial Engineering, 48, 395-408.
Vapnik, V. N. (1982). Estimation of dependences based on empirical data. Springer-Verlag, New York.
Widodo, A., & Yang, B. (2006). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21, 2560-2574.
Worden, K., Staszewski, W. J., & Hensman, J. J. (2011). Natural computing for mechanical systems research: A tutorial overview. Mechanical Systems and Signal Processing, 25, 4-111.
Yu, Y., Dejie, Y., & Junsheng, C. (2006). A roller bearing fault diagnosis method based on EMD energy entropy and ANN. Journal of Sound and Vibration, 294, 269-277.
Data Management Backbone for Embedded and PC-based Systems
Using OSA-CBM and OSA-EAI
Andreas Löhr1, Conor Haines2, and Matthias Buderath3
1,2Linova Software GmbH, Garching b. München, 85748, Germany
3 Cassidian, Manching, 85077, Germany
ABSTRACT
Cassidian is in the process of developing a comprehensive
simulation framework for integrated system health
monitoring and management research and development.
One significant building block is to invite first-class
technology providers, e.g. universities and SMIs, to provide
innovative technologies and support their integration into
the simulation framework. This paper is a joint presentation
of Cassidian and Linova Software GmbH, a Cassidian
preferred software provider.
Prognostic Health Management (PHM) systems are
commonly composed of disparate and distributed hard- and
software components. Further, these components exchange
vast amounts of data over a heterogeneous collection of
communication channels. Any such system’s success
depends upon an open, uniform, and performance-optimized
solution for data management, one that includes data
definition, data communication, and data storage. The Open
System Architecture for Condition-based Maintenance
(OSA-CBM) and Open System Architecture for Enterprise
Application Integration (OSA-EAI) are complementary
reference architectures and represent an emerging standard
for application domain-independent asset and condition data
management. Herein, we will report on our experiences
while implementing a data management backbone based on
OSA-CBM and OSA-EAI for a simulation environment
supporting PHM systems in the aerospace domain. Our
work encompasses both airborne embedded systems and
ground-based PC systems. While we can generally confirm
the feasibility of OSA-CBM and OSA-EAI, we found
several implementation recommendations unsuited to real-
time operating conditions. To address these issues, we
propose work towards standardizing non-XML-based
transportation formats for OSA-CBM data packets. Further,
we discovered issues specific to implementing the OSA-EAI
data model in the aerospace domain. These issues drove our
proposal to extend the OSA-EAI database model, where we
seek to optimize its usability for analytical tasks. To
underline the feasibility of our solutions, we provide
empirical evidence drawn from our work. The conclusion is
a summary of our experience and the direction of future
work in the area of PHM system design for aircraft
maintenance. In total, our contribution to the community is
best seen from a practitioner’s perspective. We aim to
establish best practices for and contribute to the evolution of
OSA-CBM and OSA-EAI.
1. SIMULATION ENVIRONMENT
The aerospace industry is a core application domain and
development driver for PHM systems. The paradigm shift
towards predictive maintenance, which PHM systems
impose on maintenance and overhaul processes, promises
higher aircraft availability coupled with lower overall
maintenance costs. As in any other domain, challenges in
introducing PHM systems to the aerospace domain are
twofold. On the one hand, there are individual challenges in
developing sensor technology, state detection, and health
assessment methodologies/models for determining the
future life span of a (possibly deteriorated) component. On
the other hand, there are distinct challenges when
integrating heterogeneous data from disparate and
distributed sources into consolidated information and
dependable decision support. This applies at both the
aircraft and fleet level. It has therefore been recognized in
the community that standardized and open data management
solutions are crucial to the success of PHM. Such a standard
should introduce a commonly accepted framework for data
representation, data communication, and data storage.
_____________________
Andreas Loehr et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
EADS Deutschland GmbH, Cassidian, is developing a
comprehensive simulation framework for research in the
areas of condition monitoring and prognostic health
management. The framework includes airborne functions
hosted on embedded systems, as well as ground-based
functions hosted on PC-based systems. The primary
objective is to interconnect both airborne and ground-based
systems using a uniform data management philosophy and,
as far as possible, uniform communication protocols. In this
paper, we report on experience from our task to define and
implement the data management backbone for such a
simulation framework. The backbone is based on the Open
System Architecture for Condition-based Maintenance
(OSA-CBM) and the Open System Architecture for
Enterprise Application Integration (OSA-EAI).
1.1. OSA-CBM
The OSA-CBM reference architecture has become the de
facto standard for exchanging data in a condition monitoring
system. Being an implementation of the ISO-13374
functional specification, the architecture defines six
functional layers. Each layer is allocated different and
unique functions of the data processing chain in a condition
monitoring system (see Figure 1).
Figure 1. OSA-CBM Reference Architecture
This architecture focuses on the definition and
communication of data: specifically, on which data entities
and events can be exchanged between the layers during
operation and which communication interfaces are used for
this purpose. The format by which the
data is exchanged between the layers remains unspecified;
however, the usage of XML messages, which are
transported over HTTP, is recommended. For this purpose,
the standard provides a thorough collection of specifications
for XML messages.
1.2. OSA-EAI
The reference architecture OSA-EAI is complementary to
OSA-CBM. It specifies a comprehensive data storage
architecture for asset management systems. This
architecture consists of: a physical relational data model
(Common Relational Information Schema, CRIS), a
corresponding logical object model (Common Conceptual
Object Model), and CRUD interfaces (Create, Retrieve,
Update, Delete) for all defined entities in the data model, as
depicted in Figure 2. In the course of harmonizing OSA-
EAI with OSA-CBM, the data model defines entities that
are capable of storing data originating from all six OSA-
CBM layers. Analogously to OSA-CBM, it is recommended
that clients interact with an OSA-EAI database via XML
messages transported via HTTP. For this purpose, the
authors of the OSA-EAI standard provide a multitude of
CRUD XML message specifications. These specifications
define how to manage data contained in the database and
how to make the data available to any other stakeholder or
application within a PHM system.
Figure 2. OSA-EAI Reference Architecture
A link to the MIMOSA organization, which maintains the
reference architectures, can be found in the references
section.
2. SIMULATION ENVIRONMENT
The simulation environment consists of an air segment and a
ground segment, (inter-)connected by a data management
backbone that relies on OSA-CBM and OSA-EAI. In the
following section, we introduce the high level architecture
of our simulation framework.
2.1. Air Segment
The air segment of the simulation framework models those
systems and associated sensors for which we intend to
develop IVHM capabilities. At the core of the framework is
a central IVHM data processor. Sensors push their data to
this IVHM data processor via an OSA-CBM compliant
implementation. As a reflection of the working
environment, the underlying message protocol is optimized
for embedded systems (detailed in section 3). The IVHM
data processor calculates IVHM information according to
the OSA-CBM layer specifications, up to the health
assessment layer (refer to Figure 3).
Figure 3. Air Segment of Simulation Framework
2.2. Ground Segment
The central data processor supports the downloading of
data, which has been collected and calculated on board the
aircraft, to the ground-based environment for further
processing (e.g. during the aircraft’s turnaround). Once
downloaded, the data is stored in a central data management
component, which we call the CBM data warehouse (refer
to Figure 4).
Figure 4. CBM Data Warehouse
The CBM data warehouse is based on the OSA-CBM/OSA-
EAI reference architectures and it serves two major
purposes: first, it hosts all current (i.e. short timeframe) and
historical (i.e. long timeframe) condition data. Second, it
provides services to distributed client applications that are
involved in the PHM process. Such services include the
CRUD interfaces as defined by OSA-EAI (e.g. for asset
configuration management), high layer functions as defined
by OSA-CBM (prognostic assessment and advisory
generation), and other services relevant for a PHM system.
In our context, data management includes the entire data set
life cycle: from initial instantiation of a sensor value,
transportation to the IVHM data processor, downloading to
the ground-based environment, on through to storage and
further processing. In section 3 we discuss aspects of OSA-
CBM-based data management in an embedded system.
Section 4 draws on experience gained while realizing the
CBM data warehouse.
3. OSA-CBM IN AN EMBEDDED SYSTEM
Following an initial implementation of OSA-CBM using
XML messages transported via HTTP/TCP, we decided to
use binary messages transported via a UDP/IP stack. This
significant departure from the MIMOSA recommendations
was driven by requirements that arose from our intended use
of OSA-CBM in the context of embedded systems certified
for in-flight usage. Our on-board implementation covers
the layers from data acquisition up to health assessment;
the following sections report on our experience in
implementing the corresponding classes in the C
programming language.
3.1. Environment
When fielding OSA-CBM compliant applications on
embedded systems certified for in-flight usage, several
issues are brought to the fore. Ultimately, two aspects
defined the unique structure of our solution: resource
limitation and non-dynamism. Due to qualification
requirements, computing hardware for avionics is
generations behind present off-the-shelf computing
hardware.
Implementation rules for applications hosted on real-time
operating systems (such as VxWorks) typically forbid
dynamically allocating memory resources, as these
operations are potentially non-deterministic and lead to
memory leaks if not used carefully. This environment
imposes further constraints on the solution space: due to
qualification or certification requirements (depending on the
risk class of the final system) all embedded code must be
written in the C programming language. Furthermore, UDP
must be used as the sole protocol for network
communication.
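A common way to satisfy such a no-dynamic-allocation rule
is to reserve all buffers statically at compile time and hand
them out from a fixed pool. The following is an illustrative
sketch of that pattern, not code from the framework; all
names and sizes are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size buffer pool: all memory is reserved at
 * compile time, so malloc()/free() is never called at runtime. */
#define POOL_SLOTS 8
#define SLOT_BYTES 1024

static uint8_t pool[POOL_SLOTS][SLOT_BYTES];
static uint8_t slot_used[POOL_SLOTS];

/* Returns a free slot or NULL; deterministic, no heap involved. */
static void *pool_acquire(void)
{
    for (size_t i = 0; i < POOL_SLOTS; ++i) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted: a bounded, testable failure mode */
}

static void pool_release(void *p)
{
    for (size_t i = 0; i < POOL_SLOTS; ++i) {
        if (pool[i] == (uint8_t *)p) {
            slot_used[i] = 0;
            return;
        }
    }
}
```

Exhaustion of the pool is an explicit, bounded failure mode
that can be analyzed at design time, which is precisely what
real-time qualification requires.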
3.2. Use Case and Design Considerations
We want to transmit a heavy load data event set which
contains four heterogeneous OSA-CBM DMDataSeq
events at individual sample rates of 160 Hz, 360 Hz, and
1 kHz. Additionally, we want to transmit a light load data
event set, containing a single DMDataSeq event recorded
at 20 Hz; both data event sets will be transmitted at a
frequency of 1 Hz.
Generating OSA-CBM compliant XML representing our
two event sets and packaging the XML into UDP packages
as ASCII code was a straightforward implementation
approach, as has been demonstrated by others (Swearingen,
Kajkowski, Bruggeman, Gilbertson & Dunsdon, 2007).
Generally, it involves the following three steps:
1. Sender: assemble the XML from an internal data
representation in memory
2. Sender: marshal the XML into a UDP package and
send
3. Receiver: Unmarshal and parse the received XML
and populate an internal data representation in
memory
As we will show later on, in Table 1, using XML generates
a structure in which 75% of the transmitted data is
apportioned to meta-data defining the XML structure.
Additionally, due to its absolute size, the heavy load data
event set exceeds the maximum size of a UDP packet.
While it would have been possible to split up its data into
several UDP packages, we consider the ratio between meta-
data and payload to be unsuited to the constrained allocation
of computing resources. We acknowledge that if we assume
our heavy and light load data event sets would be the only
loads on the communication channel (e.g., Ethernet), there is
no risk of exceeding transmission capacity; but this
assumption may not hold in a real aircraft design where
communication is channeled and, due to the availability of
qualified or certified hardware, the transmission capacity
might be drastically limited. We also researched XML
parsers that are written in C, and therefore compile for
embedded environments (e.g., Mini-XML, Expat, RXP), but
we found them incompatible with internal programming
policies (static memory allocation). Additionally, the high
risk involved in the certification or qualification of an XML
parser for an embedded system finally drove our decision
towards a non-XML-based binary solution for marshalling
and unmarshalling OSA-CBM data.
3.3. Design and Implementation
OSA-CBM is an object-oriented specification and therefore
makes use of polymorphism, which is the ability to create
object attributes, object functions or even an entire object
that has more than one form. Our implementation of OSA-
CBM is based upon the representation of OSA-CBM classes
by a set of C structures. The C programming language is
procedural and does not offer native polymorphism. After
analyzing data manipulation through health assessment
layer communication classes of the OSA-CBM object
model, we concluded that a mapping of OSA-CBM classes
to C structures is possible. We will next explain our
rationale in supporting this approach.
The C programming language decouples data from
functionality, therefore we did not have to map
polymorphism of functions (OSA-CBM does not define
behavior of the classes, anyway). We also could not identify
polymorphism of attributes for the classes of our interest.
However, there is polymorphism of objects, i.e., specific
derived classes inherit part of their structure from one or
more base classes. We mapped this kind of polymorphism
by initially modeling C structures for each root class (i.e.,
classes that do not have a base class in the OSA-CBM
model). For all non-root classes we modeled a member in
the derived class which is of the type of the respective super
class. As an example, the structure for the data sequence
event of the DM layer (DMDataSeq) is shown in Figure
5(c). The corresponding base class structures are shown in
parts (b) and (a), respectively.
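In reduced form, the mapping can be sketched as follows.
The attribute selection is illustrative and abbreviated, not
the complete OSA-CBM class definition; the essential point
is that the base class becomes the first member of the
derived structure:

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced sketch of a root class: every OSA-CBM data event
 * carries at least an id and a time stamp (the real class has
 * further attributes, omitted here). */
typedef struct {
    uint32_t id;
    double   time;   /* simplified stand-in for OsacbmTime */
} DataEvent;

/* Derived DM-layer sequence event (cf. Figure 5(c)): the base
 * class appears as the first member, emulating single
 * inheritance in plain C. */
typedef struct {
    DataEvent base;        /* inherited part */
    uint32_t  dataSize;    /* number of valid entries */
    double   *values;      /* dynamic parts, bounded at runtime */
    double   *xAxisDeltas;
} DMDataSeq;
```

Because base sits at offset zero, a pointer to a DMDataSeq
can be treated as a pointer to its DataEvent part, which is
what allows generic handling of events.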
Within specific limits our approach is also able to emulate
multiple inheritance by including more than one base class
member; however, the part of the OSA-CBM data model
that we focused on does not involve multiple inheritance.
For transmission, multiple data event instances are bound
together into a data event set. Regarding a single instance of
an OSA-CBM base class, its actual subtype at runtime can
be anything. This is critical to the C implementation as the
DataEventSet class acts as a transportation container for
any DataEvent instances. We solved this problem by
introducing a constraint: an OSA-CBM data event set may
only include data events of the same type. This allowed us
to introduce a non-standard member on the
DataEventSet class which is of enumerated type
OsacbmDataType and which indicates the type of
included events.
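In reduced form, such a constrained container can be
sketched as follows; the enumerator list and member names
are illustrative, not the full OSA-CBM type catalogue:

```c
#include <stdint.h>

/* Illustrative subset of OSA-CBM data types; the real
 * OsacbmDataType enumeration is far larger. */
typedef enum {
    OSACBM_DM_REAL,
    OSACBM_DM_DATA_SEQ
} OsacbmDataType;

#define MAX_EVENTS 16

/* Transportation container: by our constraint all contained
 * events share one type, announced once by the non-standard
 * dataType member, so the receiver can interpret the byte
 * stream without per-event type information. */
typedef struct {
    OsacbmDataType dataType;           /* non-standard type tag */
    uint32_t       numEvents;
    const void    *events[MAX_EVENTS]; /* all of type dataType */
} DataEventSet;
```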
Figure 5. Exemplary Payload OSA-CBM Structures
The received byte stream can therefore be interpreted
correctly on the receiver side. For the transmission itself, we
copy a structure’s memory image into a temporary buffer.
Additionally, as required by the event type, the buffer
memory is appended with a data block for each reference
from a structure’s pointer members (here: values and
xAxisDeltas). Finally, the buffer is sent as a UDP
packet to the receiver, where it is reconstructed into a set of
OSA-CBM compliant data. Consequently, we support both
static data types (such as DMReal) and dynamic types (such
as DMDataSeq). However, as a necessary overhead,
complex data sequences require recipient side remapping of
pointers at run time and a maximum payload size must be
defined for real time operation.
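The copy-and-append scheme can be sketched as follows for
a single pointer member. The structure and names are
simplified and hypothetical; alignment and byte-order
handling, discussed in section 3.4, are omitted here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced event with one dynamic part referenced by pointer. */
typedef struct {
    uint32_t  id;
    uint32_t  dataSize; /* number of valid entries in values */
    double   *values;   /* data block appended after the image */
} Seq;

/* Sender: copy the raw memory image first, then append the
 * referenced data block. Returns bytes written into buf. */
static size_t seq_marshal(const Seq *s, uint8_t *buf)
{
    size_t n = sizeof(*s);
    memcpy(buf, s, n); /* structure memory image */
    memcpy(buf + n, s->values, s->dataSize * sizeof(double));
    return n + s->dataSize * sizeof(double);
}

/* Receiver: reinterpret the image, then remap the pointer
 * member so it refers to the appended block in the buffer. */
static void seq_unmarshal(uint8_t *buf, Seq *out)
{
    memcpy(out, buf, sizeof(*out));
    out->values = (double *)(buf + sizeof(*out)); /* remapping */
}
```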
3.4. Evaluation
We evaluate quantitatively by comparing the data volume
required for an ASCII XML transmission with that of our
custom binary transmission protocol. We used Ubuntu 10.04
(32-bit) as sender and VxWorks on PowerPC (32-bit) as
recipient.
Table 1 outlines the data characteristics of two
representative communication samples.
Figure 6. Data Event Set as C Structure
The first sample is a heavy load data event set. It contains
four heterogeneous OSA-CBM DMDataSeq events at
individual sample rates of 160 Hz, 360 Hz, and 1 kHz. The
overall data event set has a frequency of 1 Hz. The resulting
data push represents 2,520 individual measurements being
sent across the system every second. The second sample is a
light load data event set, containing a single DMDataSeq
event recorded at 20 Hz; the corresponding overall data
event set has a frequency of 1 Hz.
              XML            Binary        Ratio
Heavy Load    165,345 bytes  40,792 bytes  4.1
Light Load    1,827 bytes    576 bytes     3.2

Table 1. Data Transmission Size Comparison
As seen in Table 1, there is a significant reduction in the
volume of data transmissions achieved by our approach,
ranging up to a factor of four. An additional effect of our
approach, as compared to sending XML messages via UDP
instead of via HTTP/TCP (Swearingen, Kajkowski,
Bruggeman, Gilbertson & Dunsdon, 2007), is a significant
reduction in the processing overhead required by XML
structural parsing; this reduction is beyond the scope of our
present analysis.
However, there are drawbacks to our approach. Since UDP
is a connectionless datagram protocol, the amount of data
that can be transmitted per event set is capped by the
maximum size of a UDP datagram (65,507 payload bytes
over IPv4); depending on platform-specific settings, the
available size can be significantly less.
We believe that this size limitation is best addressed by
splitting the data set into a series of discrete packets, as
opposed to introducing additional limitations and overheads
on the binary transmission format. Data management within
a closed on-board real-time environment a priori requires
that the overall data communication be well designed
regarding timing and loads. In such a closed and well
controlled environment the likelihood of UDP packet loss is
minimized; nevertheless, it may still happen. Therefore, we propose
the usage of UDP-based transmission only for functions
which can cope with temporary gaps in their data input,
such as our diagnostics algorithm. For functions which are
not robust to data losses, a confirmation-and-resend
protocol could be devised, but that would negate the
benefits of UDP, and TCP would be the transmission
protocol of choice.
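A packet-splitting scheme of the kind proposed above can be
sketched as follows. The fragment header layout is
hypothetical: each fragment announces its index and the
total count, so a receiver can detect a gap and discard the
incomplete set rather than stall:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_UDP_PAYLOAD 1400u /* conservative bound below typical MTU */

/* Hypothetical per-fragment header. */
typedef struct {
    uint16_t setId;     /* identifies the data event set */
    uint16_t fragNo;    /* 0-based fragment index */
    uint16_t fragCount; /* total fragments of this set */
} FragHeader;

#define FRAG_DATA (MAX_UDP_PAYLOAD - sizeof(FragHeader))

/* Number of UDP packets needed for len payload bytes. */
static uint16_t frag_count(size_t len)
{
    return (uint16_t)((len + FRAG_DATA - 1) / FRAG_DATA);
}

/* Fill packet 'no' (header plus data slice) into out; the caller
 * guarantees no < frag_count(len). Returns bytes written. */
static size_t frag_fill(uint16_t setId, uint16_t no,
                        const uint8_t *data, size_t len, uint8_t *out)
{
    FragHeader h = { setId, no, frag_count(len) };
    size_t off = (size_t)no * FRAG_DATA;
    size_t chunk = (len - off < FRAG_DATA) ? len - off : FRAG_DATA;
    memcpy(out, &h, sizeof h);
    memcpy(out + sizeof h, data + off, chunk);
    return sizeof h + chunk;
}
```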
Our current implementation is highly platform dependent,
as it is tailored to the characteristics of our environment
(sender: 32-bit Ubuntu; recipient: 32-bit VxWorks). To
overcome platform differences, we introduced artificial
padding bytes (see C structure members in Figure 5) so that
the internal in-memory arrangement is equal on both
platforms, and performed byte-swapping on the receiving
platform. This allowed us to easily cast the UDP package
payload into the required structures (including pointer
remapping).
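In reduced form, the combination of an explicit padding
member and receiver-side byte swapping looks as follows.
This is a sketch with hypothetical names; the actual
structures follow Figure 5:

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced event structure with an explicit padding member so
 * that 'value' lands at the same offset on both platforms,
 * regardless of the compiler's default alignment rules. */
typedef struct {
    uint16_t id;
    uint8_t  pad[2]; /* artificial padding: 'value' at offset 4 */
    uint32_t value;
} WireEvent;

/* Receiver-side byte swap for one 32-bit member (sender and
 * recipient differ in endianness). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}
```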
Finally, XML messages can be read by humans more easily
than binary messages. This may impose complications to
the debugging cycles during software development;
however, from our experience, software developers tend to
develop the ability to “read” binary content over time, in
particular if sophisticated hex editor tools are used. A
steeper learning curve certainly is worth the performance
gains. As for the generation of test data for certification or
qualification, binary protocols do not impose significant
overhead, since a generative approach would have to be
used with XML as well to handle the large number of test
cases.
3.5. Outlook
Our initial implementation, transmitting the memory image
of structures, is not optimal when communication must take
place between heterogeneous platforms and only allows for
a homogeneous data event set payload. Yet, it yields
significant performance gains, reduces the consumption of
memory, and simplifies certification or qualification. As
shown above, issues related to padding and regarding the
arrangement of data in RAM may arise. While these issues
can be mitigated if the characteristics of the platforms are
known, the scalability in general remains limited. To
address these issues, we started the development of a
custom binary OSA-CBM protocol. The vision was to
evolve this protocol as a generic and platform-independent
means for transporting OSA-CBM events over the network
in a binary fashion. In Figure 7 we provide an excerpt from
our initial work to illustrate the proposed design approach.
Based on preliminary low-level definitions (such as big or
little endianness and the widths of primitive data types), all
OSA-CBM classes are modeled as sequences of 16-bit
words. In our example, an ID consists of two words, i.e., it
represents a 32-bit integer value.
Analogously, the OsacbmTime class is represented as a
sequence of five words (our customized implementation
only required the time_type and time_binary
attributes). With every class having such a specific
representation, data events and entire heterogeneous data
event sets can be assembled. For dynamic structures, upper
bounds for the allowed amount of dynamic data must be
defined (possibly implementation specific) in order to meet
the requirements of real-time operating systems. To avoid
sending spare data, the binary representation of such
dynamic portions requires including a member that
defines the actually allocated amount of data (up to a
maximum dictated by the data size allowed in a UDP
packet). An example is the member
DMDataSeq.dataSize, which is not part of the OSA-
CBM specification but which is required for correctly
interpreting the words. Checksums to detect transmission
failures were foreseen as well. By standardizing the binary
representation for the network format, senders as well as
recipients have to translate between their platform specific
representation and the network format. Although there is
marshalling and un-marshalling to be done, we hypothesize
that the CPU load for this process can be neglected
compared to XML parsing.
Figure 7. Exemplary binary representation of DataEventSet
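In reduced form, such a word-based encoding can be
sketched as follows. The helper names are hypothetical, and
the checksum shown is a simple additive example rather
than the exact scheme foreseen in our design:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 64 /* upper bound required for real-time operation */

/* Append a 32-bit ID as two 16-bit words, high word first. */
static size_t put_id(uint16_t *w, size_t n, uint32_t id)
{
    w[n]     = (uint16_t)(id >> 16);
    w[n + 1] = (uint16_t)(id & 0xFFFFu);
    return n + 2;
}

/* Append a dynamic sequence: the dataSize word comes first,
 * then the data, so the receiver knows how many of the
 * reserved words are actually valid. */
static size_t put_seq(uint16_t *w, size_t n,
                      const uint16_t *data, uint16_t dataSize)
{
    w[n++] = dataSize;
    for (uint16_t i = 0; i < dataSize; ++i)
        w[n++] = data[i];
    return n;
}

/* Simple additive checksum word over all preceding words. */
static uint16_t checksum(const uint16_t *w, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += w[i];
    return (uint16_t)(s & 0xFFFFu);
}
```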
Based upon results shown in the previous section, the size
of data structures in this new network format will be on the
order of 25% of a corresponding XML representation.
3.6. Binary Message Format in OSA-CBM 3.3.1
The most recent version of OSA-CBM, Version 3.3.1,
includes a specification for a binary transmission format for
OSA-CBM messages. We see our work confirmed by this
addition to the OSA-CBM standard. Following an initial
design and trade study, we decided to adopt MIMOSA’s
specification as the network layer format amongst our
subsystems. Though this choice rendered our custom
protocol design work moot, implementation has been, and
remains, the focus of our work. Furthermore, the
compatibility of our systems with the rest of the community
will be ensured by following a standard which is now part
of that community. That is to say, our optimizations in the
marshalling/un-marshalling of data within and amongst real
time embedded systems and in the creation of an API/library
for OSA-CBM transmission are just as critical when using
the MIMOSA standard as with our custom message format.
Our aim is to create a fully C coded, statically allocated
implementation of the OSA-CBM Binary message
specification for embedded systems.
4. CBM DATA WAREHOUSE
The ground segment of our simulation framework includes a
central repository for data and information, called the CBM
data warehouse.
4.1. High Level Requirements
Design of the CBM data warehouse was driven by the
following high-level requirements.
1. The CBM data warehouse shall act as a central
information system for all applications involved in
the PHM process.
2. The CBM data warehouse shall provide a uniform
and standardized interface for managing and
querying its data.
3. The CBM data warehouse shall maintain full
traceability for any in-service data item regarding
origin, allocation (to assets, aircraft and flights) and
changes.
Given the need to meet these requirements across a large
fleet of aircraft, the design of the CBM data warehouse
faces two core challenges. First, it must process a large
number of transactions originating from daily maintenance
tasks, such as asset installation/removal and storing newly
available IVHM-data from performed flights. Second, it
must process and store a large amount of historical data for
performing diagnostics and prognostics, as well as their
continual improvement as more in-service data becomes
available.
4.2. Realization
The OSA-EAI and OSA-CBM reference architectures
define a uniform data management philosophy that allows
for full traceability of virtually any sensor value and its
derived information. Earlier work (Gorinevsky, Smotrich,
Mah, Srivastava, Keller & Felke, 2010, and others)
demonstrated the feasibility of using these architectures as a
reference to build a comprehensive information system and
associated service interface across multiple domains,
including aerospace. We consequently considered the
selection of OSA-EAI and OSA-CBM as guidelines for the
design of our CBM data warehouse as a promising approach
to satisfy our high level requirements.
4.2.1. Scope
We have implemented a subset of the OSA-EAI standard for
our initial version of the CBM data warehouse. The subset
was derived with the aim of providing data management for
diagnostics and prognostics on our candidate systems.
Confirming reports from other researchers, we found the
documentation of OSA-EAI to be rather sparse, especially
when mapping its generic universe of entities to a specific
application domain. We concentrated on the ability to
express system breakdowns (Assets, Segments, and
Parent/Child relations) and the ability to associate data from
the data acquisition, data manipulation, and state detection
layers. Additionally, each asset was to have an active history
of health assessments and remaining useful life estimates.
We expected that this would lead to an implementation of
tables exclusively from the REG, DIAG, DYN and TREND
groups of entities; however, with the exception of the
TRACK group, we had to implement at least one table from
all other entity groups in order to satisfy mandatory
connections between tables. We consider this a symptom of
the complexity of the OSA-EAI standard, and strongly
encourage the maintainers of the standard to establish a
sample or reference application for OSA-EAI (and OSA-
CBM), similar to the SCOTT database example of Oracle.
4.2.2. Customization
We customized the remaining OSA-EAI tables in a way that
would simplify the generation of test and reference data, but
still allow for the drawing of general conclusions (congruent
customization) from our experience. We made further
customizations to map specific features of the aerospace
domain (domain customizations). Many tables of OSA-EAI
have a composite primary key (i.e., two or more columns)
because the database model is designed for data
exchange or integration amongst different database
instances. For this purpose OSA-EAI introduces the Site
concept, which uniquely identifies the stakeholder of a
specific dataset. In combination with the dataset ID, any
dataset can thus be uniquely identified. Since our simulation
framework is currently a closed system, the maintainer
remains constant. Therefore, we stripped the composite
primary keys of each entity down to a single dataset id,
allowing us to strip down foreign keys as well. This
approach was shown to be feasible by Mathew, Zhang,
Zhang and Ma Lin (2006).
We further recognized that OSA-EAI does not have the
specific notion of a flight or a mission. This was not
unexpected, as OSA-EAI is generic; however, analyses in
the aerospace domain are often flight/mission centered. By
definition, OSA-EAI measurements can only be related to
assets/agents and time. Additions were necessary to relate
measurements with a specific flight/mission entity under
which they occurred. These updates allow the system to
couple flight/mission characteristics and degradation. While
OSA-EAI foresees enough meta-data to perform a
chronological mapping to an external flight/mission
database, our experience from other projects shows that a
direct mapping of information to a flight (or at least a power
cycle) is indispensable.
In the aerospace domain, segments represent virtual
“placeholders” for assets and these placeholders have
unique logistic control numbers. Such features can be
represented by OSA-EAI using the attributive tables for
each segment (Segment Numeric Data or Segment
Character Data). However, when modeled as an explicit
attribute of a segment, logistic control numbers can be
evaluated more efficiently. We recognize that one could
come up with many such counter-arguments, as OSA-EAI is
a domain independent and generic standard.
4.2.3. Performance Considerations
Coping with a large number of transactions and handling
large volumes of data at the same time, the CBM data
warehouse has both the role of an Online Transaction
Processing (OLTP) system and that of an Online Analytical
Processing (OLAP) system. These two requirements seem
to contradict each other at first glance.
The database model of an OLTP system is normalized, that
is, it consists of many interconnected tables and each table
describes a fine-grained piece of the application domain. The
number of tables that contain redundant information
(possibly in different representations) is minimized so that
the risk of a transaction leaving the database in an
inconsistent state is low. Due to its appearance from a bird’s
eye view, a normalized schema is referred to as a snowflake
schema. For an OLTP system, normalization is a
prerequisite, as it supports CRUD operations with optimal
performance and data integrity. The downside of a
snowflake schema is that information retrieval and analysis
result in complex queries involving many tables, which
results in poor performance.
The database model of an OLAP system is de-normalized,
which means that it consists of few tables, which contain
redundant information for the sake of reduced query
complexity and minimal join operations. Due to its
appearance from a bird’s eye view, a de-normalized OLAP
schema is referred to as a star schema. Snowflake and star
schema are depicted in Figure 8. The information of interest
is marked as grey boxes. The OSA-EAI database model in
its current state is heavily normalized and therefore clearly
OLTP-centered. Others have confirmed this statement using
formal methods (Mathew and Ma, 2007). Although we
could confirm specific issues regarding modeling and
documentation (Mathew et al., 2006), we still consider
OSA-EAI as well defined for transactional tasks. In contrast
to criticism that has been raised by industry, we consider the
normalization of OSA-EAI as essential, whereas Mathew
and Ma (2007) argue that the normalized character of OSA-
EAI is one of its weaknesses.
Applying standard modeling techniques to selected subsets
of interconnected OSA-EAI tables, they propose OLAP-
centered alterations for OSA-EAI according to star schema
design. These show that, at least for selected subsets of
coherent CRIS tables (so called data marts), the OLAP-
centered model holds equivalent information. Not
surprisingly, Mathew and Ma (2007) acknowledged that
their redesign optimizes analytics, but has significant
drawbacks for transactional use. They conclude with a
discussion of their motivation for further work towards a
compromise.
Figure 8. Snowflake (OLTP) vs. Star Schema (OLAP)
We argue that such a compromise cannot manifest as a
single data model that features characteristics from both
OLTP and OLAP-centered models. Such an approach would
fit neither side. Instead, motivated from our findings during
the realization of the CBM data warehouse and the
experience from our other projects that deal with large data
volumes (which go beyond the scope of this document), we
propose an extension to OSA-EAI to specifically support
analytical tasks on large volumes of historical data.
4.3. “Common Relational Analytics Schema”
The characteristics of OLTP and OLAP are too distinct to
be merged into a single database model. The database model
that is defined by OSA-EAI is called Common Relational
Information Schema (CRIS). Instead of redesigning CRIS to
include OLAP-specific features, we propose a new
standardized database model named Common Relational
Analytics Schema (CRAS). Our proposed database model
lives under the umbrella of OSA-EAI and coexists with
CRIS. Since an OLAP-centered database is primarily
designed for reading (not writing), the CRAS portion of
OSA-EAI will be populated on a regular basis from the
content stored in the CRIS portion. Both portions hold an
equivalent informational content – however, CRIS is
optimized for transactional purposes while CRAS is
optimized for analytical purposes.
4.3.1. Motivation
For a PHM system, it is necessary that prognosis be
performed in a short timeframe, e.g. during the turnaround
phase of an aircraft. However, this is different from actually
performing analytics. At least the prognostics algorithms
that we were utilizing require neither the entirety of all
recorded historical data, nor any preprocessed results
requiring filtering or aggregation (which are typical tasks of
OLAP systems). A limited set of data, say from the last N
flights, was sufficient. We found that with the standard
CRIS queries these limited historical datasets could be
retrieved reasonably fast. We draw this conclusion from our
direct experience with the tools we created; however, our
sample database did not contain fleet condition data from
several aircraft over several years, and with such huge
amounts of data performance will degrade. We hypothesize,
however, that using table partitioning techniques, which
have become available with today’s relational database
management systems (such as Oracle's Enterprise Edition), it
is possible to set an upper limit for the amount of data that
has to be searched by a query to identify the prognostics raw
data from the last N flights. An apparent partition key is
time, but Site is also a promising candidate.
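As a rough illustration of the partition-pruning idea (the class and field names below are ours, for illustration only, and not part of any standard), a store keyed by time and site only has to scan the buckets that can contain the requested rows, which bounds the search space independently of total data volume:

```python
from collections import defaultdict
from datetime import date

class PartitionedStore:
    """Toy partitioned store: rows are bucketed by (year, month, site)."""

    def __init__(self):
        self._partitions = defaultdict(list)  # (year, month, site) -> rows

    def insert(self, row):
        d = row["recorded_on"]
        self._partitions[(d.year, d.month, row["site"])].append(row)

    def query(self, year, month, site):
        # Only one partition is scanned, regardless of total data volume.
        return self._partitions[(year, month, site)]

store = PartitionedStore()
store.insert({"recorded_on": date(2012, 7, 3), "site": "DRS", "flight": 101})
store.insert({"recorded_on": date(2012, 6, 1), "site": "DRS", "flight": 95})
print(len(store.query(2012, 7, "DRS")))  # 1
```

A relational database applies the same pruning transparently once a partition key is declared on the table.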
We further suggest that analysis tasks that would require an
OLAP-centered database model be conducted on a regular
basis, but decoupled from the daily operational (i.e.
transactional) business. We claim that it is therefore suitable
to populate the CRAS on demand (e.g. once a month) in
order to perform retrospective analyses (e.g. for the
continuous improvement of diagnosis and prognosis).
4.3.2. Architecture
A high level overview of our proposed architectural
extensions of OSA-EAI is given in Figure 9. The elements
drawn in grey represent the current state of the art of OSA-
EAI. The OLTP-centered database model, CRIS, stores the
operational data in a relational database (the corresponding
object model has been omitted). Furthermore, the OSA-EAI
standard defines a comprehensive service interface for
accessing and modifying the operational data. We propose
to extend OSA-EAI according to the following three aspects
(corresponding to the black-marked items in Figure 9):
1. A database model optimized for analytical
purposes (OLAP), able to store informational
content congruent with CRIS. We call
this database model the Common Relational
Analytics Schema (CRAS). It is organized
according to the star schema approach.
2. A standardized interface for issuing
multidimensional queries against CRAS.
3. A standardized Extraction, Transformation and
Loading (ETL) process populating tables in the
CRAS schema with operational data from CRIS.
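The three extensions above can be sketched with a small in-memory example: a star-schema fact table surrounded by dimension tables, populated by a minimal ETL step from CRIS-like operational rows, and queried with a typical filter-plus-aggregate (OLAP-style) statement. All table and column names are purely illustrative and are not taken from the MIMOSA specifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one fact table referencing dimension tables.
cur.executescript("""
CREATE TABLE dim_time  (time_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_asset (asset_id INTEGER PRIMARY KEY, tail_number TEXT);
CREATE TABLE fact_measurement (
    time_id  INTEGER REFERENCES dim_time(time_id),
    asset_id INTEGER REFERENCES dim_asset(asset_id),
    value    REAL
);
""")

# Minimal ETL: denormalize CRIS-like operational rows into the star schema.
cris_rows = [("2012-07-03", "D-1234", 0.5), ("2012-07-03", "D-1234", 1.0)]
cur.execute("INSERT INTO dim_time VALUES (1, '2012-07-03')")
cur.execute("INSERT INTO dim_asset VALUES (1, 'D-1234')")
for _day, _tail, value in cris_rows:
    cur.execute("INSERT INTO fact_measurement VALUES (1, 1, ?)", (value,))

# A multidimensional (filter + aggregate) query typical of OLAP workloads.
cur.execute("""
SELECT a.tail_number, AVG(f.value)
FROM fact_measurement f
JOIN dim_asset a ON a.asset_id = f.asset_id
JOIN dim_time  t ON t.time_id  = f.time_id
WHERE t.day = '2012-07-03'
GROUP BY a.tail_number
""")
row = cur.fetchone()
print(row)  # ('D-1234', 0.75)
```

The fact table stays narrow and append-only, which is what makes the star schema efficient for the read-mostly workload described above.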
4.3.3. Performance and Operational Considerations
Our work regarding CRAS suggests an a priori hybrid
approach for database modeling. We are currently refining
the concept and have just begun prototype implementations.
Therefore, we cannot yet provide empirical results; in
particular, when it comes to handling data volumes in the
magnitude of terabytes. For these volumes, the concept has
yet to be proven. While the idea of CRAS as a complement
to CRIS is clearly new, the methodology that it is based on,
i.e., the star schema, has been available for years and is well
understood. The star schema yields excellent performance
results even with large data volumes. We have gained
empirical knowledge from another work area which requires
queries that involve both filters and aggregation. Results
indicate that the star schema approach improves response
times by a factor of 10 to 100 when handling millions of
data sets.
Figure 9. CRAS Extension of OSA-EAI (shown in black)
with an optional data model which is optimized for analytics
To ensure scalability for the joint operation of CRIS and
CRAS, we propose the following methodology. It is known
that the performance of both the CRIS and CRAS schemas
degrades with a growing amount of data. However, we
believe the CRIS schema will degrade faster than the CRAS
schema. Once a fresh system has been set up, the CRIS
portion will be constantly populated with new data, and, in
reasonably short intervals, the CRAS schema will be
constantly recreated from the current data in CRIS by the
ETL process. The CRAS schema is stateless at this phase, as
it can always be recreated from CRIS. Operational tasks will
be carried out in the CRIS, while analytical tasks run on the
CRAS. Provided that suitable hardware segmentation is
available (e.g., dedicated CPUs, dedicated RAID volumes)
operations on both schemas should not influence each other.
Once specific hot spots of the CRIS schema have degraded
to a stage where performance is no longer acceptable, old
data must be archived in the CRAS schema. We assume that
one can define data as being old simply by its date of
creation or other criteria. We further assume that such old
data will not be altered due to operational processes; which
certainly applies to sensor data. Therefore the ETL can
move (instead of just transform) old data to CRAS where it
will then permanently reside – just not in the CRIS form.
Since there is no need to alter the old data, it can be
removed from CRIS completely, mitigating the performance
degradation. However, the old data is still available for
analysis in CRAS. From this point on, the CRAS schema
becomes stateful, as it cannot be entirely recreated from
CRIS.
From a high level point of view, the CRIS schema’s data
volume will grow up to a specific limit and then shrink
again, so there is a worst case performance for operational
tasks. In contrast, the CRAS schema will constantly grow
with each new archival process. However, the growth will
take place in a database schema that is designed for
performance and large volumes; nevertheless, without
suitable measures the CRAS cannot grow indefinitely.
There are scaling measures to ensure performance of
database schemas in general that can be applied to our
situation. For data archived in CRAS which still needs to be
considered during online analyses, so-called partitions
should be maintained. A partition influences the way a
database physically stores a database table on the storage
device but keeps this storage strategy transparent to the
application (programmer). Partitions can be created during
maintenance phases of the PHM system. Depending on
specific criteria of the data set, such as the date of creation
(the so-called partition key), it will be assigned to one
partition or the other. Partitions can be assigned a separate
storage device, i.e., one disk for each partition. Therefore,
even specific tables can be scaled independently from
others. While further discussion is beyond the scope of this
paper, the effect is that the search space for queries
can be significantly reduced. Operational data that the ETL
transforms from CRIS will have its own partition(s),
whereas all archived data will have separate partitions. We
believe therefore that the effects of a growing CRAS on the
continuous ETL transformation of operational data can be
mitigated. However, if the amount of data in CRAS
significantly degrades the online analysis performance, one
has to consider moving the oldest data from CRAS into
offline storage. Here, we assume that this data no longer
contributes to an operational PHM (e.g., data from assets
that have been moved out of service) and can be analyzed
offline (or e.g., in a separate database).
4.3.4. Challenges and Future Work
There are two core challenges involved in our work. First,
the concept of joint operations between CRIS and CRAS
needs to be proven. We have to derive enough sample data
and set a representative database configuration and
environment to prove our claim. In its current stage, this
approach is merely a concept. While the methodologies and
technology it is built upon have proven to be feasible in
other domains, the risk of not being able to implement it as
proposed is non-negligible. In the previous section we
mention the introduction of offline storage for the oldest
data in the system. We want to point out here a new aspect
of performance research for OSA-EAI by combining it with
Hadoop, an emerging technology for distributed storage and
query of huge volumes of data. Second will be the
derivation of a generic CRAS schema that fits the needs of
analytical tasks for PHM in a domain-independent manner.
This must be accomplished while maintaining the same
level of quality as CRIS does in fitting the needs of
transactional usage in a generic way. Mathew et al. (2007)
have applied a formal process for attempting to derive an
initial OLAP-centered database model from CRIS. They
identified so called data marts (fact tables and
corresponding dimensional tables) for the areas of
configuration data, measurements, health and alarms, events
and work management. However, they give no reason as to
why no data mart for remaining useful life was identified.
As such, the actual details of the generic ETL process are
left open for future work.
5. CONCLUSION
We presented our experience from the realization of a data
management backbone for a simulation framework for PHM
systems in the aerospace domain. For the airborne segment
OSA-CBM-based communication was chosen. We
encountered issues relating to the recommended
transportation protocol for OSA-CBM when implementing
the standard under the conditions of a real-time operating
system. From our findings, we are motivated to use a binary
transportation format for OSA-CBM data events that
address embedded systems. This standard is to be both
binary and lean. In the process, we hope to avoid the
inherent overhead in processing power and memory
consumption of an XML-based transportation over HTTP.
Our preliminary results are promising: the amount of raw
data needed to represent specific OSA-CBM messages could be
reduced to 25% of the XML-based size (overhead for HTTP
and TCP not included). As our approach lacks platform
independence we outline a path for future work towards a
platform-independent binary representation for OSA-CBM
messages. The ground-based part of our data management
backbone is centered on an information system, which we
call the CBM data warehouse. It is designed according to
the OSA-EAI reference architecture. Confirming the
feasibility of OSA-EAI in conjunction with OSA-CBM, we
encountered minor issues in mapping aerospace domain
concepts to the generic entities and could confirm issues
reported by others. To answer the necessity of a PHM
system to perform both transactional and analytical
interaction with the CBM data warehouse, we recommend
extensions to OSA-EAI. We propose an optional and
complementary database model called CRAS (in analogy to
CRIS) that is optimized for analytical queries and follows
OLAP principles. It coexists with CRIS and is populated, on
demand, by CRIS transactional data. We close by calling
for future work in this area in the form of field studies.
REFERENCES
Gorinevsky, D., Smotrich, A., Mah, R., Srivastava, A.,
Keller, K., & Felke, T. (2010). Open Architecture for
Integrated Vehicle Health Management. AIAA
Infotech@Aerospace Conference, April 20-22.
Mathew, A. D., & Ma, L. (2007). Multidimensional schemas
for engineering asset management. Proceedings World
Congress on Engineering Asset Management,
Harrogate, England.
Mathew, A. D., Zhang, L., Zhang, S., & Ma, L. (2006). A
review of the MIMOSA OSA-EAI database for
condition monitoring systems. Proceedings World
Congress on Engineering Asset Management, Gold
Coast, Australia.
MIMOSA. MIMOSA Organization Website.
http://www.mimosa.org
Swearingen, K., Kajkowski, W., Bruggeman, B., Gilbertson,
D., & Dunsdon, J. (2007). Multidimensional schemas
for engineering asset management. Proceedings IEEE
Aerospace Conference.
BIOGRAPHIES
Matthias Buderath Aeronautical Engineer with more than
25 years of experience in structural design, system
engineering and product- and service support. Main
expertise and competence is related to system integrity
management, service solution architecture and integrated
system health monitoring and management. Today he is
head of technology development at Cassidian. He is member
of international Working Groups covering Through Life
Cycle Management, Integrated System Health Management
and Structural Health Management. He has published more
than 50 papers in the field of Structural Health
Management, Integrated Health Monitoring and
Management, Structural Integrity Programme Management
and Maintenance- and Fleet Information Management
Systems.
Conor Haines received his B.Sc. degree in Aerospace
Engineering from Virginia Polytechnic Institute and State
University in 2003 and his M.Sc. degree in Computational
Science from the Technical University of Munich in 2011.
For 3 years Conor was a test engineer supporting the NASA
Near Earth Network, providing simulation support used to
guide system development. At his current post, he is
focused on developing IVHM and Computer Vision
technologies as a Software Engineer for Linova Software
GmbH.
Andreas Löhr received his M.Sc. degree in Computer
Science from the Technical University of Munich in 2001
(Informatics, Diplom) and earned his PhD degree in
Computer Science from Technical University of Munich in
2006. For 6 years he worked as a software engineer at
Inmedius Europa GmbH in the area of interactive technical
publications and researched in the field of wearable
computing. He founded Linova Software GmbH in 2008
and at his current post as managing director he focuses on
development of maintenance information systems and data
management architectures.
Designing Data-Driven Battery Prognostic Approaches for
Variable Loading Profiles: Some Lessons Learned
Abhinav Saxena1, José R. Celaya2, Indranil Roychoudhury3, Sankalita Saha4, Bhaskar Saha5, and Kai Goebel6
1, 2, 3 Stinger Ghaffarian Technologies Inc., NASA Ames Research Center, CA, 94035, USA
4,5Mission Critical Technologies Inc., NASA Ames Research Center, CA, 94035, USA
6NASA Ames Research Center, CA, 94035, USA
ABSTRACT
Among various approaches for implementing prognostic
algorithms data-driven algorithms are popular in the
industry due to their intuitive nature and relatively fast
developmental cycle. However, no matter how easy it may
seem, there are several pitfalls that one must watch out for
while developing a data-driven prognostic algorithm. One
such pitfall is the uncertainty inherent in the system. At each
processing step uncertainties get compounded and can grow
beyond control in predictions if not carefully managed
during the various steps of the algorithms. This paper
presents analysis from our preliminary development of a
data-driven algorithm for predicting end of discharge of Li-ion
batteries using constant load experiment data, and the challenges
faced when applying these algorithms to randomized
variable loading profiles, as is the case in realistic
applications. Lessons learned during the development phase
are presented.
1. INTRODUCTION
The field of prognostics is steadily maturing as an important
field under health management as newer algorithms are
constantly being developed. Among the two main categories
are data-driven and model-based algorithms with competing
advantages and limitations (Schwabacher, 2005). This paper
summarizes our experience from implementing a data-
driven approach for a variable load discharge scenario for
Lithium-ion (Li-ion) batteries using experimental data
collected in a controlled lab environment.
An intuitive observation-based approach was initially
implemented, which required considerable improvements as
we learned about various shortcomings during the
development process. In this paper we present our lessons
learned from the exercise, as well as an analysis of various
pitfalls that may be encountered in developing data-driven
methods that may seem intuitive and relatively
straightforward in the beginning but may not match up on
expectations when actually implemented. The paper also
presents a detailed description of our data-driven algorithm.
Corresponding results are also compared with a model
based algorithm using an empirical degradation model.
1.1. Motivation
The motivation for this work stems primarily from two
sources. First, it is of growing interest to develop
prognostic health management solutions for Li-ion batteries
as the use of power storage technologies is gaining
momentum in energy intensive industries. While several
efforts have focused on relevant topics, an accurate way of
estimating battery capacity during realistic load profiles
with variable and/or random operational loading still
deserves attention. This paper describes the results of our
efforts towards developing a generic data-driven approach
for developing prognostic algorithms for randomized
variable loading scenarios. It is generally assumed that data-
driven methods, while typically requiring large amounts of
training data in the initial development phase, allow much
more rapid, easier to implement, and computationally
inexpensive development compared to model-based
approaches. This, however, comes at the cost of a significant
upfront data processing effort and still does not guarantee a
successful implementation. More often than not it calls for
_____________________ Abhinav Saxena et al. This is an open-access article distributed under
the terms of the Creative Commons Attribution 3.0 United States
License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original author and source are credited.
re-evaluation of the initial hypothesis and may require
significant changes adding to complexity as problems
become more realistic. In this effort we exemplify a process
where data-driven algorithms that were once perfected for
constant loading profiles do not guarantee good
performance when tried on the variable loading case and
require rethinking of the strategy, which is in contrast to an
empirical model-based approach where the original
implementation still performs well. Contrary to our initial
beliefs that for systems like Li-ion batteries, where the
characteristics of charge-discharge processes show similar
qualitative trends, data-driven methods can be adapted fairly
quickly once a data processing methodology is in place, we
found that there are significant challenges in developing a
robust data-driven method.
The second source of motivation comes from our continuing
efforts towards facilitating a standardized platform for
comparison of various prognostic approaches. Assessing
algorithmic performance and drawing comparisons against
baselines is one of the enablers towards verification and
validation. As the field of prognostics matures as a research
area, it is important to create an infrastructure that facilitates
verification and validation activities towards certification of
prognostic health management systems. This has been
somewhat difficult because until recently there were no
standard methods to evaluate different algorithms in a
comparable manner due to lack of benchmark datasets or
performance metrics useful for prognostics. An extensive
survey of health management applications and other related
domains revealed that conventional metrics, borrowed ad
hoc from diagnostic domains, had been reused, but did
not serve the purpose well (Saxena et al., 2008). Therefore, a set of
prognostic performance metrics were developed with the
perspective of using prognostic information in health
management and decision making processes (Saxena,
Celaya, Saha, Saha, & Goebel, 2010). However, this process
could be further streamlined with the availability of
benchmark run-to-failure datasets that can be used for
prognostic algorithm development. With that intent several
accelerated aging testbeds were designed and developed at
NASA Ames Research Center and data were made available
to the PHM research community to take advantage of
through prognostics data repository (NASA, 2007). These
datasets have been downloaded more than 20,000 times
from all over the world and used for algorithm development
in the last four years. One of the popular datasets (over 6000
downloads) includes Li-Ion battery aging data that contain a
variety of operational conditions with several sensor
measurement data collected in-situ (B. Saha & K. Goebel,
2011). Despite a large number of downloads we were
unable to find more than just a few references reporting
successful prognostic implementation on battery data
(Orchard, Silva, & Tang, 2011; Orchard, Tang, &
Vachtsevanos, 2011). In this paper we report results from a
preliminary data-driven approach for a randomized variable
loading case. It is our hope that the community will take up
the problem and find other ways that can then be compared
with the ones reported here as initial baseline performance.
1.2. Paper Organization
The rest of the paper is divided into several sections. Section
2 presents a brief background of various efforts related to
prediction of battery life and battery discharge. Application
domain is described in Section 3, which explains the nature
of experiments conducted, lays out the problem of variable
loading, and presents some observations. Section 4 starts by
describing the overall approach taken and presents details of
feature extraction, learning procedure, and prediction
algorithms. Section 4 concludes with a brief discussion of
underlying learning algorithms that are used in our
prediction framework. Section 5 presents the results and
discussions, followed by conclusions in Section 6. More
details on the results are included in the appendix for the
reader's reference.
2. BACKGROUND
Predicting the End-of-Discharge (EoD) times for batteries
has been investigated in the recent years to predict the time
when (a predefined) cut-off threshold voltage is reached and
the power source is no longer available to continue the task
(Bhaskar Saha & Kai Goebel, 2011). Depending on the
application type and availability of data, there are many
other approaches that focus on state-of-charge (SOC)
estimation, current/voltage estimation, capacity and state-of-
health (SOH) estimation. SOC estimation is by far the most
popular approach where charge counting or current
integration is used in different ways to estimate battery
capacity. This approach suffers from various inaccuracies
resulting under realistic usage environments (Meissner &
Richter, 2003). Use of extensive lookup tables relating
open-circuit voltage (OCV) to SOC is popular in the
electronics industry, which requires extensive testing and
data collection to build such mappings (Lee, Kim, & Lee,
2007). For safety critical applications it is important to
determine when the system will lose power, and hence use
of voltage threshold for time to end of charge prediction is
preferred. This implicitly assumes a direct relationship
between available voltage and available charge from the
battery. An example of one such application is described in
(Bhaskar Saha & Kai Goebel, 2011) where EoD time is
predicted for an e-UAV (electric unmanned air vehicle). It is
also illustrated how variable the loading can be during
extreme maneuvers and a time to EoD prediction must
account for expected future loads and environmental
conditions. An EoD time prediction application using an
empirical model based Bayesian approach is discussed in
(Saha, Goebel, Poll, & Christophersen, 2009). Among data-
driven approaches, in (Rufus, Lee, & Thakker, 2008) a
virtual sensor is described based on a data-driven approach
but primarily for SOH estimation and RUL prediction based
on usage patterns and environmental factors such as
operational temperature. A statistical approach to battery
life prediction that builds parametric models of the battery
from collected data is described in (Jaworski, 1999).
Another data-driven effort extracts and tracks changes in the
internal impedances from voltage characteristics obtained
from battery cycling data (Luo, Lv, Wang, & Liu, 2011). All
changes are attributed to battery aging only, thereby not
considering load and temperature as influencing factors.
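The charge-counting (current-integration) SOC estimate mentioned above can be sketched in a few lines; the function name and values are ours, for illustration only, and a real estimator must correct for the inaccuracies the cited works discuss (temperature, aging, measurement bias):

```python
def soc_coulomb_counting(current_a, dt_s, capacity_ah, soc0=1.0):
    """Integrate a sampled current draw (amps) into an SOC trajectory:
    SOC(t) = SOC0 - (1/C) * integral of I dt."""
    soc = soc0
    trajectory = []
    for i in current_a:
        soc -= (i * dt_s) / (capacity_ah * 3600.0)  # convert Ah to A*s
        trajectory.append(soc)
    return trajectory

# 2 A constant draw from a 2.2 Ah cell, sampled once per second:
traj = soc_coulomb_counting([2.0] * 3600, dt_s=1.0, capacity_ah=2.2)
print(round(traj[-1], 3))  # ~0.091 of charge remaining after one hour
```

The weakness noted in the text is visible in the structure: any bias in the sampled current accumulates without bound, which is why OCV lookup or voltage-threshold methods are preferred in some applications.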
Recent years have seen a growing interest in the use of
machine learning techniques, e.g., Hamming networks (Lee,
Kim, Lee, & Cho, 2011), and stochastic filtering techniques
e.g., unscented filter (Santhanagopalan & White, 2010) and
extended Kalman filter (Hu, Youn, & Chung, 2012) to
estimate the state of charge and/or degradation parameter
(e.g., state of capacity) of a Li-ion battery cell under a
randomly varying loading condition. Most of these data-
driven approaches are shown to work on similar data to
what they were trained on. This requires availability of
operational data from real environment, which is not always
the case. In this work we take an alternative approach by
using data-driven models that are developed from a set of
controlled experiments. We investigate whether it is
possible to extract relevant features from current and
voltage measurements collected during battery usage
(discharge cycles) under controlled experiments in various
fixed loading conditions to learn data-driven models that
would then allow us to predict EoD for a variable loading
scenario. Furthermore, estimated capacity values are not
used in making the predictions, since it is generally very
difficult to accurately estimate battery capacity during
operation.
3. APPLICATION DOMAIN
The methods developed in this work are based on aging data
for 18650 Li-ion batteries available from prognostics data
repository hosted by NASA Ames (B. Saha & K. Goebel,
2011). The data used for algorithm development and testing
is generated in a battery testbed described in (Saha &
Goebel, 2009). This testbed allows charging and discharging
of batteries and collecting relevant information to estimate
the state of the battery. In-situ measurement of battery
current, voltage and temperature are available and these are
used for development (training) of data-driven algorithms.
Several charge/discharge cycles are typically applied to a set
of batteries. These batteries were charged to 4.2 volts using
an initial constant current (CC) profile of 1.5A until 4.2V is
reached, followed by a constant voltage (CV) mode until
current drops to 10mA. Since the main objective of these
algorithms is to estimate EoD, a subset of batteries is
discharged at constant current during discharge cycles, with
current levels of 1A, 2A, and 4A across different batteries.
Figure 1 shows representative discharge profiles for the
training cases discharged at three different constant current
values.
Figure 1. Constant load discharge profiles at 1, 2 and 4 A currents.
Batteries are considered fully discharged (100% depth of
discharge) when they have reached 2.7V. The higher the
discharge current, the less time it takes for the battery to
discharge. The increased voltage drop off rate towards the
end of the discharge cycle is typical for this type of
batteries. This is very relevant to the algorithm development
since it presents a challenge in implementing typical
regression-based data-driven methods when dealing with the
steep non-linearity towards the end of the discharge cycle.
While discharge profiles under fixed load conditions were
used for algorithm training, variable loading cases (to
represent realistic profiles) were generated for algorithm
validation. In the variable load discharge profile, the current
is varied randomly between 1A and 4A levels. The variable
load case provides additional challenges to the EoD time
estimation algorithm. It can be observed from Figure 2 that
each time the load changes from one discrete value to
another, there is a transient in the battery voltage value. In
addition, the time of steep drop in the voltage towards the
end of discharge is uncertain as it changes every time the
load current changes and not just with the state of voltage of
the battery.
Figure 2. Variable load discharge profile between load current
levels of 1A and 4A.
Battery performance degradation due to operational usage
also affects EoD time estimation for a particular usage
cycle. For instance, Figure 3 shows several discharge
profiles for a battery used under constant discharge loading.
It can be observed that the amount of time it takes for the
battery to discharge to the 2.7V threshold is reduced
considerably for later cycles during the battery life. In
addition, the rate of voltage decay in the pseudo-linear
region also changes with battery age. Finally, the knee
point, signaling the beginning of the exponential voltage
decay region towards the end of discharge cycle, also
changes location and becomes more difficult to identify
due to reduced curvature as the battery ages. These changes in
voltage profile characteristics form the basis of feature
extraction as described in the next section.
Figure 3. Constant load discharge profiles from different stages of
battery life from a single battery.
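The changing decay rate of the pseudo-linear region noted above is the kind of characteristic the next section extracts as a feature. A minimal ordinary least-squares sketch of slope extraction (pure Python; the sample values are illustrative, not from the dataset, and in practice the region boundaries come from the knee detection):

```python
def fit_slope(times, volts):
    """Least-squares slope of voltage vs. time over a pseudo-linear region."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(volts) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, volts))
    den = sum((t - mt) ** 2 for t in times)
    return num / den  # volts per second

# Illustrative samples from a pseudo-linear stretch of a discharge curve:
t = [100, 200, 300, 400]
v = [3.90, 3.85, 3.80, 3.75]
m = fit_slope(t, v)
print(round(m, 6))  # -0.0005 V/s
```

Tracking how this slope steepens across cycles (and with load current) is one way to expose the aging trend visible in Figure 3.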
4. PROGNOSTIC APPROACH
In this section, we present our approach to predict the end of
discharge (EoD) time of the battery, denoted as tEoD. This is
the time at which the battery voltage reduces to 2.7V. The
aim here is to predict tEoD for different discharge runs of
the battery, given (i) an incomplete discharge cycle data
until current time, and (ii) the complete (randomly
changing) future operating loading. It should be noted that
for this phase of algorithm development we assume a
perfect knowledge of future load profile for the current
discharge event. Furthermore, no partial charge and
discharge events are included in these scenarios, therefore a
charge cycle initiates only after the battery is fully
discharged. These assumptions will be relaxed in the next
phase of development as we learn more about these batteries
first in these simplistic scenarios.
4.1. Feature Extraction and Training
Recall that even though our eventual goal was to predict tEoD
for battery discharge cycles under random loading
conditions, we train our prognostic approach using battery
discharge cycle data collected under constant loading
conditions of 1A, 2A, and 4A. As a first step, training data
were prepared by carrying out denoising of the constant
loading cycle data. Some incomplete and corrupted runs
were also removed from the training data. Once the
denoised battery discharge cycles are obtained, we observe
that the voltage versus time plots (see Figure 4) for different
discharge cycles have the same trend, and each voltage
discharge plot consists of three different and distinct
regions. The first two regions can be approximated by linear
trends followed by a third region with a sharp drop-off
curve. The first pseudo-linear region is due to the instant drop in
voltage caused by the internal battery impedance upon application of
the load current. For simplicity, this impedance is approximated
by an aggregated internal resistance, which is estimated as
the ratio of the observed voltage drop and the applied load
current. It is understood that as the battery degrades its internal
resistance increases, and hence an estimate of
this internal resistance can be used as a proxy for battery
SOH. This estimate of internal resistance is used in creating
the maps of how the load and SOH affect voltage profiles.
Figure 4. Illustration of features extracted from training data.
The second pseudo-linear region spans the majority of the
discharge profile, where the voltage available across the battery
terminals decreases roughly in proportion to the charge
depleted. It can be observed from Figure 3 that as the
battery ages the slope of this pseudo-linear region changes,
and the corresponding voltage drop is higher for a given
amount of discharge time. Furthermore, this slope is also
affected by the load current level. Hence, a mapping (f1) is
created that relates this slope (m) to changes in battery
health (SOH_est) and load current. This second
pseudo-linear region is followed by a sharp drop-off in the
voltage. We term this point the knee point, and denote by
t_knee the knee point time, i.e., the time at which the
discharge curve enters the steep voltage-drop region. For
this work, we simplified t_knee to be the time at which the
discharge curve reaches a predetermined slope value. It was observed that, generally, at t_knee, the battery has
consumed approximately 90% of its available charge. The
identification of t_knee is crucial in predicting t_EoD, since it is
[Figure 4: measured discharge voltage (V) versus time (s) for cycles 2 through 588, annotated with the initial voltage drop ΔV (pseudo-linear region 1), pseudo-linear region 2 (y = mx + c), the knee point locus (t_knee, m_knee) and the voltage at the knee point (V_threshold), the voltage drop-off region, t_add, t_EoD, the discharge cut-off voltage of 2.7 V, and decreasing capacity with battery aging.]
used in determining the slope of the second pseudo-linear
region.
Given the trend of the voltage discharge plots, over the set
of all denoised battery discharge cycles under different
constant loading conditions, the following features are
extracted in order to compute the t_EoD:
1. The battery SOH, which is approximated by the internal
resistance R_meas, estimated by computing the
ratio of the voltage drop to the change in load current,
R_meas = ΔV/ΔI, as observed in the first pseudo-linear
region of the voltage discharge plot (see Figure 4). ΔI is
the change in current when the battery is loaded and
ΔV is the corresponding drop in battery
terminal voltage. R_meas is also used in proportionally
adjusting the voltage level whenever the load is switched
from one value to another.
2. The slope, m, of the second pseudo-linear region of the
voltage discharge plot.
3. The knee point time, t_knee, beyond which a battery is
observed to retain only about 10% of its total capacity
for a given SOH. This feature is based on empirical
observation and is found to be consistent across all
cycles at all SOH. For computational purposes this
point is identified by the time at which a corresponding
threshold voltage, V_th, is reached.
4. t_add, the additional time corresponding to the
remaining 10% of capacity discharge, which needs to be
added to t_knee in predicting t_EoD; therefore, t_EoD = t_knee + t_add. This allows us not to model the non-
linear behavior explicitly, and instead adjust the
estimates by additive offsets computed from the
mapping f3.
It is observed that each of the above features depends on the
state of health of the battery and the load level. R_meas
characterizes the internal impedance of the battery and
represents battery age. Hence, we use R_meas as an
approximation for SOH, denoted SOH_est. It is assumed that SOH does not
change within a given discharge cycle. Hence, given the
operational load and SOH_est, we learn the following
three multidimensional mappings:

m = f1(Operational Load, Battery SOH_est)
V_th = f2(Operational Load, Battery SOH_est)
t_add = f3(Operational Load, Battery SOH_est)
These mappings can be implemented using several different
techniques. In this work, we focus on learning these mappings
using least-squared polynomial regression and an artificial
neural network. Once these mappings are learned, the
t_EoD can be predicted using the future load profile
information.
4.2. Architecture
The data-driven prognostic approach adopted in this paper is
presented in Figure 5. The first step of this approach is the
estimation of the SOH of the battery. In our approach, we
estimate SOH_est by estimating the internal resistance R_meas of the battery
using the voltage and current measurements, V_meas and I_meas,
respectively, at the start of the discharge cycle. SOH_est and
the future operating loading profile of the battery are then
fed into the three mappings, which estimate m, V_th, and t_add;
these are then used for predicting the t_EoD.
Figure 5. Data-driven prognostics architecture.
4.3. Prediction
Recall that although the mappings are created using constant
load profile data to learn the various relationships, the algorithm's
performance is evaluated using data from random loading
profiles. Given discharge cycle data up to the current time, i.e.,
the time-stamped current and voltage measurements
recorded from the battery, and knowledge of the expected
(randomly changing) future loading profile, our goal is to
make a correct prediction of t_EoD. Since the training data do
not contain information about the transients that arise during
load switching, an adaptation parameter (a slope adjustment
multiplier) is incorporated into the prediction scheme; it is adapted based on
observed data and is used to adjust the values obtained from
the mappings. This allows us not to update the entire
mappings built offline in the training phase, while still
incorporating the differences seen in run-time data due
to various factors not considered in the learning step.
Algorithm 1 describes our steps for predicting the t_EoD for a
discharge cycle.
The algorithm takes as inputs the vector of prediction
time-points, the vector of time-intervals, and
a vector of future current loading values, each
element of which corresponds to one of the loading time-
intervals. First, we initialize the slope
adjustment multiplier. Then, we compute SOH_est as
explained above. Next, for each future load segment, we obtain the slope
m for the given load and SOH_est from the mapping f1 and
extrapolate from the battery voltage measured at prediction
time to the end of the current load-level segment, i.e.,
until the next load level is switched. The threshold voltage
V_th is also computed from the mapping f2. If, at the end of the loading cycle,
the extrapolated voltage has dropped below V_th, we determine the time at which
the voltage equals V_th (the knee point time t_knee), compute t_add from the
mapping f3 based on the load and SOH_est, determine
t_EoD = t_knee + t_add, and stop. Otherwise, from the last segment, we determine
the real slope, m_knee, and adapt the slope adjustment
multiplier to be used for the next load segment; the
loop is repeated until either all future load segments are
included in the prediction or a knee point is reached and a
final EoD prediction is made.
Algorithm 1: t_EoD Prediction
Input:
1. vector of prediction time-points
2. vector of time-intervals
3. vector of future current loading values
4. measured voltage and current, V_meas and I_meas
initialize the slope adjustment multiplier; compute SOH_est from R_meas
for each future load segment
    m ← f1(load, SOH_est), scaled by the slope adjustment multiplier
    extrapolate the voltage with slope m to the end of the segment
    V_th ← f2(load, SOH_est)
    if the extrapolated voltage falls below V_th
        t_knee ← time at which the voltage reaches V_th
        t_add ← f3(load, SOH_est); t_EoD ← t_knee + t_add
        break
    else
        compute the real slope m_knee and adapt the multiplier
    end
end
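The loop above can also be rendered as a minimal executable sketch. This is our own Python rendering under simplifying assumptions: the mappings are passed in as plain callables, and the multiplier-adaptation step is left as a comment.

```python
def predict_eod(t_pred, segments, f1, f2, f3, v_start, soh_est):
    """Sketch of Algorithm 1: walk the future load segments,
    extrapolate the voltage with the mapped slope, and stop at the
    knee point. `segments` is a list of (duration_s, load_A);
    f1/f2/f3 take (load, soh) and return m, V_th, and t_add."""
    mult = 1.0                       # slope adjustment multiplier
    t, v = t_pred, v_start
    for duration, load in segments:
        m = mult * f1(load, soh_est)
        v_th = f2(load, soh_est)
        v_end = v + m * duration     # extrapolate to end of this segment
        if v_end <= v_th:
            t_knee = t + (v_th - v) / m   # time at which v reaches v_th
            return t_knee + f3(load, soh_est)
        # A full implementation would re-estimate the observed slope
        # here and adapt `mult` for the next segment.
        t, v = t + duration, v_end
    return None  # no knee reached within the given load profile
```

With toy mappings (constant threshold, slope proportional to load) the walk stops in the segment where the extrapolated voltage first crosses the threshold and returns t_knee plus the mapped t_add.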
4.4. Learning Algorithms
To assess the contribution of the data-driven learning
step in our prognostic framework, we selected two
regression algorithms, continuing from previous
benchmarking efforts (Goebel, Saha, & Saxena, 2008; Saha,
Goebel, & Christophersen, 2008). One is of very
low complexity, based on linear polynomial regression, and
the other represents a more sophisticated approach, an
artificial neural network (ANN). Finally, to compare
performance, a particle filter based algorithm is used, which
uses empirical models and measurement data to predict
battery EoD. These algorithms are briefly described next.
4.4.1. Polynomial Regression
For the purpose of generating the three mappings, a simple
linear polynomial mapping based on least-squared
regression was employed for comparison with other regression
approaches such as ANNs. As can be seen from the three
learned mappings in Figure 6, there is significant noise in the
data, which makes it difficult to learn clear relationships,
especially in cases where a one-to-many relationship exists
between the input combinations and the output.
Figure 6. The three mappings based on polynomial regression.
Gray cross markers show quality of fit (computed data) using test
(measured) data.
Since no obvious reason was available for such behavior,
first-order polynomials (linear models) were fit to the data based
on empirical observations. The quality of fit, also shown in
Figure 6, supports this choice. Once these mappings were
built, they were used to compute features for the input
combinations present in the test data. It must be noted that in the
learning phase the input space has only three discrete values
available for the load. Since the load in the test scenario is a
[Figure 6 panels: V_threshold, t_add, and the slope of the linear region (m), each plotted against load current and against SOH_est (approximated by R_meas), with measured and computed values overlaid; the fitted relationships are m = f1(Operational Load, Battery SOH_est), V_th = f2(Operational Load, Battery SOH_est), and t_add = f3(Operational Load, Battery SOH_est).]
continuous variable, linear interpolation was used to arrive
at feature values for test loads lying between the training
loads of 1 A, 2 A, and 4 A.
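The interpolation step can be sketched with numpy's one-dimensional linear interpolation; the per-load feature values below are illustrative numbers, not measured data.

```python
import numpy as np

# Feature values learned at the three training loads (illustrative).
train_loads = np.array([1.0, 2.0, 4.0])
train_v_th = np.array([3.3, 3.1, 2.9])   # e.g. V_threshold per load

def feature_at_load(load):
    """Linearly interpolate a feature to a test load between 1 A and 4 A."""
    return np.interp(load, train_loads, train_v_th)
```

Note that `np.interp` clamps to the end values outside the 1 A to 4 A range, so extrapolation beyond the training loads is not attempted.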
4.4.2. Artificial Neural Network Based Regression
An alternative approach to constructing the three mappings
was implemented based on artificial neural networks.
The fitting of these functions presents several complications
for the training of the neural network due to the nature of
the training data. The neural network structure was therefore
selected to obtain the simplest mapping, as close as possible
to a plane. This is done to avoid overfitting, which is a
challenge imposed by the data. The data were normalized for
the training of all the mappings in order to improve the
performance of the neural network training. This
normalization consisted of subtracting the sample mean and
dividing the data by the sample standard deviation. The
standard Levenberg-Marquardt algorithm was used for the
optimization during the training process. Simple network
structures were used: a single hidden layer with two neurons for
one of the mappings, and a single hidden layer with one neuron
for the other two.
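The normalization described above is a standard z-score transform; a minimal sketch follows, reusing the training statistics at test time as is customary (function name is ours).

```python
import numpy as np

def zscore(x, mean=None, std=None):
    """Subtract the sample mean and divide by the sample standard
    deviation, per column; pass stored statistics at test time."""
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0, ddof=1) if std is None else std
    return (x - mean) / std, mean, std
```

Training data normalized this way has zero mean and unit sample standard deviation in every column, and test points are mapped into the same coordinates.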
4.4.3. Benchmark Algorithm – Particle Filters
As part of our previous work, we have developed particle
filter-based prognostic approaches for battery health
management (Orchard, Tang, Saha, Goebel, &
Vachtsevanos, 2010; Saha & Goebel, 2009; Bhaskar Saha &
Kai Goebel, 2011) on the same data sets as used in this
work. We use the results obtained from this approach as our
comparison standard, with the hope that our data-driven
methods can perform as well as a Particle Filter-based
approach. A particle filter (PF) (Arulampalam, Maskell,
Gordon, & Clapp, 2002; Gordon, Salmond, & Smith, 1993)
is a sequential Monte Carlo method that approximates the
state probability density function (PDF) using a weighted
set of samples, called particles. The value of each particle
describes a possible system state, and its weight denotes the
likelihood of the observed measurements given this
particle’s value. As more observations are obtained, the
value of each particle in the next time step is predicted by
stochastically moving each particle to a new state using a
non-linear process model describing the evolution in time of
the system under analysis, a measurement model, a set of
available measurements, and an a priori estimate of the state
PDF. Then, the weight of each particle is updated to reflect
the likelihood of that observation given the particle’s new
state. For prognostics, the PF is used only to predict the
future values of the particles based on the future operating
loading profiles, without updating them, since future
measurements are not available. In this
work, a detailed discharge model of the cells, as described
in (Bhaskar Saha & Kai Goebel, 2011), is used as the
process model for the PF. The model parameters include
double layer capacitance, the charge transfer resistance, the
Warburg impedance, and the electrolyte resistance. The
model was developed by analyzing the way the impedance
parameters change with charge depletion during the
discharge cycle.
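A minimal prediction-only sketch of this idea follows, with a toy linear-discharge process model standing in for the paper's detailed electrochemical cell model; all constants and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_predict_eod(particles, weights, load, v_th, dt=1.0, noise=1e-3):
    """Propagate voltage particles forward through the process model
    (no measurement updates, since future observations are
    unavailable) until each crosses v_th; return the weighted mean
    end-of-discharge time. Toy model: dv = -k*load*dt + noise."""
    k = 1e-3  # toy discharge-rate constant
    eod = np.zeros(len(particles))
    for j, v in enumerate(particles):
        t = 0.0
        while v > v_th:
            v += -k * load * dt + rng.normal(0.0, noise)
            t += dt
        eod[j] = t
    return float(np.sum(weights * eod))
```

With equally weighted particles started at the same voltage and negligible process noise, the weighted EoD collapses to the deterministic crossing time of the drift term, which is a useful sanity check on the propagation step.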
5. RESULTS AND DISCUSSIONS
The algorithms described above were tested on data collected
from two batteries discharged under randomized
sequences of loads between the 1 A and 4 A levels. For this paper
we present results from two discharge cycles from each of
the batteries, chosen from an early stage of life (the second and
fourth discharge cycles). The results obtained from all four
cases were similar in character, and only one set is
presented below for conciseness. The remaining three sets
are included in the appendix for reference. Results are
evaluated based on the alpha-lambda prognostic metric, as
described in (Saxena et al., 2010).
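The alpha-lambda metric judges whether a RUL prediction at each prediction time falls within a ±α band around the true RUL; a minimal pass/fail check is sketched below (the α = 0.2 band is an illustrative choice, not taken from the paper's evaluation settings).

```python
def alpha_lambda_pass(rul_true, rul_pred, alpha=0.2):
    """True if the prediction lies in [(1-alpha)*RUL*, (1+alpha)*RUL*]."""
    return (1 - alpha) * rul_true <= rul_pred <= (1 + alpha) * rul_true
```

Applied to the first row of Table 1 (RUL* = 2673), the particle filter's 2750 falls inside the 20% band while the ANN's 2126 falls just outside it.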
Figure 7. Alpha-Lambda metric plot for comparing algorithmic
performance.
Table 1. Prediction results comparing data-driven prediction
approach based on two different learning algorithms and an
empirical model based prediction.
tP      RUL*    Particle Filter     ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2673    2750     77         2126     -547       2606     -67
247     2446    2511     65         1899     -547       2330     -116
475     2218    2287     69         2344     126        2253     35
703     1990    2067     77         2116     126        2972     982
930     1763    1853     90         2568     805        3026     1263
1157    1536    1589     53         1063     -473       897      -639
1385    1308    1365     57         1147     -161       1204     -104
1612    1081    1151     70         1207     126        1116     35
1840    853     993      80         979      126        888      35
2068    625     735      110        464      -161       369      -256
2296    397     505      108        523      126        432      35
It can be observed from Figure 7 (and the numerical data
provided in Table 1) that the data-driven methods based on
the two different mappings perform in a similar fashion, but their
performance is not as good as that of the model-based approach.
Furthermore, given the nature of the data (see Figure 6), the
method based on the ANN mapping performs more poorly. This can be
explained by the fact that the ANN has a harder time learning a
simple relationship compared to polynomial regression. On
further analysis, several potential issues were identified that
may have contributed to the poor performance of this data-
driven approach:
- As evident from Figure 6, the feature data derived from the
measurements are noisy, and in the absence of a
suitable denoising scheme, learning meaningful
relationships may be difficult. Especially in the
case of randomized loading profiles, the effect of
noise may be non-linear and may not be captured
by interpolating observations from three constant
loading scenarios.
- Constant loading scenarios lack information
about the effects of transients, which are bound to be
present in the variable loading case when the
load is switched from one level to another.
Such information is crucial for accurate predictions.
- Feature extraction involves linearization of several
non-linear regions, and hence the performance is
sensitive to choices made, such as the definition of the
knee point, the definition of the slope m, etc. These
choices are purely observation based and require a
more thorough sensitivity analysis, which demands
considerable effort as part of a data-driven solution.
- The quality of the mapping learned from data lies at the
heart of a data-driven prediction approach; however,
there is no direct provision for updating the
mapping as new data come in. This becomes
a problem especially when the training
data are significantly different from the test data and
are missing some important knowledge.
6. CONCLUSIONS
This paper presented the results and lessons learned from
implementing a data-driven prediction approach for variable
loading scenarios, based on data acquired in a controlled lab
environment for constant loading scenarios. It was observed
that such methods may not always lead to good performance
when applied to realistic datasets. While the performance
obtained in this effort by no means generalizes to all data-driven
methods, the lessons learned are presented so that
the research community can avoid potential pitfalls. This
effort also establishes a preliminary
baseline for performance on the battery aging datasets
available from NASA's prognostics data repository, which
will help other approaches in comparative evaluation and
successive improvement of performance.
ACKNOWLEDGEMENT
The authors would like to acknowledge the support from
System wide Safety Assurance Technologies (SSAT)
project under NASA’s Aviation Safety program,
Aeronautics Research Mission Directorate (ARMD).
NOMENCLATURE
m          slope of second linear segment of discharge profile
SOH_est    estimated state of health
I_meas     measured battery load current
R_meas     measured internal resistance of battery
V_th       threshold voltage at which end of life is reached
t_EoD      time till end of discharge
t_add      time till end of discharge from knee point
m_knee     real slope of second linear segment of discharge
           profile
           slope adjustment multiplier
           vector of prediction time-points
           vector of time-intervals
           vector of future current loading values
REFERENCES
Arulampalam, S., Maskell, S., Gordon, N. J., & Clapp, T.
(2002). A Tutorial on Particle Filters for On-line
Non-Linear/Non-Gaussian Bayesian Tracking.
IEEE Transactions on Signal Processing, 50(2), 174-188.
Goebel, K., Saha, B., & Saxena, A. (2008). A Comparison
of Three Data-driven Techniques for Prognostics.
Paper presented at the MFPT 2008.
Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993).
Novel Approach to Nonlinear Non-Gaussian
Bayesian State Estimation. Paper presented at the
IEE Radar and Signal Processing.
Hu, C., Youn, B. D., & Chung, J. (2012). A Multiscale
Framework with Extended Kalman Filter for
Lithium-Ion Battery SOC and Capacity Estimation.
Applied Energy, 92, 694-704.
Jaworski, R. K. (1999). Statistical Parameters Model for
Predicting Time to Failure of Telecommunications
Batteries. Paper presented at the 21st International
Telecommunications Energy (INTELEC'99).
Lee, S., Kim, J., & Lee, J. (2007). The state and parameter
estimation of an Li-ion battery using a new OCV-
SOC concept. Paper presented at the IEEE Power
Electronics Specialists Conference (PESC'07).
Lee, S., Kim, J., Lee, J., & Cho, B. H. (2011).
Discrimination of Li-ion batteries based on
Hamming network using discharging-charging
voltage pattern recognition for improved state-of-
charge estimation. Journal of Power Sources,
196(4), 2227-2240.
Luo, W., Lv, C., Wang, L., & Liu, C. (2011). Study on
Impedance Model of Li-ion Battery. Paper
First European Conference of the Prognostics and Health Management Society, 2012
76
European Conference of Prognostics and Health Management Society 2012
9
presented at the 6th IEEE Conference on Industrial
Electronics and Applications, Beijing.
Meissner, E., & Richter, G. ( 2003). Battery Monitoring and
Electrical Energy Management Precondition for
Future Vehicle Electric Power Systems. Journal of
Power Sources, 116(1-2), 19.
NASA. (2007). Prognostics Data Repository. from NASA
Ames Research Center
http://ti.arc.nasa.gov/project/prognostic-data-
repository
Orchard, M., Silva, J., & Tang, L. (2011, September 25th-
29th). A Probabilistic Approach for Online Model-
based Estimation of SOH/SOC and use profile
characterization for Li-Ion Batteries. Paper
presented at the Battery Management Workshop,
Annual Conference of the Prognostics and Health
Management Society 2011, Montreal, QB, Canada.
Orchard, M., Tang, L., Saha, B., Goebel, K., &
Vachtsevanos, G. (2010). Risk-Sensitive Particle-
Filtering-based Prognosis Framework for
Estimation of Remaining Useful Life in Energy
Storage Devices. Studies in Informatics and
Control, 19(3), 209-218.
Orchard, M., Tang, L., & Vachtsevanos, G. (2011,
September 25th-29th ). A Combined Anomaly
Detection and Failure Prognosis Approach for
Estimation of Remaining Useful Life in Energy
Storage Devices. Paper presented at the Annual
Conference of the Prognostics and Health
Management Society 2011, Montreal, QB, Canada.
Rufus, F., Lee, S., & Thakker, A. (2008). Health Monitoring
Algorithms for Space Application Batteries. Paper
presented at the International Conference on
Prognostics and Health Management, Denver, CO.
Saha, B., & Goebel, K. (2009). Modeling Li-ion Battery
Capacity Depletion in a Particle Filtering
Framework. Paper presented at the Annual
Conference of the PHM Society, San Diego, CA.
Saha, B., & Goebel, K. (2011). Battery Data Set. from
NASA Ames, Moffett Field, CA
http://ti.arc.nasa.gov/project/prognostic-data-
repository
Saha, B., & Goebel, K. (2011). Model Adaptation for
Prognostics in a Particle Filtering Framework.
International Journal of Prognostics and Health
Management, 2(1), 10.
Saha, B., Goebel, K., & Christophersen, J. (2008).
Comparison of Prognostic Algorithms for
Estimating Remaining Useful Life of Batteries.
Transactions of the Institute of Measurement and
Control (special issue on Intelligent Fault
Diagnosis & Prognosis for Engineering Systems),
293-308.
Saha, B., Goebel, K., Poll, S., & Christophersen, J. (2009).
Prognostics Methods for Battery Health
Monitoring Using a Bayesian Framework. IEEE
Transactions on Instrumentation and
Measurement, 58(2), 291-296.
Santhanagopalan, S., & White, R. E. (2010). State of charge
estimation using an unscented filter for high power
lithium ion cells. International Journal of Energy
Research, 34(2), 152-163.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,
Saha, S., & Schwabacher, M. (2008). Metrics for
Evaluating Performance of Prognostics
Techniques. Paper presented at the 1st International
Conference on Prognostics and Health
Management (PHM08), Denver CO.
Saxena, A., Celaya, J., Saha, B., Saha, S., & Goebel, K.
(2010). Metrics for Offline Evaluation of
Prognostic Performance. International Journal of
Prognostics and Health Management, 1(1), 20.
Schwabacher, M. (2005). A Survey of Data Driven
Prognostics. Paper presented at the AIAA
Infotech@Aerospace Conference, Arlington, VA.
BIOGRAPHIES
Abhinav Saxena is a Research Scientist with SGT Inc. at
the Prognostics Center of Excellence NASA Ames Research
Center, Moffett Field CA. His research focus lies in
developing and evaluating prognostic algorithms for
engineering systems using soft computing techniques. He is
a PhD in Electrical and Computer Engineering from
Georgia Institute of Technology, Atlanta. He earned his
B.Tech in 2001 from Indian Institute of Technology (IIT)
Delhi, and Masters Degree in 2003 from Georgia Tech.
Abhinav has been a GM manufacturing scholar and is also a
member of IEEE, AAAI and ASME.
José R. Celaya is a research scientist with SGT Inc. at the
Prognostics Center of Excellence, NASA Ames Research
Center. He received a Ph.D. degree in Decision Sciences
and Engineering Systems in 2008, a M. E. degree in
Operations Research and Statistics in 2008, a M. S. degree
in Electrical Engineering in 2003, all from Rensselaer
Polytechnic Institute, Troy New York; and a B. S. in
Cybernetics Engineering in 2001 from CETYS University,
México.
Indranil Roychoudhury received the B.E. (Hons.) degree
in Electrical and Electronics Engineering from Birla
Institute of Technology and Science, Pilani, Rajasthan,
India in 2004, and the M.S. and Ph.D. degrees in
Computer Science from Vanderbilt University, Nashville,
Tennessee, USA, in 2006 and 2009, respectively. Since
August 2009, he has been with SGT, Inc., at NASA
Ames Research Center as a Computer Scientist. His
research interests include hybrid systems modeling, model-
based diagnostics and prognostics, distributed diagnostics
and prognostics, and Bayesian diagnostics of complex
physical systems. He is a member of IEEE.
Sankalita Saha was a research scientist with Mission
Critical Technologies at the Prognostics Center of
Excellence, NASA Ames Research Center during this effort.
She received the M.S. and PhD. degrees in Electrical
Engineering from University of Maryland, College Park in
2007. Prior to that she obtained her B.Tech (Bachelor of
Technology) degree in Electronics and Electrical
Communications Engineering from the Indian Institute of
Technology, Kharagpur in 2002.
Bhaskar Saha received his Ph.D. from the School of
Electrical and Computer Engineering at Georgia Institute of
Technology, Atlanta, GA, USA in 2008. He received his
M.S. also from the same school and his B. Tech. (Bachelor
of Technology) degree from the Department of Electrical
Engineering, Indian Institute of Technology, Kharagpur,
India. Before joining PARC in 2011 he was a Research
Scientist with Mission Critical Technologies at the
Prognostics Center of Excellence, NASA Ames Research
Center, where his research focused on applying various
classification, regression and state estimation techniques for
predicting remaining useful life of systems and their
components, as well as developing hardware-in-the-loop
testbeds and prognostic metrics to evaluate their
performance. He has been an IEEE member since 2008 and
has published several papers on these topics.
Kai Goebel received the degree of Diplom-Ingenieur from
the Technische Universität München, Germany in 1990. He
received the M.S. and Ph.D. from the University of
California at Berkeley in 1993 and 1996, respectively. Dr.
Goebel is a senior scientist at NASA Ames Research Center
where he leads the Diagnostics and Prognostics groups in
the Intelligent Systems division. In addition, he directs the
Prognostics Center of Excellence and he is the technical
lead for Prognostics and Decision Making of NASA's
System-wide Safety and Assurance Technologies Program.
He worked at General Electric's Corporate Research Center
in Niskayuna, NY from 1997 to 2006 as a senior research
scientist. He has carried out applied research in the areas of
artificial intelligence, soft computing, and information
fusion. His research interest lies in advancing these
techniques for real time monitoring, diagnostics, and
prognostics. He holds 15 patents and has published more
than 200 papers in the area of systems health management.
APPENDIX
Table 2. Results for validation battery 61, cycle 4.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2793    2763     -30        2297     -496       2615     -178
247     2566    2510     -56        2698     132        2714     148
475     2338    2286     -52        1946     -392       2736     398
703     2110    2008     -102       990      -1120      861      -1249
930     1883    1796     -87        944      -939       372      -1511
1157    1656    1553     -103       481      -1175      572      -1084
1385    1428    1325     -103       855      -573       871      -557
1612    1201    1114     -87        857      -344       813      -388
1840    973     887      -86        641      -332       531      -442
2068    745     660      -85        877      132        893      148
2296    517     432      -85        649      132        665      148
Table 3. Results for validation battery 62, cycle 2.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2597    1897     -700       2876     279        2870     273
247     2370    2313     -57        1463     -907       1663     -707
475     2142    2188     46         938      -1204      1465     -677
703     1914    1981     67         777      -1137      1058     -856
930     1687    1708     21         812      -875       845      -842
1157    1460    1468     8          1066     -394       1060     -400
1385    1232    1321     89         838      -394       1505     273
1612    1005    1094     89         1003     -2         1278     273
1840    777     909      132        1046     269        1050     273
2068    549     732      183        686      137        817      268
2296    321     494      173        600      279        594      273
Table 4. Results for validation battery 62, cycle 4.
tP      RUL*    Particle Filters    ANN Regression      Polynomial Regression
                RUL      Error      RUL      Error      RUL      Error
20      2519    2386     -133       1897     -622       2546     27
247     2292    2358     66         1405     -887       1740     -552
475     2064    2168     104        1560     -504       1697     -367
703     1836    1582     -254       1843     7          1863     27
930     1609    1580     -29        1616     7          1636     27
1157    1382    1473     91         1115     -267       1165     -217
1385    1154    1269     115        881      -273       676      -478
1612    927     1030     103        934      7          954      27
1840    699     616      -83        706      7          726      27
2068    471     598      127        478      7          498      27
2296    243     357      114        0        -243       0        -243
Diagnostics Driven PHM
The Balanced Solution
Jim Lauffer
DSI International, Inc.
Orange, California 92867, USA
ABSTRACT
Much effort has been made to develop technologies and
define metrics for Prognostics Health Management
(PHM). The problem is that most of this effort has focused
on theoretical and high risk concepts of Prognostics
performance while ignoring the real needs in “System
Health Management”. In the wake of this technological
attention, the importance of true Integrated Systems Health
Management (ISHM) has been masked by the focus on
single failure mode physics of failure solutions. The critical
PHM metrics, derived from Integrated Systems Diagnostics
Design (ISDD) have mostly been ignored. These critical
metrics include Reliability, Safety, Testability, and System
Maintainability & Sustainment, as well as the impact of
prognostics performance on Systems Diagnostics. A key
point to be made is that the ISDD process is much larger
than just developing metrics. ISDD results in a well-
designed system that meets true health management needs,
as well as significantly lowering development costs, and the
cost of ownership. Another point that needs to be made is
that the core of ISDD is a proven and highly effective
analysis solution in Model Based Diagnostics. This paper
discusses the approach of using Model Based Diagnostics in
the ISDD process to determine the best balance of the
Health Management design. It will be shown how the
impact and effectiveness of prognostics as integrated with
the ISDD process provides true value to performance and
cost avoidance.
1. THE SKEWED PHM TECHNOLOGIES
New York University mathematical physicist Alan Sokal
submitted an article on current physics and mathematics
built around quantum mechanics and chaos theory (Sokal,
1994 & 1995). Sokal's article was published in 1996 and
cited as a credit to scientific research. Soon after, Sokal
explained in a new article that his publication had been
salted with nonsense and, in his opinion, was accepted
because: (a) it sounded good and (b) it flattered the editors'
ideological preconceptions.
It turns out that Sokal’s Hoax served a public purpose to
attract attention to what Sokal saw as a decline of standards
of rigor in the academic community. Today, this
philosophy of Sokal’s Hoax can easily be applied to
Government, Industry and Academia on the subject of
Prognostics Health Management. Far too many
technologists and business managers fall for the “hoax”
that systems can be prognosed to predict, within a known
Remaining Useful Life (RUL) parametric, any and all
failures, and then go on to promote this RUL prediction as
precluding all failures, preventing system operational
failures, and enhancing sustainment.
About thirty years ago this author had his first
experience with prognostics based on signature analysis. In
that operation, the U.S. Navy looked into using
signature analyses of the ship’s noise propagation to
predict a signature shift that could possibly be leading to a
failure. After much trial and error with this concept, and after
unnecessary consumption of spare parts and maintenance
labor hours, it was decided that this form of “prognostics”
was not working.
A major U.S. project was based on PHM being the core to
the prevention of catastrophic failures and system aborts
through prognostics. This PHM system would also provide
the operational data needed to drive an Advanced Logistics
Information System (Gill, 2003). After investing untold
millions of U.S. dollars, it is now recognized that the planned
PHM path must be modified. The realization that you
cannot prognose an entire system is finally coming into
focus. The idea of using the proven technology of Model
Based Diagnostics was discarded early in the program due
to the same philosophy Sokal exposed.
_____________________
Lauffer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
By the way, the original definition of PHM was Diagnostics and Prognostics
Health Management. It did not take long for diagnostics to
be displaced by the more “exciting” prognostics exclusivity.
If you have read this far, you are probably thinking that this
author is anti-technology development and is only trying to
promote an old method for determining the Health
Management of a system. This is far from the intent of this
paper! This author has been a proponent of advanced
technology development from the 50s during the early
transition from vacuum tubes (valves as some say) to solid
state technology, and on to today’s prognostic technologies.
He has attended courses at Georgia Tech and has worked
with prognostics professionals. From all of this, it has to be
said that prognostics plays a very powerful role in PHM and
is the way of the future. With that said, it is also apparent
that prognostics is not a “Systems” health management
technology. It is limited to selected failure modes that must
not be allowed to fail due to system criticality. In-depth
physics-of-failure analysis, proper sensors, and precise
processing need to be in place to determine RUL when a
single failure mode approaches a critical state. Keep in mind
that the focus is on a single failure mode, and even into the
molecular structure of this single mode. Even then, this
processed single failure mode effect must be observable at
the system level to be considered a functional component of
PHM.
This is in contrast to a typical operational platform with tens
of thousands, or even hundreds of thousands of failure
modes. It is obvious that the prognostics technologies are far
from capable of performing system level PHM. There have
been attempts in AI, Bayesian Networks, Boolean Logic,
and others to perform this PHM System analysis. But these
have been shown to be ineffective, and at very high cost and
risk.
As stated, there is a need for integrated prognostics that can
map a prognosed event to overall System PHM. Investment
into prognostics must be accountable, not simply bought in
to satisfy study funding. Thousands of pages have been
written on prognostics, but those studies have had a
difficult, if not impossible, time performing in a fielded
system, let alone contributing any value to design influence.
Over the last decade or so, the demand for increased
prognostics within complex, critical systems has resulted
not only in changes to how these systems are developed, but
also to the way in which designs are analyzed as they are
developed. In particular, system analysis practices have
been moving away from true System Health Management
values, such as reliability, testability, maintainability,
sustainment, and the critical parameter today: cost. Some
critical systems have focused on prognostics details while,
to the most extent, ignored the ISDD process. System
designs now either pursue high cost and risk custom
solutions to focus on prognostics, incorporate prognostic
details into other calculations, or ignore prognostics
altogether. This issue is amplified by the fact that much of
the value in reliability and testability analysis can best be
realized when design feedback is available relatively early
in the development cycle. On the other hand, prognostic
development and the evaluation of prognostic performance
take years of operational time to obtain any metrics of value.
It is unlikely that information derived from formal
prognostic performance metrics (Saxena et al., 2010) can be
incorporated into systems engineering analyses to profitably
impact system development and decision-making. At the
same time, un-validated prognostics can lead to low
Availability and high sustainment cost due to false
removals. This makes prognostic performance validation
notoriously time-consuming and costly.
As an alternative, some projects have implemented custom
solutions, modifying design-time engineering analyses to
account for the expected impact of prognostics concurrently
under development. There is, however, no standardized or
officially sanctioned approach to accounting for prognostics
performance. For each project, systems analysts must ask a
series of questions; for example, diagnostic analysts must
decide whether fault detection & isolation metrics should
take full or partial credit for prognosed failures, or whether
testability analysis can be constrained to cover only the non-
prognosed portion of the design. In either case, should
prognostic horizon and/or accuracy be taken into
consideration?
If so, then how is the end user or maintainer expected to
respond to prognostic notifications without questioning
them? Will there be cases in which some sort of
confirmation will be required before a maintenance action is
performed? Then the key question is, should diagnostic
analysis be consulted when determining the optimal areas in
which to develop prognostic measurements or will only
criticality considerations be involved in the selection of
prognostic candidates?
The root of these, and other related questions is the lack of
realistic and cost effective requirements, and the lack of
systems diagnostics understanding. So, what is the solution
to effective and affordable PHM? The answer is obvious:
Model Based Diagnostics, a proven technology that has
been in use for decades. In the past twenty-plus years it has
come to be recognized as the systems engineering tool of
choice throughout industry.
Without going off track on the balancing of prognostics and
diagnostics in PHM development, it needs to be mentioned
that there is once again a push for something new in the
field of diagnostics analysis. There has been talk of Model
Based Diagnostics falling out of “fashion” within the same
community that has proliferated prognostics. Also, Model
Based Diagnostics has received some bad press from entry
level tools whose use has been attempted on projects where
the tools failed to perform. Unfortunately, these unproven
tools resulted in high costs with no acceptable results.
These failures to perform led the technology community to
downplay the use of Model Based Diagnostics, leaving it
vulnerable to high-cost, high-risk alternatives. Projects are
told that Model Based Diagnostics is obsolete, superseded
by supposedly higher-order mathematical solutions. The
issue is that these "non-model-based" solutions have
significant problems: demanding development skills, high
cost, lack of system integration, and limitation to small-scale
analyses. Just as prognostics entered as the
“new and improved” health management solution, other
analytical solutions are continuing to be pushed into the new
wave of thinking without the understanding of a systems
engineering approach.
One such solution attempt has been tried over the years in
several research communities and this is based on Bayesian
Networks. As with prognostics, a Bayesian Network
requires extensive development and cannot begin until the
design is well defined. Then, if there is a design change, the
analysis needs to start all over again. Even if a network can
be completed, it is limited to smaller systems, cannot
provide knowledge to the Logistics sustainment solution,
and still requires years of learning to “fine tune” the results.
2. DIAGNOSTICS DRIVEN PHM
Now that this author has ripped “stand alone” attempts at
prognostic solutions, the following discussion focuses on
effective diagnostics driven PHM based on Model Based
Diagnostics. This ISDD process is centered on a proven tool
suite and process that brings the system design into an
optimized PHM solution. This solution provides the
confidence needed for fault detection and isolation at the
system level that includes the impact of prognostics on
diagnostics. This ISDD process identifies the candidates
needed for an effective prognostics analysis. It also provides
the parametrics used for an Operations and Support
simulation. This simulation capability is shown in section
5.
For the system design to be optimized for effective health
management and sustainment, the diagnostics process needs
to begin early in the design phase. This is
something prognostics cannot do. The diagnostics analysis
results in a selection of candidates for prognostics analysis.
See Figure 1 for this diagnostics informed prognostics
analysis process.
As emphasized, for optimum results in design influence, the
ISDD process needs to begin at the start of the project’s
design phase. This is where PHM and sustainment must be
considered to be effective and affordable. Along with
testability requirements (the probability of fault detection,
isolation to a defined ambiguity set, and false alarm
constraints), PHM and logistics requirements must be
understood.
Figure 1. Diagnostics selection of prognostics candidates
With this in mind, and to keep this paper on track, the
following discussion focuses on system prognostics
requirements as driven by the ISDD process.
Figure 1 shows these prognostic requirements being defined
at the beginning of the diagnostics engineering
development. This is a critical point in PHM design and is
where the customer typically falls short in requirements
definitions. Very few customer project managers
understand prognostics well enough to flow down cohesive
prognostic requirements. To be effective, these initial
prognostics requirements now need to be included in the
diagnostics test definitions in the form of prognostic tests.
These prognostic parametrics are defined in section 3.1.
As the diagnostics analysis is developed, prognostic
candidates are developed as part of the optimized
diagnostics results. The prognostics candidates are
prioritized based on the failure mode severity and the failure
rate. The primary candidates are those failure modes that
cannot be allowed to progress to failure, and failure
mitigation through functional redundancy is not practical or
possible. An example of a prognostics candidate list derived
from the diagnostics analysis is shown in Figure 2. This
example is not intended to be an eye test but is used to show
the format of a typical candidate list. Note that in the
example, two Loss of Life severities are listed below the
Loss of Equipment candidates. This is not to suggest Loss
of Life is less important, it is just listed based on the lower
failure rates. In an actual prognostics assessment, these two
candidates would certainly be considered important. But, at
the same time, if their Loss of Life failure probability is
very low, the prognostics for this failure mode may not be
cost effective.
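The prioritization just described can be sketched in a few lines. This is a hypothetical illustration only: the failure modes, severity codes and failure rates below are invented and are not the Figure 2 data.

```python
# Hypothetical sketch of ranking prognostic candidates from a diagnostics
# analysis. Failure modes, severity codes and rates are invented for
# illustration; they are not taken from Figure 2.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int        # 1 = loss of life, 2 = loss of equipment, 3 = loss of operation
    failure_rate: float  # failures per million operating hours (assumed units)

candidates = [
    FailureMode("ABS pump seizure", 1, 2.0),
    FailureMode("Brake pad wear-out", 2, 120.0),
    FailureMode("Air in hydraulic lines", 3, 45.0),
    FailureMode("Brake light burn-out", 2, 300.0),
]

# As in the Figure 2 discussion, the list is ordered by failure rate, while
# severity is retained so low-rate, high-severity modes can still be reviewed.
ranked = sorted(candidates, key=lambda fm: fm.failure_rate, reverse=True)
for fm in ranked:
    print(f"severity {fm.severity}  rate {fm.failure_rate:7.1f}  {fm.name}")
```

Note how, as in the paper's example, the loss-of-life mode sorts to the bottom purely because of its low failure rate; a separate criticality review would still flag it.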
Figure 2. Prognostic Candidate List from Diagnostics
Continuing in the diagnostics engineering process, the
diagnostics analysis results, along with selected prognostics
tests, are fed into the product design to support the PHM
design solution. This provides the all-important design trade
study process that builds a well-balanced, diagnostics
driven, PHM solution. Later in the process it is shown
where prognostics parameters may be available to further
optimize the diagnostics analysis.
3. SYSTEM PROGNOSTIC REQUIREMENTS
The following discussion focuses on the approach to
incorporating prognostic considerations into areas such as
reliability, testability, maintainability and sustainment
analyses. This is accomplished by representing expected
prognostic behavior in terms derived from system
prognostic requirements. This will show how these
parameters can be used to define prognostic behavior within
a diagnostic engineering process. Finally, this will show
how these prognostic definitions can be used to modify the
results of standard measures of diagnostic effectiveness
using fault detection and isolation metrics defined within
IEEE Standard 1522-2004. This also looks into informed
simulation-based approaches for assessing the impact of
different prognostic, diagnostic and maintenance strategies.
The following definition of requirements, parameters and
example are based on a paper by Eric Gould who has
developed advanced prognostic influence capabilities in the
DSI eXpress Diagnostics Engineering tool (Gould, 2011).
That paper is paraphrased in some sections to provide the
specific information needed to understand how prognostics
is used in the ISDD process.
Even though the academic technology of system prognostics
has been around with study support since the 1990s, the
understanding of prognostics requirements is relatively
new to design development projects. This is in contrast to
system diagnostic and testability requirements which have
been around since the 1980s. It is therefore not surprising
that there has been a fair amount of variance in the
definitions of desired prognostic capabilities from one
project to another.
For effective prognostics requirements to be defined, a
process for the derivation of these requirements must be
understood and followed. Aspects covered by these
qualitative descriptions include 1) whether the prognostics
shall be embedded in the system, 2) whether prognostics
shall be automated or initiated, 3) whether prognostics shall
be developed solely for the determination of mission-
readiness or also for the optimization of Logistics, 4)
whether prognostics results shall be reported to the crew,
maintenance technicians, and/or mission planners, and 5)
whether prognostics shall consist solely of condition-based
observations of failure precursors or whether it can also
contain predictions based on the failure rates and stress
histories of individual components. Although information of
this type is essential for describing the prognostic capability
required for each project, it is not relevant to the following
discussion. In the example shown in section 3.2, the
requirements have been trimmed down to include only the
information needed for a quantitative evaluation of a
system’s prognostics capability, and the impact of
prognostics on systems diagnostics in PHM.
3.1. Prognostic Parameters
With the quantitative aspects of the requirement broken
down into individual parameters, it was determined that five
basic parameters were sufficient for describing any of the
sample requirement statements:
Scope – the set of possible failures to which a given
requirement applies. Common scopes include mission
critical failures, essential function failures, or failures that
necessitate a system abort.
Category – the set of prognoses to which a given
requirement applies, such as embedded or sensor-based
prognoses.
Horizon – the time before failure that prognosis must occur.
This can either be a fixed value (e.g., 72 hours prior to
failure) or a calculated value, based on both the desired
mission length and the corrective action time associated
with each failure.
Coverage – the percentage of failures in the specified scope
that must be prognosed. This parameter can either be failure
probability-weighted (so that there is greater credit for
failures that occur more frequently) or non-weighted (so that
all failures in the specified scope are counted equally).
Accuracy – the desired confidence/correctness of the overall
prognostic capability (typically defined as a percentage of
accuracy). In some requirement statements, Accuracy is
bundled with Coverage as a single percentage of failures
prognosed.
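As a sketch, the five parameters above could be captured in a simple data structure. The field names and defaults below are assumptions for illustration, not a published schema.

```python
# A minimal, hypothetical container for the five prognostic requirement
# parameters described above. Field names and defaults are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrognosticRequirement:
    scope: str                       # set of failures the requirement applies to
    horizon_hours: float             # time before failure that prognosis must occur
    coverage: float                  # fraction of in-scope failures to prognose
    accuracy: float                  # desired confidence/correctness
    category: Optional[str] = None   # e.g. "embedded" or "sensor-based" prognoses
    coverage_weighted: bool = True   # failure-probability-weighted coverage?

# Example: 80% of mission critical failures, 96 hours ahead, 90% accuracy.
req = PrognosticRequirement(
    scope="mission critical failures",
    horizon_hours=96.0,
    coverage=0.80,
    accuracy=0.90,
)
```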
3.2. An Example of Prognostic Requirements
The following example examines the individual prognostic
requirements, parsing each statement into the related
parameters and discussing any interpretive peculiarities. All
threshold/objective parameters have been simplified so that
they are expressed as a single goal.
1) Requirement Example
Prognostics shall predict at least 80% of the mission critical
failures 96 hours in advance of occurrence with 90%
probability.
Scope: Mission Critical Failures
Horizon: 96 hours
Coverage: 80%
Accuracy: 90%
This prognostic requirements statement has four parameters
that collectively specify the expected behavior of the
prognostics. Because it reads like a performance
requirement (one that specifies the expected performance
of a fielded system), greater credit should be given to
prognosed failures that occur more frequently than to those
that occur relatively infrequently. So, when calculated as an
engineering metric, the prognostic coverage should be
weighted by the failure probability of each individual
failure. The overall coverage can thus be calculated by
summing the failure rates of the failures in the scope that
can be prognosed, divided by the sum of the failure rates for
all failures in the scope.
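The weighted coverage calculation just described can be sketched in a few lines; the failure modes and rates below are invented for illustration.

```python
# Sketch of failure-rate-weighted prognostic coverage as described above:
# sum of failure rates of prognosable in-scope failures, divided by the sum
# of failure rates of all in-scope failures. All rates are illustrative.
in_scope = {                       # failure mode -> failure rate (per 1e6 hours)
    "Brake pad wear-out": 120.0,
    "ABS pump seizure": 2.0,
    "Air in hydraulic lines": 45.0,
    "Master cylinder leak": 8.0,
}
prognosed = {"Brake pad wear-out", "Air in hydraulic lines"}

coverage = sum(rate for fm, rate in in_scope.items() if fm in prognosed) \
           / sum(in_scope.values())
print(f"weighted coverage = {coverage:.1%}")  # 165/175, about 94.3%
```

Note that a non-weighted coverage would instead count failures equally (here 2 of 4, or 50%), which is why the weighting choice matters when comparing against the 80% requirement.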
4. PROGNOSTIC DEFINITIONS
Now let's take a look at how prognostic definitions can be
represented within a proven diagnostic engineering tool. There
are several reasons why support for prognostics should be
added to tools that are used primarily for the creation,
assessment and optimization of system diagnostics. First of
all, if the tool has been designed for system-level diagnostic
analysis, then it already has the infrastructure in place to
perform an analysis of system-level prognostic performance.
Data from individual prognostic definitions is compiled
across the entire system to produce overall measures of
prognostic effectiveness, measures that can be easily
compared to system prognostic requirements to determine
contract compliance.
A second (and perhaps more significant) advantage to
representing prognostic measurements within a Diagnostic
Engineering tool is that the Reliability, Testability, and
Maintainability evaluations performed within the tool will
be able to reflect the expected performance of systems for
which mission readiness is assured using prognostics.
Moreover, diagnostic procedures developed within the tool
can be optimized under the assumption that prognostics
will be employed where real needs exist.
For example, prior to developing prognostic sensor and
algorithm requirements, an analysis of the system can be
used to determine the set of failures for which prognosis is
most desirable. This takes into consideration not only the
criticality and frequency of failures, but also how
successfully the system can diagnose and remediate the
failures without prognostics. Later, if the bottom line
changes and you need to reconsider the value of developing
some of the more expensive prognostic sensing and
algorithms, you can easily reevaluate the PHM performance
that would be achieved if the system did not have this
capability.
A third advantage of adding prognostic definitions to a
diagnostic engineering tool’s model or database is that this
information can be easily exported for analysis within an
external tool. For example, simulation-based case studies
can be performed using different health management
approaches. This will allow PHM analysts to evaluate
different combinations of diagnostics, prognostics and
preventative maintenance to determine which combinations
are most effective, not only from the perspectives of
availability or mission readiness, but also sustainment and
cost effectiveness. Section 5 describes some of this
simulation capability.
4.1. Tests and Prognoses
In proven and accepted model-based diagnostic engineering
tools, test definitions are used to represent diagnostic
knowledge. To be effective, each individual test definition
must specify the coverage of a corresponding fielded
operational test or measurement. This coverage identifies
the specific functions or failure modes that should be
exonerated (removed from suspicion) or indicted (called
into suspicion) when that test passes or fails. Tests are
organized into different test sets so that they can be easily
selected as groups to support different diagnostic case
studies. Examples given relate to eXpress, DSI
International’s Diagnostics Modeling and Analysis tool, and
to DSI’s STAGE Operations and Support Simulation tool.
Prognostic measurements can be represented using a special
type of test definition. This is basically a test definition to
which prognostic parameters have been attached. The
coverage for each prognosis is represented the same way as
it would be for a diagnostic test; the only difference being
that the coverage now represents the specific functions or
failure modes for which failures can be predicted using
prognostics. As with diagnostic tests, prognostic
measurements can also be organized into test sets. When a
project has prognostic requirements that utilize the Category
parameter, the individual measurements should be grouped
into sets by category. The analysis can then be constrained
by simply selecting the sets that correspond to the desired
prognostic categories.
4.2. Prognostic Terms
For each prognostic definition, the analyst must specify one
or more Horizons, each accompanied by three variables—
Confidence, Correctness and Accuracy—that collectively
describe the expected behavior of the given prognostic
measurement at that Horizon. See Figure 3 for an example
of Prognostic Settings in eXpress with single horizon.
Figure 3. Prognostic Settings in eXpress with single horizon
The value of the specified Horizon is similar to the Horizon
parameter within a prognostic requirement; it represents the
time interval before failure within which the given prognosis
might occur. The Confidence represents the likelihood that the
given prognosis will predict the covered failure(s) at or
before the specified Horizon. It is expected that Confidence
increases as the Horizon decreases; in other words, that
predictions become more confident as a prediction
approaches the time of failure.
The Correctness variable is used to represent the expected
percentage of prognoses that are correct; that is, not too
early. By default, the Correctness setting affects neither the
prognostic nor diagnostic analysis performed using that
measurement. The Correctness value, however, can still be
used to categorize prognoses within a simulation-based
assessment of a proposed PHM approach. Note that
excessively early prognoses lead to false aborts and wasted
maintenance cost and time.
The calculated Accuracy value corresponds to the Accuracy
parameter within a prognostic requirement. Unlike the other
two values used to describe a given Horizon, Confidence
and Correctness, the Accuracy variable is not defined by the
analyst, but rather calculated automatically by the analysis.
A prognostic condition that must be addressed is the need
for corrective action to be performed only for prognoses
verified to be correct. This is the case when a given
prognosis is not only independently verifiable, but will be
verified before corrective action is performed. As an
example, think of the brake pads on an automobile. As the
pads wear past a given point, they begin to squeal when the
brakes are applied. This is an intentional design
characteristic that allows the owner of the car to identify
when the pads need to be replaced. Think of the brake pad's
squeal as a condition-based prognosis of a pending failure.
Now, imagine that, when your brakes start
to squeal you inspect the pads and see that there is plenty of
life left—the squeal came too early. Would you still replace
the pads?
Figure 4. Accuracy calculated using both Confidence and
Correctness
From a purely realistic standpoint, the Accuracy of your
prognosis would be equal to your Confidence that the
prognosis would occur prior to failure. If, however, you only
replace the pads when they have truly worn down (when the
prognosis was correct), then the accuracy of your prognosis
must be adjusted down to account for the possibility of
these false squeals.
So, when this prognostic condition is selected in the
analysis, the calculated Accuracy is equal to the product of
the Confidence and Correctness percentages. See Figure 4
for an example of Accuracy calculations. Accuracy then
represents the likelihood that the prognosis occurs early
enough (Confidence), but not too early (Correctness).
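Under that condition the calculation reduces to a single product; the percentages below are illustrative, not values taken from Figure 4.

```python
# Minimal illustration of the Accuracy calculation described above when
# prognoses are verified before corrective action: Accuracy is the product
# of Confidence and Correctness. The numbers are illustrative only.
confidence = 0.90    # likelihood the prognosis occurs at or before the Horizon
correctness = 0.85   # likelihood the prognosis is not too early (no false squeal)
accuracy = confidence * correctness
print(f"accuracy = {accuracy:.1%}")  # 76.5%
```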
Of course, the real value of incorporating prognostics into a
diagnostic engineering model is not so much to facilitate the
prognostic analysis itself as it is to develop, assess and
optimize the system's diagnostics capability. This is based on the
assumption that a given level of prognosis can and will be
achieved.
5. SIMULATION OF PROGNOSTIC IMPACT ON DIAGNOSTICS
Figure 5 shows a section of an automotive braking system
that has been modeled for diagnostics analysis. The pink
highlighted items (1, Brake Pads, 2, Tires) are identified for
prognostic testing. The diagnostic results from this analysis
were exported in an XML schema (DiagML) to be used for
PHM software development and for use in other tools. One
of these tools, DSI STAGE, takes the analysis results and
performs a Monte Carlo simulation using developed
calculations for specific simulation results. Some of these
results are presented below to show the diagnostics behavior
for systems that are to be supported using selected
prognostics derived from the analysis. Note that the
simulation graphs shown represent example analyses. Due
to the scaling of the graphs, the scale legends are not
legible and are for reference only. The typical simulation
time is 4000 hours of brake operational use.
Figure 5. Section of eXpress diagnostics model showing
targeted prognostic candidates
This capability of analyzing prognostic performance as part
of diagnostics in PHM provides PHM optimization based on
both requirements and constraints. Through simulation, the
overall PHM solution is evaluated based on how well PHM
meets system requirements and how well it can be
implemented within cost constraints. Some typical
simulation calculations include: Prognostic Effectiveness,
Fault Detection and Isolation, Diagnostic False Alarms,
Critical Failures, System Aborts, Mission Success, Mean
Time Between Failure, Mean Time to Repair, System
Availability, Development Costs, Sustainment Costs, and
Total Cost of Ownership, plus many more to meet analysis
needs.
The following charts are from simulation runs based on the
diagnostics results from the model shown in Figure 5. The
simulation was run with 500 iterations of an operational
time of 4000 hours and was randomly seeded.
The calculations used were: Likelihood of Critical Failures
Over Time (progressive), Critical Failures Prognosed Over
Time (number), System Aborts Over Time (number),
Critical Failure Prognosed per Failure Entity (number),
Mean Time Between Prognostics/Maintenance Actions
Over Time, and Faults (Despite Prognostics) Over Time.
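A toy version of such a run (emphatically not the STAGE tool's implementation) might draw failure times from an assumed exponential distribution and apply an assumed prognostic accuracy; the MTBF, horizon and accuracy values below are invented.

```python
# Toy Monte Carlo sketch of the kind of run described above: 500 randomly
# seeded iterations over a 4000-hour operating period. The exponential
# failure model, MTBF and prognostic accuracy are assumptions, not the
# STAGE tool's actual calculations.
import random

ITERATIONS, HOURS = 500, 4000.0
MTBF = 2500.0            # mean time between critical failures (assumed)
PROG_ACCURACY = 0.80     # chance a failure is prognosed in time (assumed)

rng = random.Random(42)  # seeded here for repeatability
aborts = prognosed = 0
for _ in range(ITERATIONS):
    t = rng.expovariate(1.0 / MTBF)  # time of first critical failure
    if t < HOURS:                    # a failure occurs within the period
        if rng.random() < PROG_ACCURACY:
            prognosed += 1           # caught and repaired before failure
        else:
            aborts += 1              # missed prognosis leads to a system abort
print(f"failures prognosed: {prognosed}, system aborts: {aborts}")
```

Even this crude sketch reproduces the qualitative behavior discussed below: most in-period failures are caught, but the 20% of missed prognoses still produce a steady count of aborts.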
The use of effective, prognostics-driven condition-based
maintenance can reduce the likelihood of critical failures.
As seen in Figure 6, the critical failure events in this
analysis, loss of operation (1, yellow) and loss of equipment
(2, orange), start at low probability early in the system's
operational life cycle. Then, as the system ages, the
probability of loss of operation increases rapidly, followed
by loss of equipment. Finally, the loss of life severity
(3, red) begins to increase further into the operational life
cycle. This is where the assessment of prognostics needs to
be performed for those failure modes contributing to these
critical failures.
Figure 6. Simulation results showing likelihood of Critical
failures over time
Where the simulation shown in Figure 6 shows total failures
progressively over time by severity, Figure 7 shows the
number of critical failures over time, and also shows those
failures detected by prognoses (4, magenta). These
prognosed failures are calculated as being repaired prior to
critical failure. Since it is well known that prognostics is not
100% correct, other critical failures did occur. These
failures are shown by number of failures at a specific point
in time. These failures are also identified by severity (1,
yellow, loss of operation; 2, orange, loss of equipment; 3,
red, loss of life). With the ability to observe types of failures
over time, it is now possible to re-analyze the diagnostics,
and possibly improve the prognostics effectiveness.
Figure 7. Number of critical failures prognosed over time
Figure 8. System aborts attributed to inadequate PHM
Figure 8 shows the simulation results for system aborts over
time. This calculation is based on the accuracy of
prognostic tests defined in the diagnostics analysis. The
“true” system aborts are projected over time as shown at the
bottom of the graph (2-orange).
The system aborts attributed to false prognostics are
shown at the top of the graph (1, red). This is a design
condition that can be corrected by improving the prognostic
tests and therefore the accuracy of these tests. Once the
prognostics have been assessed for improvement, the
diagnostics analysis would be adjusted based on new test
parameters. The simulation would then be re-run to validate
the results for improvement in system aborts. During this
diagnostics update, the “true” system aborts can be assessed.
Even though the number of true aborts is low, there may be
opportunities for improvement.
Figure 9 shows the number of critical failures over time by
failure entity (failed item) and by failure severity. Note that
the diagnostic analysis performed on the sample automotive
braking system is used for demonstration only and does not
necessarily represent actual operational parameters for this
system. This statement is made to keep people from arguing
about the actual diagnostic values rather than paying
attention to the message being presented!
The failures shown are the same as those contributing to the
simulation results in the other charts, except these are
identified by specific parts. Those item failures prognosed
are identified in magenta (2). The groups of four are the
brake pads (four right and four left, front and rear). The
prognostic test for these is quite basic: each pad contains a
metallic "scraper" or "squealer" that is exposed at low pad
thickness. When the brakes squeal, it is time for inspection.
The added failures shown for the brake pads (1, orange, loss
of equipment) are based on actual pad failure, to the point
where the pads scrape the disc rotor (a very expensive
repair). These "run to failure" events can be minimized
through better prognostics.
Figure 9. Number of critical failures by item and severity
More sophisticated brake pad wear detection does exist, in
the form of optical sensing.
There is one loss of life (3, red) failure, involving the
Antilock Brake System hydraulic pump: there is a possible
loss of braking control if this pump fails. It has a low
probability of failure, but it would be a candidate for
additional prognostics.
The larger group of loss of operation, or degraded operation
(4, yellow), failures noted is for air in the hydraulic lines.
There are two items shown with a higher failure rate for loss
of equipment (orange): the rear brake lights. The burnt-out
bulb failure mode would be difficult to prognose, but some
automobiles do have detection sensors that provide a
warning on the dashboard. The use of LEDs significantly
reduces the failure rate for these items. But, again, this
shows the value of running a simulation of diagnostics
results to provide a graphical representation of diagnostics,
prognostics and maintenance actions over a specified
operational time. The simulation results are not limited to
charts: each calculation result has a detailed report defining
events and values.
Figure 10 shows calculation results for frequent failures that
are prognosed but lack a maintenance plan. These
prognosed items are repaired without opportunistic
maintenance or an effective level-of-repair definition. Since
actual physics-of-failure prognostics typically looks at
molecular-level single failure modes, the analysis considers
only single failure modes with no repair concept. Reliable
items that were not repaired as a balanced maintenance
action will begin to fail as the operating system matures,
resulting in increased prognostic-related failures and a low
Mean Time Between Maintenance Action (2, green) and
Mean Time Between Prognostic events (1, magenta).
Figure 10. Mean time between a prognostics maintenance
action over time
If this were calculated for Mission Success and Availability,
it would show a direct correlation to reduced performance
from the lack of maintenance understanding in a prognostics
analysis. This is mitigated through the integration of
prognostics and diagnostics in an effective ISDD process.
Figure 11 shows the calculation results for failure modes
that are prognosed but whose failure was not detected prior
to occurrence. The loss of equipment failure severity
(2, orange) is shown for those failure modes that need to be
reassessed for possible prognostics improvement. The grey
areas indicate no failure effect (1).
Figure 11. Faults over time by severity despite prognostics
6.0. CONCLUSIONS
There are currently no real guidelines for the calculation of
diagnostic-related metrics for systems whose critical failures
are covered by prognostics. More important is the lack of
prognostics selection based on intelligent diagnostics
analysis. Not only have approaches not yet been
standardized, but many of the alternatives may not have
even been discussed in the public arena. Existing standards
describing diagnostic analysis, such as the IEEE Testability
standard (IEEE Std 1522-2004), do not yet account for
prognostics in any way. As a result, diagnostic engineering
analysis and simulation tools have been enhanced to address
this issue.
As more systems are planned for embedded prognostics,
questions about the relationship between prognostics and
diagnostics, and even beyond into sustainment, are likely to
become even more prominent. A common practice will
begin to emerge with subsequent efforts at standardization.
It is important that the relationship between prognostic and
diagnostic analysis be worked as an integrated solution.
Based on subjective, experience-driven research, previous
methods for assessing diagnostic-related prognostics
behavior remain in question, as do the suppliers, customers
and the companies that provide their tools.
The main point of all of this is to break out of the “Sokal
Hoax” syndrome and work the technologies with the goal of
a balanced Health Management and Sustainment solution.
The end result will be significantly lower development,
operation, and support costs, while experiencing higher
Mission Success and Operational Availability!
ACKNOWLEDGEMENT
Eric Gould, Senior Scientist at DSI International, deserves
recognition for his development of the integration of
Diagnostics and Prognostics in the Model-Based
Diagnostics process.
REFERENCES
IEEE Std 1522-2004. IEEE Trial-Use Standard for
Testability and Diagnosability Characteristics and Metrics.
Gill, Luke, 2003, F-35 Joint Strike Fighter Autonomic
Logistics Supply Chain.
Gould, E., 2011, Diagnostics "After" Prognostics.
Saxena, A., Celaya, J., Saha, S., and Goebel, K., 2010,
Metrics for Offline Evaluation of Prognostic Performance,
International Journal of Prognostics and Health
Management, ISSN 2153-2648.
Sokal, Alan D., 1994, 1995, Transgressing the Boundaries:
Towards a Transformative Hermeneutics of Quantum
Gravity.
BIOGRAPHY
James R. Lauffer (Jim) began the technology journey in
Ohio, USA, on New Year’s Day, 1941. Jim received his
Ham Radio license at the age of 12 and designed many
antennas and radio systems. He entered the Air Force at 17
and worked in the Strategic Air
Command as a maintainer of B47s and
anything else that landed on the base.
He entered industry in 1962 and spent
the next 40 years in Logistics,
Reliability, Maintainability and finally
Systems Engineering. This career
began with North American Aviation
in 1962, then Rockwell and finally Boeing. Much of the
engineering time was in combat system development on
international programs, field operations testing, and then in
management trying to get all of these technologies to work
together. Much of the field testing work involved passive
sonar sea trials in the Atlantic and the Mediterranean. He
also worked on aircraft avionics upgrades for the Royal
Australian Air Force. Jim retired from Boeing in 2001 and
agreed to help out a small engineering business called DSI.
This was 11 years ago and he still has not figured out how
to retire. But these past eleven years have resulted in a wealth
of knowledge related to the diagnostics technologies and the
resulting health management and sustainment systems.
Jim’s formal education is a bit thin compared to others in
this field. He started in the world of Applied Physics but
due to work and family, ended up with a business degree.
One might say he has several PhDs' worth of experience.
Jim is a past member of the Society of Logistics
Engineers, Society of Reliability Engineers, and the
Association of Old Crows (Electronic Warfare), IEEE, and
presently The American Institute of Aeronautics and
Astronautics (AIAA). To this day Jim continues to study
and attend courses in the sciences. He is still active in Ham
Radio with his Extra Class license and is always looking for
new and balanced PHM solutions.
Fatigue Crack Growth Prognostics by Particle Filtering and Ensemble Neural Networks
Piero Baraldi1, Michele Compare1, Sergio Sauco1 and Enrico Zio1, 2
1Politecnico di Milano, Milano, Italy
[email protected] [email protected]
2Chair on Systems Science and the Energetic Challenge, European Foundation for New Energy-Electricité de France, Ecole Centrale Paris and Supelec, France
[email protected] [email protected]
ABSTRACT
Particle Filtering (PF) is a model-based approach widely used in prognostics, which requires models of both the degradation process and the measurement acquisition system. In many practical cases, analytical models are not available, but a dataset containing a number of (component state, corresponding measurement) pairs may be available.
In this work, a data-driven approach based on a bagged ensemble of Artificial Neural Networks (ANNs) is adopted to build an empirical measurement model for a Particle Filter used to predict the Residual Useful Life (RUL) of a structure whose degradation process is described by a stochastic fatigue crack growth model taken from the literature. The work focuses on the investigation of the capability of the proposed approach to cope with the uncertainty affecting the RUL prediction.
1. INTRODUCTION
The prediction of the Remaining Useful Life (RUL) of degrading equipment is affected by several sources of uncertainty, such as the randomness in the future degradation of the equipment, the inaccuracy of the prognostic model used to perform the prediction, and the noise in the sensor data used by the prognostic model to obtain the RUL prediction. Thus, any RUL prediction provided by a prognostic model should be accompanied by an estimate of its uncertainty (Tang et al. 2009; Liu et al. 2011) in order to confidently plan maintenance actions, taking into account
the degree of mismatch between the RUL predicted by the prognostic model and the real RUL of the equipment (Coble 2010; Zio 2012). In this respect, a method able to estimate a probability density function of the RUL of degrading equipment is PF, a model-based approach successfully used in prognostics applications (e.g., Vachtsevanos et al. 2006, Orchard et al. 2005, Orchard & Vachtsevanos 2009, Cadini et al. 2009). PF is a Bayesian tool for non-linear state estimation, which requires (e.g., Gustafsson & Saha 2010, Doucet et al. 2001, Arulampalam et al. 2002):
1) The knowledge of the degradation model describing the stochastic evolution in time of the equipment degradation x (in general a multi-dimensional vector):

x(t+1) = g(x(t), ω(t))   (1)

where g is a possibly non-linear vector function and ω(t) is a possibly non-Gaussian noise.
2) A set of measures z(1), ..., z(t) of past and present values of some physical quantities z related to the equipment degradation x. Although z in general is a multi-dimensional vector, in this work it is considered as a mono-dimensional variable; the underline notation is therefore omitted.
3) A probabilistic measurement model which links the measure z with the equipment degradation x:

z(t) = h(x(t), ν(x(t)))   (2)

where h is a possibly non-linear vector function and ν(x) is the measurement noise vector.
_____________________ Baraldi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In practical cases, the measurement model h may not be available in analytical form, but a dataset T = {(x_n, z_n), n = 1, ..., N_training}, containing a number N_training of pairs of state x_n and corresponding measurement z_n, may be available. This is the case, for example, of the piping of deep water offshore well drilling plants, which degrades due to a process of scale deposition. This may cause a decrease, or even a plugging, of the cross sections of the tubulars. Given the inaccessibility of the piping, it is usually impossible to acquire a direct, on-line measure of the scale deposition thickness. On the other hand, research efforts are devoted to laboratory tests investigating the relationships between the scale deposition thickness and other parameters which can be more easily measured during plant operation, such as pressures, temperatures and brine concentrations. In this way, one can populate a dataset with the values of the measurable parameters for different scale deposition thicknesses, and use the data to build data-driven models for predicting the scale deposition thickness (Moura et al. 2011).
In this work we have developed an ensemble of ANNs (e.g., Baraldi et al. 2012) as a model of the measurement equation in a PF scheme. The proposed prognostic approach is applied to a literature case study (Orchard & Vachtsevanos 2009) concerning crack propagation. The obtained results are compared to those which would be obtained by directly using the measurement equation in the PF model, considering the accuracy of the RUL prediction and the capability of the method to provide an estimate of its uncertainty.
2. PARTICLE FILTERING
In PF, a set of N_s weighted particles, which evolve independently of each other according to the probabilistic degradation model of Eq. 1, is considered. The basic idea is that this set of weighted random samples constitutes a discrete approximation of the true probability density function (pdf) of the system state x at time t. When a new measurement is collected, it is used to adjust the predicted pdf through the modification of the weights of the particles, in a Bayesian perspective. This requires the knowledge of the probabilistic law which links the state of the component to the gathered measure (Eq. 2). From this model, the probability distribution P(z|x) of observing the sensor output z given the true degradation state x is derived (measurement distribution). This distribution is then used to update the weights of the particles upon a new measurement collection. Roughly speaking, the smaller the probability of encountering the acquired measurement value when the actual component state is that of the particle, the larger the reduction of the particle's weight. On the contrary, a good match between the acquired measure and the particle state results in an increase of the particle's importance (for further details, see Arulampalam et al. 2002 and Doucet et al. 2001).
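To make the prediction-correction cycle concrete, the following minimal sketch shows one step of a bootstrap particle filter. The linear-drift degradation model, the Gaussian measurement distribution, and all numerical values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(particles):
    """Prediction step: evolve each particle through a stand-in
    degradation model (placeholder for Eq. 1)."""
    return particles + 0.05 + rng.normal(0.0, 0.1, size=particles.shape)

def update_weights(particles, weights, z, meas_std=0.7):
    """Correction step: Bayesian weight update. Particles whose
    predicted measurement is close to the acquired measure z gain
    importance; a poor match shrinks the weight."""
    likelihood = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    w = weights * likelihood
    return w / w.sum()

# N_s particles approximating the pdf of the (scalar) state x
N_s = 1000
particles = rng.normal(4.0, 0.5, size=N_s)
weights = np.full(N_s, 1.0 / N_s)

particles = propagate(particles)                      # prediction
weights = update_weights(particles, weights, z=4.6)   # correction
state_estimate = float(np.sum(weights * particles))
```

In a full filter, a resampling step would follow when the effective sample size degenerates; it is omitted here for brevity.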
3. BAGGED ENSEMBLE OF ANNS FOR BUILDING THE MEASUREMENT MODEL
A method to estimate the pdf P(z|x) of the measurement z in correspondence of a given equipment degradation state x is proposed in this Section. It is derived from Carney et al. (1999) and Nix & Weigend (1994), and requires the availability of a dataset made of N_training couples (x_n, z_n).
The underlying hypothesis of this approach is that the measurement model, which is unknown, can be written in the form:

z(x) = f(x) + ν(x)   (3)

where f(x) is a biunivocal (invertible) mathematical function and the measurement noise ν(x) is a zero-mean Gaussian noise.
The method of Carney et al. (1999) is based on the use of a bagged ensemble of ANNs, which are employed to build an interpolator φ(x) of the available training patterns T = {(x_n, z_n), n = 1, ..., N_training}.
The key idea of bagging (Breiman 1999) is to treat the available dataset T as if it were the entire population, and then create alternative versions of the training set by randomly sampling from it with replacement. This provides more stable estimations. In detail, a number B of alternative versions {T_b*, b = 1, ..., B} of T are created by randomly sampling from T with replacement. Using these training sets, the networks φ_b(x; T_b*), b = 1, ..., B, are built, and the output φ_avg(x) of the bagged ensemble in correspondence of the generic test state x is obtained by averaging the single ANN outputs:

φ_avg(x) = (1/B) Σ_{b=1..B} φ_b(x; T_b*)   (4)
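The resampling-and-averaging scheme of Eq. 4 can be sketched as follows. For brevity, simple least-squares polynomial regressors stand in for the ANNs, and the synthetic training data are illustrative assumptions; the bootstrap structure is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative training set T = {(x_n, z_n)}: z = f(x) + noise
N_training = 200
x = rng.uniform(0.0, 10.0, N_training)
z = (x + 0.25) + rng.normal(0.0, 0.3, N_training)

B = 50  # number of bootstrap replicates T_b* of T
models = []
for b in range(B):
    idx = rng.integers(0, N_training, N_training)     # sample with replacement
    models.append(np.polyfit(x[idx], z[idx], deg=1))  # phi_b(.; T_b*)

def phi_avg(x_test):
    """Eq. 4: ensemble output as the average of the B single outputs."""
    preds = np.array([np.polyval(c, x_test) for c in models])
    return preds.mean(axis=0)
```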
On the other hand, since PF requires the knowledge of the pdf P(z|x), the estimate of f(x) does not suffice to apply PF. In this respect, the procedure proposed in Carney et al. (1999) allows estimating the pdf P(z|f(x)), from which the pdf P(z|x) can be obtained, the function f being invertible by hypothesis. The procedure is based on the subtraction of the random quantity φ_avg(x) from both sides of Eq. 3:

z(x) - φ_avg(x) = [f(x) - φ_avg(x)] + ν(x)   (5)
The left-hand side of Eq. 5 is a random variable which represents the error of the ensemble output φ_avg(x) with respect to the measurement z(x).
This random error is made up of two contributions (right-hand side of Eq. 5):
1. The random difference f(x) - φ_avg(x) between the unknown deterministic quantity f(x) and the ensemble output φ_avg(x). This quantity is a random variable distributed according to P(f(x) | φ_avg(x)), since φ_avg(x) depends on the random training sets T_b*, b = 1, ..., B; i.e., different training sets would lead to different ensemble models and thus to different outputs φ_avg(x). Since f(x) - φ_avg(x) can be seen as the error of the model φ_avg(x), its variance will be referred to as the model error variance and indicated by σ_m²(x).
2. The intrinsic noise ν(x) of the measurement process, whose variance is indicated by α²(x).
These two contributions are estimated by means of the procedures described in the two following Sections.
3.1. Distribution of the model error variance
The procedure used here to estimate the distribution P(φ_avg(x) | f(x)) of the ensemble output φ_avg(x) given the true value of f(x) (i.e., the 'inverse' of P(f(x) | φ_avg(x))) is based on the assumption that the random variable f(x) - φ_avg(x) is Gaussian with zero mean and standard deviation σ_m(x), which entails that P(φ_avg(x) | f(x)) is Gaussian with mean f(x), so that all we need to know is σ_m(x). Notice that residual errors in the output of an ANN are usually not caused by variance alone; rather, there may be biases in the output of the ANN, which invalidate the assumption that the mean of the distribution is zero. However, it is generally accepted that the contribution of the variance to the residual error of the ANN dominates that of the bias (see Stuart et al. 1992 for further details). Furthermore, the bias in the output of an ensemble of ANNs is expected to be smaller than that of a single ANN.
In order to estimate the model error variance σ_m²(x), the technique in Carney et al. (1999) requires dividing the B networks of the ensemble φ_avg(x) into M smaller sub-ensembles, each one containing K networks, and considering the output φ_com^m(x), m = 1, ..., M, of each sub-ensemble:

φ_com^m(x) = (1/K) Σ_{k=1..K} φ_k(x)   (6)
The set ζ = {φ_com^m(x), m = 1, ..., M} constitutes a sample of M values from the distribution P(φ_com(x) | φ_avg(x)), and its sample variance σ̂_m²(x) can be used to approximate the unknown variance σ_m²(x) of the ensemble output.
Notice that the idea behind this procedure is that, by estimating f(x) with φ_avg(x), one can approximate P(φ_avg(x) | f(x)) by P(φ_com(x) | φ_avg(x)). In order to improve the reliability and stability of σ̂_m²(x), bagging is also performed on the values of ζ. Thus, P bagging-resampled sets of ζ are gathered:

Γ = {ζ_p*, p = 1, ..., P}   (7)

where ζ_p* is the p-th subset containing M values of φ_com(x), sampled with replacement from ζ. For each subset ζ_p*, p = 1, ..., P, the corresponding variance σ_p*²(x) is computed; then, the estimate σ̂_m²(x) of the variance σ_m²(x) is calculated as their average value:

σ̂_m²(x) = (1/P) Σ_{p=1..P} σ_p*²(x)   (8)
Finally, the estimate of the regression distribution P(φ_avg(x) | f(x)) proposed by the method is:

P(φ_avg(x) | f(x)) ≈ N(φ_avg(x), σ̂_m²(x))   (9)
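The variance-estimation chain of Eqs. 6-8 can be sketched as follows. The B single-network outputs at a fixed test point are replaced here by illustrative random draws, since the trained networks themselves are not part of this excerpt:

```python
import numpy as np

rng = np.random.default_rng(2)

B, M, P = 200, 20, 1000
K = B // M  # networks per sub-ensemble

# Stand-in for the B single-network outputs phi_b(x) at one test state x
single_outputs = rng.normal(5.25, 0.15, size=B)

# Eq. 6: output of each of the M sub-ensembles of K networks
phi_com = single_outputs.reshape(M, K).mean(axis=1)

# Eqs. 7-8: P bootstrap resamples of the M sub-ensemble outputs;
# the estimate of sigma_m^2(x) is the average of their sample variances
variances = np.empty(P)
for p in range(P):
    resample = rng.choice(phi_com, size=M, replace=True)
    variances[p] = resample.var(ddof=1)
sigma_m2_hat = float(variances.mean())
```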
3.2. Distribution of the measurement noise
In this Section, the technique proposed in Nix & Weigend (1994) is applied to estimate the variance α²(x) of the Gaussian zero-mean noise ν(x) affecting the measurement equation (Eq. 3).
From Eq. 5, one can derive:

Var[z(x) - φ_avg(x)] = Var[f(x) - φ_avg(x)] + Var[ν(x)] + 2E[(f(x) - φ_avg(x)) ν(x)] = σ_m²(x) + α²(x)   (10)
The last equality is due to the independence of the error [f(x) - φ_avg(x)] from the measurement noise ν(x). To explain this, notice that [f(x) - φ_avg(x)] depends on the noise values ν_n affecting the measures z_n = f(x_n) + ν_n, n = 1, ..., N_training, in the training data T = {(x_n, z_n), n = 1, ..., N_training}, which are used to build the ensemble model φ_com^m(x), whereas ν(x) is the value of the noise affecting the measure of the test data x, not used for training the model. Thus, ν_n, n = 1, ..., N_training, and the values sampled from ν(x) in the test data are different, independent realizations of the same random variable.
Notice also that ν²(x) obeys a Chi-square χ² distribution with 1 degree of freedom.
The term σ_m²(x) can be estimated according to the procedure illustrated in Section 3.1, whereas, z(x) - φ_avg(x) being a zero-mean random variable, its variance is given by:

Var[z(x) - φ_avg(x)] = E[(z(x) - φ_avg(x))²]   (11)
Thus, in correspondence of the training couples (x_n, z_n), n = 1, ..., N_training, one can approximate E[(z(x) - φ_avg(x))²] by (z(x_n) - φ_avg(x_n))² and obtain, according to Eq. 10, a dataset formed by the pairs (x_n, α̂_n²), n = 1, ..., N_training, where:

α̂_n² = max{(z_n - φ_avg(x_n))² - σ̂_m²(x_n), 0}   (12)

Finally, in order to estimate α²(x) for a generic x, a single ANN is trained using the dataset (x_n, α̂_n²), n = 1, ..., N_training.
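The construction of the auxiliary dataset (x_n, α̂_n²) of Eq. 12 can be sketched as follows. The ensemble estimate and the model error variance are replaced by illustrative placeholders, and the synthetic training couples are an assumption for demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in training couples (x_n, z_n) with measurement noise std ~ 0.7
N_training = 1000
x = rng.uniform(0.0, 10.0, N_training)
z = (x + 0.25) + rng.normal(0.0, 0.7, N_training)

phi_avg = x + 0.25   # assumed ensemble estimate of f(x) at each x_n
sigma_m2_hat = 0.02  # assumed model error variance estimate

# Eq. 12: squared residual minus model error variance, floored at zero
alpha2_hat = np.maximum((z - phi_avg) ** 2 - sigma_m2_hat, 0.0)
# A single ANN would then be trained on the pairs (x_n, alpha2_hat_n)
# to generalize alpha^2(x) to unseen states.
```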
3.3. Estimate of the measurement distribution P(z|x)
φ_avg(x) being an estimate of f(x), the measurement distribution P(z|f(x)) can be approximated by the distribution P(z|φ_avg(x)), which can be derived from the distribution P(φ_avg(x)|f(x)) and the distribution of the measurement noise ν(x), according to Eq. 5. Since these two distributions are both Gaussian, with means and variances estimated as shown in Sections 3.1 and 3.2, P(z|f(x)) is approximated by a Gaussian distribution with mean φ_avg(x) and variance σ̂_m²(x) + α̂²(x). Finally, f(x) being invertible, the distribution of the measurement z in correspondence of a given state x, P(z|x), is given by:

P(z|x) ≈ P(z|f(x)) ≈ N(φ_avg(x), σ̂_m²(x) + α̂²(x))   (13)
4. CASE STUDY
In this Section, the technique previously described for estimating the measurement distribution P(z|x) is applied to a case study derived from Orchard & Vachtsevanos (2009), which deals with the crack propagation phenomenon in a component subject to fatigue load. The system state is described by the vector x(t) = (x1(t), x2(t)), whose first element, x1(t), indicates the crack depth, whereas the second element, x2(t), represents a time-varying model parameter that directly affects the crack growth rate. The evolution of this degradation process is described by the following two equations, which form a Markovian system of order one:

x1(t+1) = x1(t) + 3·10⁻⁴ · (0.05 + 0.1·x2(t)) + ω1(t)   (14)

x2(t+1) = x2(t) + ω2(t)   (15)

where ω1(t) is a Gaussian noise with mean 0.045 and standard deviation 0.116, and ω2(t) is a zero-mean Gaussian noise with standard deviation 0.010.
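A direct simulation of the Markovian system of Eqs. 14-15 can be sketched as follows. The initial conditions are illustrative assumptions, since this excerpt does not state them:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_crack(x1_0=1.0, x2_0=1.0, t_max=120):
    """Simulate one trajectory of the state (x1, x2) via Eqs. 14-15."""
    x1, x2 = x1_0, x2_0
    traj = [(x1, x2)]
    for _ in range(t_max):
        # Eq. 14: crack depth update, omega_1 ~ N(0.045, 0.116)
        x1 = x1 + 3e-4 * (0.05 + 0.1 * x2) + rng.normal(0.045, 0.116)
        # Eq. 15: random walk of the model parameter, omega_2 ~ N(0, 0.010)
        x2 = x2 + rng.normal(0.0, 0.010)
        traj.append((x1, x2))
    return np.array(traj)

traj = simulate_crack()
```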
In the present case study, the measurement equation is assumed to be unknown, whereas a dataset formed by the N_training pairs (x_{1,n}, z_n), n = 1, ..., N_training, is available, where the subscript 1 refers to the first component of the vector x(t).
In practice, given the purpose of the present work of showing the feasibility of the proposed approach, the dataset T = {(x_{1,n}, z_n), n = 1, ..., N_training} has actually been artificially obtained by simulating the behavior of the degradation process x(t) and sampling from the probabilistic measurement model (Orchard & Vachtsevanos 2009):

z(t) = f(x1) + ν(x1) = x1(t) + 0.25 + ν(x1)   (16)

where ν(x1) is a zero-mean Gaussian noise, whose standard deviation depends on x1:

Std[ν(x1)] = -x1²/120 + x1/10 + 1/2   (17)

According to Eq. 16, the function f(x) = f(x1) is given by x1 + 0.25, which is, as required by the method, an invertible function.
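Under the same assumptions, the artificial generation of the training set by sampling Eqs. 16-17 can be sketched as follows. The noise standard deviation follows Eq. 17 as read in this excerpt, and the range of simulated crack depths is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)

def meas_std(x1):
    """Std of the measurement noise as a function of crack depth (Eq. 17,
    as read in this excerpt)."""
    return -x1 ** 2 / 120.0 + x1 / 10.0 + 0.5

def measure(x1):
    """Measurement model of Eq. 16: z = x1 + 0.25 + nu(x1)."""
    return x1 + 0.25 + rng.normal(0.0, meas_std(x1))

# Populate the training set from (assumed) simulated crack depths
x1_train = rng.uniform(1.0, 9.0, 1000)
z_train = np.array([measure(v) for v in x1_train])
```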
To conclude this Section, notice that the probabilistic measurement model in Eq. 16 has been intentionally kept simple, since the main interest of this work is the quantification of the uncertainty in the RUL prediction and not the ensemble's ability to reproduce the measurement equation. In this respect, the knowledge of the variance of the measurement noise is fundamental, as it determines the amplitude of the prediction intervals of the RUL estimates. Thus, the capability of correctly reconstructing the variance behavior plays a key role in the assessment of the potential of the proposed technique.
4.1. Estimate of the measurement distribution
According to the technique illustrated in Section 3, an ensemble of B = 200 ANNs has been built using the available dataset T = {(x_{1,n}, z_n), n = 1, ..., N_training}, where N_training = 1000. Every ANN has 5 tan-sigmoidal hidden neurons and one linear output neuron. To estimate σ_m²(x) = σ_m²(x1), the ensemble has been divided into M = 20 sub-ensembles, and P = 1000 bagging resamples of the sub-ensemble outputs φ_com^m(x) = φ_com^m(x1), m = 1, ..., M, have been considered.
The results are evaluated in terms of the following performance indicators, computed by considering a set of N_test = 1000 pairs (x_{1,i}, z_i), i = 1, ..., N_test, obtained from Eqs. 16 and 17:
1. The square bias b²; i.e., the average quadratic difference between the true value of f(x1) and the ensemble estimate φ_avg(x1) of this quantity:

b² = (1/N_test) Σ_{i=1..N_test} (f(x_{1,i}) - φ_avg(x_{1,i}))²   (18)

This value gives information on the accuracy of the estimate of f(x) = f(x1) provided by the ensemble. Notice that the computation of this indicator requires the knowledge of the function f(x1), which is not available if the measurement equation (Eq. 16) is not known. Thus, in general one can only compute:

MSE = (1/N_test) Σ_{i=1..N_test} (φ(x_{1,i}) - z_i)²   (19)

Small values of MSE indicate satisfactory performance of the ensemble.
2. The coverage of the Prediction Interval (PI) with confidence 0.68. This indicator is used to verify the accuracy of the estimate of the distribution P(z|x) = P(z|x1). A PI with confidence level γ_p is defined as a random interval in which the observation z(x) = z(x1) will fall with probability γ_p (Carney et al. 1999, Heskes 1997):

P(z(x1) ∈ PI_{γp}(x1)) = γ_p   (20)

The estimate of P(z|x1) being a Gaussian distribution with mean φ_avg(x1) and variance σ̂_m²(x1) + α̂²(x1), the PI with γ_p = 0.68 is given by:

φ_avg(x1) - √(σ̂_m²(x1) + α̂²(x1)) ≤ z(x1) ≤ φ_avg(x1) + √(σ̂_m²(x1) + α̂²(x1))   (21)

In order to verify whether the estimate of P(z|x1) provides a satisfactory approximation of the true pdf, we consider how many times the measurement z_i falls within PI_{γp=0.68}(x_{1,i}). The closer the fraction of points hitting the γ_p-confidence interval is to γ_p, the more accurate the estimation of the parameters of the Gaussian pdf.
In practice, for every x_{1,i}, i = 1, ..., N_test, a counter C_i is set to 1 or 0 depending on whether z_i belongs to the estimated PI_{γp=0.68}(x_{1,i}) or not. The closer the average of C_i, i = 1, ..., N_test, to 0.68, the better the approximation.
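The counter-based coverage check just described amounts to a few lines. In this sketch the ensemble mean and the total standard deviation are illustrative placeholders rather than outputs of a trained ensemble:

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in test set with assumed distribution parameters
N_test = 1000
x1 = rng.uniform(1.0, 9.0, N_test)
phi_avg = x1 + 0.25                # assumed ensemble mean
total_std = np.full(N_test, 0.7)   # assumed sqrt(sigma_m^2 + alpha^2)
z = phi_avg + rng.normal(0.0, 0.7, N_test)

# Eq. 21: one-sigma PI, nominal coverage gamma_p = 0.68
lower = phi_avg - total_std
upper = phi_avg + total_std
C = (z >= lower) & (z <= upper)    # counters C_i
coverage = float(C.mean())         # should be close to 0.68
```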
Cross-validation of the results has been done by repeating the computations with N_set = 25 different, randomly generated training and test sets. This avoids over- or under-estimation of the performance indicators b² and coverage.
Table 1 reports the means and standard deviations (std) of the performance indicators over the 25 cross-validations.

Model         Ensemble          Single ANN
b²            0.0040 ± 0.0015   0.0097 ± 0.0060
PI coverage   0.6758 ± 0.0366   -

Table 1: Performance indicators over 25 cross-validations; the mean ± std is reported
Notice that the ensemble output φ_avg(x1) is very accurate in the prediction of the function f(x1), the bias being very small. Furthermore, notice that the ensemble outperforms a single ANN trained with all the 1000 training patterns. With respect to the estimate of the distribution P(z|x1), the proposed method provides a satisfactory approximation, the coverage being very close to 0.68.
Table 2 reports the estimates of the two contributions σ_m² and α² to the variance of the estimated measurement distribution P̂(z|x1). Notice that in this case study σ_m² is negligible with respect to the variance α² of the measurement noise; this entails that the accuracy of the estimate of the PI is more sensitive to the estimate of α².
In this respect, Figure 1 shows the estimate of α²(x1) and compares it to the true α² value provided by Eq. 17. Notice that this comparison, which is done in this work to assess the performance of the methodology, is not possible in real industrial applications if the measurement model (Eqs. 16 and 17) is not available.
            Estimation        Real value
σ_m² + α²   0.4489 ± 0.1359   -
σ_m²        0.0243 ± 0.0317   -
α²          0.4886 ± 0.0276   0.4900

Table 2: Contributions to the P(z|x1) variance

Figure 1: True and approximated measurement noise variance α²(x1)
4.2. Crack depth prediction
The objective of this Section is to evaluate the performance of the overall scheme in the prediction of the crack depth evolution when the ensemble of ANNs is used to estimate the measurement distribution P(z|x). To this purpose, the problem tackled consists in predicting, at t = 80 (in arbitrary units), the future crack propagation on the basis of eight measurements of the crack depth taken at times t_m = m·10, m = 1, ..., 8. This prediction phase is performed by considering the evolution of the particles according to the model in Eqs. 14 and 15 (e.g., see Orchard & Vachtsevanos 2009). In particular, we focus on the time instant t = 80, when the PF updates the particles' weights via P(z|x) after the last measurement (z = 4.6087, in arbitrary units) has been acquired.
Figure 2 shows the prediction of the crack depth evolution performed at t = 80, after the acquisition of the last measurement, using the ensemble model to estimate P(z|x). This prediction has been compared to that which would be obtained by directly using the measurement equation in the PF.
Notice that the linearity of the prediction of the expected value of x1 can be explained by taking expectations of Eqs. 14 and 15:

E[x2(t+1)] = E[x2(t)] + E[ω2(t)] = E[x2(t)] = constant

E[x1(t+1)] - E[x1(t)] = 3·10⁻⁴ · (0.05 + 0.1·E[x2(t)]) + E[ω1(t)] = constant
Figure 2: Comparison of the predictions with the true state evolution
To evaluate the impact of replacing the measurement equation with the ensemble of ANNs, 100runN = different
degradation trajectories have been simulated and the predictions of the crack depth have been performed.
Also in this case, the prediction provided by the ensemble of ANNs trained with 1000trainingN = patterns has been
compared to that based on the analytical measurement equation ( | )P z x . Each run is characterized by the same
true trajectory, the same acquired measures and the same state noise vector. The following performance indicators have been computed:
1. The coverage of the PI, with confidence 0.68. In particular, the prediction of the crack depth at t = 120 has been considered. At each run, the boundaries of the PI are computed by considering the 16th and 84th percentiles of the estimate of the pdf of the crack depth. A counter is set to 1 if the true trajectory falls within the corresponding interval and to 0 otherwise, in analogy with the coverage verification explained in Section 4.2.
2. The average width of the PI at t = 120 over the N_run = 100 runs.
3. The Mean Square Error (MSE) over the N_run = 100 runs between the prediction of the crack depth provided by the PF and its true value at t = 120. That is:
MSE = (1/N_run) · Σ_{n_run=1}^{N_run} (X_{n_run} − o_{n_run})²    (22)
(Figure 2 plot: crack depth [inch] vs. time, t = 80 to 120; curves: 1000-training ensemble, traditional, true.)
where X_{n_run} is the true crack depth of the test trajectory at t = 120 and o_{n_run} is the expected value of the crack depth pdf estimated by the PF.
The obtained values are reported in Table 4. It can be noticed that the coverage of the ensemble is very close to 0.68; furthermore, the other performance indicators are also very close to those which would be obtained by considering the analytical measurement equation. This result confirms that the approximation of the distribution P(z|x) is accurate and therefore does not remarkably alter the outcome of the PF.
           Traditional   Data-driven
coverage   0.6500        0.7000
PI width   1.3058        1.3226
MSE        0.3421        0.3464

Table 4: Performance indicators at t = 120
Finally, the performance evaluator s proposed by Saxena et al. (2008) has been computed to evaluate the prediction performance:

s = Σ_{i=1}^{n} (e^{−d_i/a1} − 1)   if d_i < 0
s = Σ_{i=1}^{n} (e^{d_i/a2} − 1)    otherwise

where a1 = 10, a2 = 13, n = 100 is the number of simulated histories and d_i is the difference between the estimated RUL and its true value. To compute the value of this performance metric, the following procedure has been adopted:
1. Set the failure threshold to ST = 7.
2. Simulate the evolution of the degradation process; this allows calculating the true RUL, t_RUL, at t = 80 as the difference between the time instant at which the component reaches ST and 80. Moreover, the set of measures sampled according to the measurement model is collected.
3. Use the PF to estimate the component degradation state at t = 80 and predict the RUL estimate t̂_RUL.
4. Calculate the difference d = t̂_RUL − t_RUL.
5. Repeat steps 2-4 n−1 times and compute s.
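A minimal sketch of the metric s as written above (illustrative Python; d is the estimated minus the true RUL, with a1 applied to early predictions and a2 to late ones, as in the formula):

```python
import math

def saxena_score(d_list, a1=10.0, a2=13.0):
    """Asymmetric exponential score of Saxena et al. (2008): early (d < 0)
    and late (d >= 0) RUL errors are penalized with different constants."""
    s = 0.0
    for d in d_list:
        if d < 0:
            s += math.exp(-d / a1) - 1.0   # early-prediction penalty
        else:
            s += math.exp(d / a2) - 1.0    # late-prediction penalty
    return s

perfect = saxena_score([0.0])   # a perfect predictor scores 0
early = saxena_score([-5.0])    # 5 cycles early
late = saxena_score([5.0])      # 5 cycles late
```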
The values of the metric s obtained when the RUL is predicted using the ‘traditional’ PF approach (s = 10.30) and the ‘data-driven’ approach (s = 10.65) are very close to each other.
5. CONCLUSIONS
PF is often proposed as a prognostic technique for estimating the evolution of the degradation state x of a system;
generally, it resorts to analytical models of both the degradation state evolution and the measurement. In practice, the measurement model may not be available in analytical form; rather, a set of data may be available from which the measurement model can be built through data-mining techniques. In this work, a technique based on an ensemble of ANNs has been investigated to this aim and applied to a case study derived from the literature. The verification conducted on the results shows that a good approximation of the model can be obtained and that its substitution into the PF does not significantly affect the PF's performance. Furthermore, the proposed method has been shown capable of estimating the uncertainty of the RUL prediction.
Additional effort will be dedicated in future works to improving the accuracy of the estimate when only a small training set is available and to extending the applicability of the technique to cases in which the measurement equation f(x) is not one-to-one or has a more complex form. Furthermore, another future objective is to also substitute the model of the evolution of the system state with a data-driven model, e.g., an ensemble of trained ANNs, in order to allow the use of PF in cases where an analytical model of the system evolution is also unavailable.
REFERENCES
Arulampalam, M.S., Maskell, S., Gordon, N. and Clapp, T. (2002). A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, IEEE Transactions on Signal Processing, 50 (2), 174-188.
Baraldi, P., Di Maio, F., Zio, E., Sauco, S., Droguett, E., Magno, C. (2012) Ensemble of Neural Networks for Predicting Scale Deposition in Oil Well Plants Equipments, Proceedings of PSAM 11 & ESREL 2012.
Breiman, L. (1999) Combining predictors, in Sharkey AJC (Ed.) Combining artificial neural nets: ensemble and modular multinet systems. Springer, Berlin Heidelberg New York, pp 31-50.
Cadini, F., Zio, E., Avram, D. (2009) Model-based Monte Carlo state estimation for condition-based component replacement, Reliability Engineering & System Safety, Vol. 94 (3), pp. 752-758.
Carney, J., Cunningham, P., & Bhagwan, U. (1999). Confidence and prediction intervals for neural network ensembles. International Joint Conference on Neural Networks IJCNN, July 10-16, Washington D.C.
Coble, J.B. (2010) Merging Data Sources to Predict Remaining Useful Life – An Automated Method to Identify Prognostic Parameters. PhD diss., University of Tennessee.
Doucet, A., de Freitas, J.F.G. and Gordon, N.J. (2001) Sequential Monte Carlo methods in practice. Springer-Verlag, New York.
Gustafsson, F., & Saha, S. (2010). Particle filtering with dependent noise. In Proceedings of the 13th Conference on Information Fusion (FUSION). Edinburgh.
Heskes, T. (1997) Practical Confidence and Prediction Intervals, in M. Mozer, M. Jordan and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pages 466-472, Cambridge, 1997, MIT Press.
Hsu, C.W. , Chang, C.C., Lin, C.J. (2003) A Practical Guide to Support Vector Classification. Technical Report, 2003.
Liu, R., Ma, L., Kang, R. and Wang, N. (2011) The Modeling Method on Failure Prognostics Uncertainties in Maintenance Policy Decision Process, Proc. 9th Int. Conf. on Reliability, Maintainability and Safety (ICRMS 2011), pp. 815-820.
Moura, M.C., Lins, I.D., Ferreira, R.J., Droguett, E.L., Jacinto, C.M.C. (2011) Predictive maintenance policy for oil well equipment in case of scaling through support vector machines, in Proceedings of the European Safety and Reliability Conference - ESREL 2011, pp. 503-507.
Nix, D. and Weigend, A. (1994). Estimating the mean and the variance of the target probability distribution, in IEEE World Congress on Computational Intelligence, International Joint Conference on Neural Networks, June 27-July 2, Orlando, Florida, Vol. 1, pp. 55-60.
Orchard, M., Wu, B., Vachtsevanos, G. (2005) A Particle Filter Framework for Failure Prognosis, Proceedings of WTC2005 World Tribology Congress III. Washington D.C., USA, Sept. 12-16, 2005
Orchard, M. and Vachtsevanos, G. (2009) A Particle Filtering Approach for On-Line Fault Diagnosis and Failure Prognosis, Transactions of the Institute of Measurement and Control, Vol. 31 (3-4), pp. 221-246.
Papoulis, A. and Pillai, S.U. (2002) Probability, Random Variables and Stochastic Processes. McGraw-Hill Higher Education, 4th edition.
Saxena, A., Goebel, K., Simon, D., Eklund, N. (2008) Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation, in International Conference on Prognostics and Health Management, PHM2008.
Geman, S., Bienenstock, E. and Doursat, R. (1992) Neural Networks and the bias/variance dilemma. Neural Computation, Vol. 4(1), pp. 1-58.
Tang, L., Kacprzynski, G.J., Goebel, K. and Vachtsevanos, G. (2009) Methodologies for Uncertainty Management in Prognostics. Proc. IEEE Aerospace conference, 2009, pp. 1-12.
Vachtsevanos, G., Lewis, F.L., Roemer, M., Hess, A. and Wu, B. (2006) Intelligent Fault Diagnosis and Prognosis for Engineering Systems. John Wiley & Sons.
Zio, E. (2012) Prognostics and health management of industrial equipment, in Diagnostics and Prognostics of Engineering Systems: Methods and Techniques, S. Kadry, Ed. IGI-Global, 2012.
BIOGRAPHIES
Piero Baraldi (BS in nuclear engng., Politecnico di Milano, 2002; PhD in nuclear engng., Politecnico di Milano, 2006) is assistant professor of Nuclear Engineering at the Department of Energy of the Politecnico di Milano. He is the current chairman of the European Safety and Reliability Association (ESRA) Technical Committee on Fault Diagnosis. His main research efforts are currently devoted to the development of methods and techniques for system health monitoring, fault diagnosis, prognosis and maintenance optimization. He is co-author of 42 papers in international journals and 38 in proceedings of international conferences, and serves as a referee for 5 international journals.
Michele Compare (BS in mechanical engng., University of Naples Federico II, 2003, PhD in nuclear engng., Politecnico di Milano, 2011) is currently a post-doc at the Politecnico di Milano. He worked as RAMS engineer and risk manager. His main research efforts are devoted to the development of methods and techniques in support of maintenance of complex systems.
Sergio Sauco (BS in energy engng., Politecnico di Milano, 2009; MS in nuclear engng., Politecnico di Milano, 2011).
Enrico Zio (BS in nuclear engng., Politecnico di Milano, 1991; MSc in mechanical engng., UCLA, 1995; PhD in nuclear engng., Politecnico di Milano, 1995; PhD in nuclear engng., MIT, 1998) is Director of the Chair in Complex Systems and the Energetic Challenge of Ecole Centrale Paris and Supelec, full professor, Rector's delegate for the Alumni Association and past-Director of the Graduate School at Politecnico di Milano, and adjunct professor at the University of Stavanger. He is the Chairman of the European Safety and Reliability Association (ESRA), member of the Korean Nuclear Society and the China Prognostics and Health Management Society, and past-Chairman of the Italian Chapter of the IEEE Reliability Society. He serves as Associate Editor of IEEE Transactions on Reliability and as editorial board member of various international scientific journals. He has functioned as Scientific Chairman of three international conferences and as Associate General Chairman of two others. His research topics are: analysis of the reliability, safety and security of complex systems under stationary and dynamic conditions, particularly by Monte Carlo simulation methods; and development of soft computing techniques for safety, reliability and maintenance applications, system monitoring, fault diagnosis and prognosis. He is author or co-author of five international books and more than 170 papers in international journals.
Feature Extraction and Evaluation for Health Assessment and Failure Prognostics
K. Medjaher1, F. Camci2, and N. Zerhouni1
1 FEMTO-ST Institute, AS2M Department, UMR CNRS 6174-UFC/ENSMM/UTBM, 25000 Besancon, [email protected]
2 IVHM Centre, School of Applied Sciences, Cranfield University, [email protected]
ABSTRACT
The estimation of the Remaining Useful Life (RUL) of industrial equipment can be based on its most critical components. Under this assumption, the identified critical component must be monitored to track its health state during operation. The acquired data are then processed to extract relevant features, which are used for RUL estimation. This paper presents a method for evaluating the goodness of features, extracted from raw monitoring signals, for health assessment and prognostics of critical industrial components. The evaluation method is applied to several simulated datasets as well as to features obtained from a particular application on bearings.
1. INTRODUCTION
The availability, reliability and security of industrial equipment can be ensured by monitoring its most critical components to continuously assess their health condition and predict its future evolution, leading to maintenance, life cycle and cost optimization. Examples of critical physical components are bearings, gears, batteries, belts, etc. Bearing failure is considered one of the foremost causes of breakdown in rotating machinery (Li et al., 1999). Bearing faults account for 40% of motor faults, according to research conducted by the Electric Power Research Institute (EPRI) (Enzo & Ngan, 2010). Turbine engine bearing failures are the leading cause of class-A mechanical failures (loss of aircraft) (Richard, 2005). Even one aircraft saved thanks to prognostics would pay for its development cost (Marble & Morton, 2006). The identification of the most convenient time of maintenance after failure detection, without reducing the safety requirements, is crucial, and is made possible by a prognostics capability. Thus, bearing prognostics is very critical for effective
K. Medjaher et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
operation and management.

Failure detection forces machinery to shut down, which causes tremendous time, productivity and capital losses. In addition, it is not uncommon to replace a defective/used bearing with a new one that has a shorter remaining useful life than the defective one. Each failure type (outer race, inner race, ball and cage defects) causes a distinct signature in the vibration frequency spectrum (Enzo & Ngan, 2010), and vibration analysis is considered the most reliable method for bearing failure detection (Zhang, Sconyers, Patrick, & Vachtsevanos, 2010; Davaney & Eren, 2004; McFadden & Smith, 1984; Tandon & Choudhury, 1999). However, it is often difficult to extract the failure signature due to the noise in the data, especially in the early stages of the failure (Su, Wang, Zhu, Zhang, & Guo, 2010; Bozchalooi & Liang, 2008; He, Jiang, & Feng, 2009). The extracted features are then used for failure detection, diagnostics and prognostics.

Feature extraction is a step common to all types of prognostic approaches and one of the most critical steps in diagnostics and prognostics. The extracted features are first evaluated and then used by appropriate methods and algorithms to detect the faults and to predict the equipment's remaining useful life. In this framework, the goodness of the features affects the complexity of the diagnostic and prognostic methods. Features that perfectly represent healthy and close-to-failure machinery, and the progression between them, may lead to very simple diagnostic and prognostic methods. On the other hand, very complex diagnostic and prognostic methods using features that are ineffective in representing the failure and its progression may lead to poor results. Thus, the extraction of relevant features is a prerequisite for effective diagnostics and prognostics.

This paper presents a method for evaluating the goodness of the features for prognostics.
An effective feature evaluation method enables the selection of the best features, which is critical for obtaining better prognostics results. The feature evaluation method is applied to bearings that were run until failure in a lab environment. The paper is organized
as follows: Section 2 presents a brief introduction to failure prognostics, Section 3 deals with the quantification metric for the quality evaluation of features for prognostics, Section 4 presents the experiments and results, and Section 5 concludes the paper.
2. FAILURE PROGNOSTIC PARADIGM
According to the International Standard Organization (ISO), failure prognostics corresponds to the estimation of the operating time before failure and the risk of future existence or appearance of one or several failure modes (AFNOR, 2005). In the scientific literature, the operating time before failure is called the remaining useful life (RUL), with which a confidence value is often associated. Several methods and tools for performing failure prognostics have been proposed in the literature. They can be grouped into three main approaches (Tobon-Mejia, Medjaher, & Zerhouni, 2012; Heng, Zhang, Tan, & Mathew, 2009; Jardine, Lin, & Banjevic, 2006; Vachtsevanos, Lewis, Roemer, Hess, & Wu, 2006), namely: the model-based approach, the data-driven approach and the hybrid approach.
Figure 1. Main prognostic approaches: model-based (physics of failure), data-driven and hybrid.

Model-based (also called physics-of-failure) methods deal with the exploitation of a mathematical model representing the behavior of the physical component, including its degradation. The derived model is then used to predict the future evolution of the degradation. In this case, the prognostic consists in evolving the degradation model up to a determined future instant, starting from the actual deterioration state and considering the future use conditions of the corresponding component. The main advantage of this approach is its precision, since the predictions are achieved based on a mathematical model of the degradation. However, the derived degradation model is specific to a particular kind of component or material, and thus cannot be generalized to all the system components. In addition, obtaining a mathematical model of degradation is not an easy task and requires well-instrumented test benches, which can be expensive.

Data-driven methods are concerned with the transformation of the monitoring and/or exploitation data into relevant models, which can be used to assess the health state of the industrial system and to predict its future state, leading to the estimation of its RUL. Generally, the raw data are first processed to extract features, which are then used to build the diagnostic and prognostic models. The features can be temporal, frequency-based or both. In some applications, individual features are not sufficient and one needs to combine them in order to build what can be called health indicators. Note that data-driven prognostics methods can use data provided by sensors or obtained through experience feedback (operation, maintenance, number of breakdowns, etc.). The advantage of the data-driven approach lies in its applicability, cost and ease of implementation. Indeed, with these methods it is possible to predict the future evolution of the degradation without any need for a prior mathematical model of it. However, the results obtained with this approach are less precise than those obtained with model-based methods.

Hybrid methods use both data-driven and model-based (or physics-of-failure) approaches. The use of each approach depends on the application and on the type of knowledge and data available.
3. FEATURE EXTRACTION AND EVALUATION
Fault detection, diagnostics and prognostics all use the notion of features, which are extracted from the raw monitoring signals provided by the sensors (temperature, vibration, force, etc.) installed on the system. Feature extraction is essential in the process of health monitoring, health assessment and failure prognostics. Indeed, the relevant information related to the behavior of the component during its degradation is often hidden in the raw signals and needs to be extracted by means of appropriate methods. Figure 2 shows the steps involved in the failure prognostic process, including feature extraction.
Figure 2. Steps for RUL estimation: critical component to monitor, parameters to measure, adequate sensors, data processing and feature extraction, feature evaluation, health assessment and prognostics, RUL.

Diagnostics is a classification problem, whereas prognostics is the process of forecasting the future health states. The goodness of the features for diagnostics is basically a measure of separability between data from healthy and faulty equipment. Good separability indicates that samples from different classes (i.e., healthy and faulty) are far apart from each other and samples from the same class are close to each other. The key point in prognostics is the continuity of the separation between time segments, whereas diagnostics focuses on one separability measure between two static classes (i.e., failed and healthy). Prognostics searches for separation between the time segments spanning the whole degradation of the component. Within-class separability (parameters a and b in Fig. 3 (Camci, Medjaher, Zerhouni, & Nectoux, 2012)) and between-class separability (parameter c in Fig. 3 (Camci et al., 2012)) are used to quantify the separability. Many class separation metrics have been reported in the literature (Calinski & Harabasz, 1974; Eker et al., 2011). These metrics focus on static classes and do not consider the progression from one class to another.

Figure 3. Feature quality for diagnostics and prognostics.

One feature may be good at separating the classes, but not at representing the progression from one class to another. For example, the separability measure (S2) of feature 2 (F2) is higher than the separability measure (S1) of feature 1 (F1) in Fig. 3 (Camci et al., 2012). However, this does not mean that F2 is better at representing the failure progression: as seen from the figure, the failure progression in F2 involves higher variation. Thus, a new quality measure should be employed for prognostics, which is a relatively new problem.

Monotonically non-increasing or non-decreasing: mathematically, a function f is called monotonically non-decreasing if for all x and y such that x ≤ y one has f(x) ≤ f(y), and monotonically non-increasing if one has f(y) ≤ f(x). It may be trivial to check the monotonicity of a single failure progression sample by analyzing the differences between consecutive points: when all the differences are greater (less) than or equal to 0, the function is non-decreasing (non-increasing). However, monotonicity should be considered over all the samples representing the failure progression, rather than by analyzing the samples individually. An example of several samples representing failure progression is displayed in Fig. 4 (Camci et al., 2012). As seen from the figure, the time is segmented for effective analysis of the failure progression. The effectiveness of a feature in representing the failure progression is calculated as the average separability of the segments, as expressed in (1). The higher the total separability value S, the better the representation of the failure progression; thus, the goal is to find the feature with the highest S value. S is basically the average separation between time segments: a high S value indicates that the differences between time segments are large. The value s_t is the separability measure for consecutive time segments.
S = (1/T) · Σ_{t=1}^{T} s_t    (1)

where S is the average separability value, s_t is the separability at time t and T is the total number of time segments.

Figure 4. Failure progression for multiple samples (time segments t1-t10).

The distribution of the data points from the different samples in each time segment should be used to measure the separability at a given time segment. The separability calculation is formulated in (2).
s_t = a/L − χ/N_t    (2)

with

χ = 0 if a/L ≠ 1,  χ = α if a/L = 1    (3)

where α is the number of samples overlapping with the distribution in the consecutive time frame, N_t is the number of samples in time segment t and L represents the distance between the 25th and 75th percentiles. The 25th and 75th percentiles were selected as a common-sense range that captures 50% of the data; the selection of the range may depend on the signal-to-noise ratio and on possible bias in the dataset. The ratio of the length of the non-overlapped portion (called a) to L is a measure of the separability (a/L). The L and a parameters represent the distance between points at the given percentiles; for example, if the overlap occurs between the 30th and 50th percentiles, parameter a is the distance between the samples at the 30th and 50th percentiles. When the separation is low, the a/L ratio is close to 0; when the separation is high, a/L becomes closer to 1. When there is no overlap between the 25th-75th percentiles of the distributions (a/L = 1), two different possibilities exist: in the first, there is some overlap within the data greater than the 75th percentile or less than the 25th percentile; the second represents complete separation. When a/L becomes 1, the ratio of the number of data points causing overlap to the total number of data points in the distribution is subtracted in the separability calculation.
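One possible implementation of Eqs. (1)-(3) is sketched below (illustrative Python, not the authors' code; the computation of the non-overlapped portion a from the two interquartile ranges, and of the penalty count α, is an interpretation of the text):

```python
import numpy as np

def segment_separability(seg_a, seg_b):
    """Separability s_t between two consecutive time segments (Eqs. 2-3),
    based on the 25th-75th percentile (interquartile) ranges."""
    lo_a, hi_a = np.percentile(seg_a, [25, 75])
    lo_b, hi_b = np.percentile(seg_b, [25, 75])
    L = hi_a - lo_a                              # IQR length of the first segment
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)  # overlap of the two IQRs
    if overlap > 0:                              # IQRs overlap: a/L < 1, chi = 0
        return (L - overlap) / L
    # no IQR overlap (a/L = 1): subtract the ratio of overlapping points
    alpha = int(np.sum((seg_a >= lo_b) & (seg_a <= hi_b)))
    return 1.0 - alpha / len(seg_a)

def total_separability(segments):
    """Average separability S over consecutive time segments (Eq. 1)."""
    s = [segment_separability(segments[t], segments[t + 1])
         for t in range(len(segments) - 1)]
    return float(np.mean(s))

# a trending feature should score higher than a flat, noisy one
rng = np.random.default_rng(2)
trending = [rng.normal(m, 0.1, 50) for m in np.linspace(0.0, 5.0, 10)]
flat = [rng.normal(0.0, 0.1, 50) for _ in range(10)]
```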
4. EXPERIMENTS AND RESULTS
4.1. Simulated Dataset
The presented evaluation method is applied to eight simulated datasets. These datasets have been developed to simulate various levels of goodness for prognostics. Features with a clear trend are considered to be good features, whereas bad features do not include a trend with time. The datasets, numbered from one to eight, include an increasing trend as shown in Figure 5.
Figure 5. Simulated features 1-8 (each plotted over 1000 time points; values range roughly from 2 to 4).
The trend in these datasets is formulated as a logarithmically increasing mean with constant noise, as shown in the formulation below. In these equations, μ_{i,t} is the mean of feature i at time t and T is the final time point.
x(t) = μ_{i,t} + σ    (4)

μ_{i,1} = log(10)    (5)

μ_{i,T} = log(i × 10)    (6)
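Eqs. (4)-(6) fix only the endpoint means; one way to generate such features is sketched below (illustrative Python: the logarithmic interpolation of the mean between μ_{i,1} and μ_{i,T}, and the noise level sigma, are assumptions not stated in the text):

```python
import numpy as np

def simulate_feature(i, T=1000, sigma=0.05, seed=0):
    """Simulated feature i: mean rising logarithmically from log(10) at t = 1
    to log(10*i) at t = T, plus constant-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    mu_1, mu_T = np.log(10.0), np.log(10.0 * i)
    mu = mu_1 + (mu_T - mu_1) * np.log(t) / np.log(T)   # log-shaped trend
    return mu + rng.normal(0.0, sigma, size=T)

x1 = simulate_feature(1)   # no trend: the mean stays at log(10)
x8 = simulate_feature(8)   # strongest trend, ending near log(80)
```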
As seen from Figure 5, the goodness of the features increases from feature 1 to feature 8, and the trend is more visible in the later datasets. Figure 6 displays the goodness of the features obtained with the presented evaluation metric. As seen from the figure, the goodness increases for the later features, which is consistent with the increasing trend in Figure 5.
Figure 6. Goodness of features.
4.2. Bearing Example
The accelerated bearing life test bed, called PRONOSTIA, is an experimentation platform dedicated to testing and validating bearing health assessment, diagnostic and prognostic methods. In the present experimental setup, a natural degradation process of the bearings is performed: during the experiments, any failure type (inner race, outer race, ball or cage) or a combination of them can occur. This is allowed in order to better represent a real industrial situation.

The experimental platform PRONOSTIA is composed of two main parts: a first part related to the speed variation and a second part dedicated to the generation of load profiles. The speed variation part is composed of a synchronous motor, a shaft, a set of bearings and a speed controller. The synchronous motor develops a power of 1.2 kW and its operational speed varies between 0 and 6000 rpm. The second part is composed of a hydraulic jack connected to a lever arm, allowing different loads to be applied to the bearing mounted on the platform for degradation.

A pair of ball bearings is mounted on one end of the shaft to serve as the guide bearings and an NSK6307DU roller ball
bearing is mounted on the other end to serve as the test bearing. The transmission of the movement between the motor and the shaft is ensured by a rub belt.

Two high-frequency accelerometers (DYTRAN 3035B) are mounted horizontally and vertically on the housing of the test roller bearing to pick up the horizontal and vertical accelerations. In addition, the monitoring system includes one temperature probe (of type PT100) to record the temperature of the tested bearing. A speed sensor and a torque sensor are also available on the PRONOSTIA platform. The sampling frequency of the NI DAQCard-9174 data acquisition card is set to 25600 Hz and the vibration data provided by the two accelerometers are collected every 1 second. The bearing operating conditions are determined by instantaneous measures of the radial force applied on the bearing, the rotation speed of the shaft handling the bearing and the torque inflicted on the bearing.

Several features are extracted to be used for failure progression analysis, such as the maximum, mean, standard deviation, skewness, kurtosis, root mean square (RMS), crest factor and highest frequency.

Fig. 7 displays two good features (RMS and standard deviation) and two bad features (skewness and crest factor) for prognostics (in these plots, the x axis stands for time). As can be seen from the figures, the failure progression is visible in the features with a high separability measure. Fig. 8 displays the separability values of several features. In
Figure 7. Examples of good/bad features for prognostics (panels: RMS, standard deviation, skewness, crest factor; x axis: time).
this figure, three sensory signals were used, each represented by a line in the graph. The fluctuations show that the goodness may vary based on the sensory signal used. As seen from this figure, the goodness of the skewness and crest factor (CF) is low, whereas the goodness of the standard deviation and RMS is high. Thus, the evaluation method is able to differentiate the goodness of the features.
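The statistical features listed in this section can be computed from a raw vibration window as in the following sketch (illustrative Python; the window is synthetic, with a length matching the 25600 Hz, 1 s acquisitions):

```python
import numpy as np

def extract_features(window):
    """Common per-window statistical features: max, mean, standard deviation,
    skewness, kurtosis, RMS and crest factor."""
    w = np.asarray(window, dtype=float)
    mean, std = w.mean(), w.std()
    rms = np.sqrt(np.mean(w ** 2))
    return {
        "max": float(w.max()),
        "mean": float(mean),
        "stdev": float(std),
        "skew": float(np.mean((w - mean) ** 3) / std ** 3),
        "kurtosis": float(np.mean((w - mean) ** 4) / std ** 4),
        "rms": float(rms),
        "crest_factor": float(np.max(np.abs(w)) / rms),
    }

# one synthetic 1-second window sampled at 25600 Hz
rng = np.random.default_rng(3)
feats = extract_features(rng.normal(0.0, 0.2, size=25600))
```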
Figure 8. Separability values for the second type of degradation (panels: vibration signals 1 and 2; features: max, mean, stdev, skew, kurtosis, rms, CF, rms1, rms2, CF1, CF2, freq).
5. CONCLUSION
The quality of the features is critical for health assessment, diagnostics and prognostics. Feature extraction, selection and evaluation of feature quality for diagnostics have been studied extensively; the nature of the prognostics problem, however, is different from that of diagnostics. This paper presents a quantification metric for evaluating the quality of features for prognostics, which is a relatively new problem compared to diagnostics. The presented metric is applied to features extracted from bearing vibration data collected in a lab environment. The features are plotted for visual evaluation to judge the quality of the evaluation metric. The results show that the metric is able to effectively quantify the quality of features for the purpose of prognostics.
REFERENCES
AFNOR. (2005). Condition monitoring and diagnostics ofmachines - Prognostics - Part 1: General guidelines.NF ISO 13381-1.
Bozchalooi, I., & Liang, M. (2008). A joint resonance fre-quency estimation and in-band noise reduction methodfor enhancing the detectability of bearing fault signals.Mechanical Systems and Signal Processing, 22, 915-933.
Calinski, R., & Harabasz, J. (1974). A Dendrite Method forCluster Analysis. Comm. in Statistics, 3, 1-27.
Camci, F., Medjaher, K., Zerhouni, N., & Nectoux, P. (2012).Feature Evaluation for Effective Bearing Prognostics.Quality and Reliability Engineering International. (Inpress)
5
First European Conference of the Prognostics and Health Management Society, 2012
Finite Element based Bayesian Particle Filtering for the estimation of crack damage evolution on metallic panels

Sbarufatti C., Corbetta M., Manes A., and Giglio M.

Politecnico di Milano, Mechanical Dept., Via La Masa 1, 20156, Milano, Italy
ABSTRACT
Many studies are nowadays devoted to structural health monitoring, especially in the aeronautical field. Focusing on metallic structures, fatigue cracks represent both a design and a maintenance issue. The availability of real-time diagnostic techniques for the assessment of structural health has also drawn attention toward the prognostic assessment of the residual useful life, with the aim of developing robust prognostic health management systems to assist operators in scheduling maintenance actions. This paper describes the development of a Bayesian particle filter used to refine the posterior probability density functions of both the damage condition and the residual useful life, given that prior knowledge on damage evolution is available from NASGRO material characterization. The prognostic algorithm has been applied to two cases. The first is an off-line application, receiving diagnostic inputs retrieved by manual structure scanning for fault identification. The second is used on-line to filter the input coming from a real-time automatic diagnostic system. FEM simulations are used extensively to enhance the algorithm's performance.
1. INTRODUCTION
Fatigue crack nucleation and propagation is a major issue for aeronautical structures, both from a design (Schmidt & Schmidt-Brandecker, 2009) and a maintenance point of view (Lazzeri & Mariani, 2009). On the one hand, a proper design is required to guarantee damage tolerance or safe life, depending on the criticality of the selected component. On the other hand, a strict inspection schedule has to be programmed to guarantee structural health, owing to the uncertainties in the design assumptions for damage nucleation and evolution (material non-uniformities, manufacturing tolerances, load spectra that are not easily predictable, uncertainty in the stress field at hot spots, etc.). Moreover, maintenance stops often require dismounting large portions of the structure, thus reducing the availability of the aircraft and raising operating costs.
Real-time Structural Health Monitoring (SHM), as part of a complete Prognostic Health Management (PHM) system, could potentially reduce aircraft operating costs while maintaining a high level of safety (Boller, 2001). Much research is thus directed to the development of systems for automatic fault detection, able to perform continuous on-board inference on structural health. The evolution of Diagnostic Monitoring Systems (DMS) has led to the recognition that predictive prognosis is both desirable and technically possible. Indeed, the large amount of data coming from a DMS, once statistically treated, allows a stochastic estimation of the structure's Residual Useful Life (RUL) as well as the estimation of the Probability Density Function (PDF) of the current damage state. This approach makes it possible to decide in real time whether a component must be replaced or repaired, according to predefined safety parameters.
Bayesian updating methodologies fit the PHM target well (Arulampalam, Maskell, Gordon & Clapp, 2002). Their approach consists in updating the a priori information on the RUL (based essentially on material characteristics) according to the actual observations (treated stochastically) taken in real time by the DMS, thus arriving at the estimation of the required posterior distributions, conditional on the measurements. Unfortunately, it is impossible to evaluate these posterior distributions analytically except when the degradation process is linear and the noise is Gaussian (as happens when using Kalman Filters). Focusing on fatigue damage, since crack evolution is not a linear process and the involved uncertainties (including the measurement error) are not Gaussian, a numerical approach is suggested. Monte Carlo Sampling (MCS) methods are a valid tool to approximate the required posterior distributions (Cadini, Zio & Avram, 2009).

_____________________
Sbarufatti C. et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Among them, Particle Filters, also known as Sequential Importance Sampling (SIS), are an MCS method that takes its name from the fact that the continuous distributions of interest are approximated by a discrete set of weighted particles, each one representing a Markov process trajectory of evolution in the state space, its weight being an index of the probability of the trajectory itself (Arulampalam et al., 2002). As the number of samples becomes very large, the MCS characterization of the PF approaches the optimal Bayesian estimate. In addition, the Sequential Importance Resampling (SIR) algorithm is a related technique which allows for particle resampling when the initially drawn samples are not able to describe the system dynamics with sufficient accuracy. In this case, new particles are usually sampled taking into account the information about the system gained up to the resampling instant.
Two main differences arise when comparing a real-time DMS based on a network of sensors installed over the structure with classical Non Destructive Testing (NDT) techniques used to manually scan the structure during maintenance stops (scheduled or unscheduled). The first concerns the target damage dimension that can be identified. NDT can detect cracks at a very early stage of propagation, often detecting anomalies on the order of 1 mm in length or less. On the other hand, the on-board DMS is expected to be designed for a longer target crack length (typically an order of magnitude greater, though strictly dependent on the allowed number and position of sensors as well as on the geometry of the structure to be monitored), as reported by Sbarufatti, Manes and Giglio (2011). This is nevertheless compliant with current specification requirements for damage tolerance (JSSG, 2006), at least for the aeronautical panel structure tested in this framework (Figure 1). The second concerns the uncertainty of the provided measurement. The damage inference that can be obtained with a manual scan over the entire structure is far more precise than the PDF of the damage state estimated with a smart sensor network, due to the complicated algorithms for data fusion and damage characteristic evaluation.
This paper reports the development and testing of a Particle Filtering algorithm for the prognosis of aeronautical stiffened skin panels. The aim of the work is to assess the advantages of applying PF to the estimation of the RUL, in comparison with a classical methodology for the estimation of fatigue crack evolution. Moreover, this work represents the final test of a complete PHM system that also comprises an automatic DMS for the real-time evaluation of damage. A real dynamic crack propagation test has been executed, with acquisition from a network of 20 FBG strain sensors (Figure 1) and simultaneous manual tracking of the crack length. A detailed and validated Finite Element model of the structure under monitoring has been developed and used extensively within both the DMS and the PF algorithm. PF has been applied separately to two cases. The first, the off-line PHM, consists in providing the PF with the manually recorded crack lengths as input (with a hypothesis on the associated distribution). In the second case, the on-line PHM, the output of the real-time DMS (processing the signals from the sensor network) is given as input to the PF algorithm. The two approaches have been compared, with comments on their relative performance. Note that the present article focuses on the prognostic part of the SHM; the interested reader may refer to the work of Sbarufatti, Manes and Giglio (2012) for a detailed description of the design and performance of the DMS (taken as input for the current paper).

A brief overview of PF theory is provided in Section 2, followed by descriptions of the stochastic crack propagation model and the measurement model in Sections 3 and 4 respectively. The PF has been tested for the off-line and on-line PHM, with results reported in Section 5. A concluding section is also provided.
2. OVERVIEW OF PARTICLE FILTER THEORY
When modeling the behavior of dynamic systems under degradation, at least two models are required (Cadini et al., 2009): first, a model describing the sequential evolution of the state (the system model) and, second, a model relating the noisy measurements to the state (the measurement model). The former consists of a hidden Markov process describing the health state $\{x_k;\ k \in \mathbb{N}\}$, i.e. the Transition Density Function (TDF) $f$ that relates the health state at time $k-1$ to the condition at instant $k$. It constitutes a Discrete time State Space (DSS) model. The latter is the equation describing the distribution of the observations $\{y_k;\ k \in \mathbb{N}\}$, i.e. the statistical function $h$ that relates the condition of the monitored component to its noisy measurement at time step $k$. In a Bayesian framework, all the relevant information about the state can thus be inferred from the posterior distribution of the state $x_k$, given the history of collected measurements $y_{1:k}$. This also holds for Particle Filters, except that the posterior distributions are estimated by means of MCS from $f$ and $h$. What follows are the basic steps of the mathematical formulation of PF theory; for a deeper description the interested reader may refer to a tutorial on particle filter theory (Arulampalam et al., 2002). The DSS and measurement models are thoroughly defined in the following sections.

Figure 1. (a) Test rig for the dynamic crack propagation test, starting from a notch artificially initiated on the aluminum panel structure. (b) Typical aeronautical stiffened skin panel structure with the sensor network for diagnosis installed (20 FBG strain sensors).
Given that the stochastic damage evolution can be described through the TDF, the aim of the PF is the selection of the most probable damage state $x_k$ at the current time $k$ (or alternatively the entire damage state history up to $k$), according to the noisy measurements collected up to the current discrete time $k$. This means estimating the posterior PDF of the health state at $k$, as reported in Eq. (1), which is valid for the entire state sequence up to $k$:

$$p(x_{0:k} \mid y_{1:k}) = \int p(x'_{0:k} \mid y_{1:k})\, \delta(x_{0:k} - x'_{0:k})\, dx'_{0:k} \tag{1}$$

Equation (1) indicates that the posterior PDF of the health state can be expressed as an integral over the space of all possible damage evolutions $x'_{0:k}$, where only those propagations similar to the target evolution $x_{0:k}$ contribute. According to MCS theory, the integral could be solved by sampling $x'_{0:k}$ from the true posterior PDF $p(x'_{0:k} \mid y_{1:k})$. Unfortunately, this is not possible, that distribution being the objective of the inference. The SIS-SIR technique is a well-established method to overcome this problem. The method allows generating samples from an arbitrarily chosen distribution called the Importance Density Function (IDF) $q(x_{0:k} \mid y_{1:k})$, allowing Eq. (1) to be rewritten in the form of Eq. (2) without introducing any bias in the required $p(x_{0:k} \mid y_{1:k})$:

$$p(x_{0:k} \mid y_{1:k}) = \int q(x'_{0:k} \mid y_{1:k})\, \frac{p(x'_{0:k} \mid y_{1:k})}{q(x'_{0:k} \mid y_{1:k})}\, \delta(x_{0:k} - x'_{0:k})\, dx'_{0:k} \tag{2}$$
An estimate of Eq. (2) can be derived through MCS (based on the $q$ distribution), leading to Eq. (3), where $\{x^i_{0:k},\ i = 1, 2, \ldots, N_s\}$ is a set of $N_s$ independent random samples (particles) drawn from $q(x_{0:k} \mid y_{1:k})$ and $\delta$ is the Dirac delta function. The $w^{*i}_k$ are the normalized importance weights, calculated from the ratio between the $p$ and $q$ distributions, each one relative to the $i$th particle (possible propagation history) and valid for the $k$th discrete instant:

$$p(x_{0:k} \mid y_{1:k}) \approx \sum_{i=1}^{N_s} w^{*i}_k\, \delta(x_{0:k} - x^i_{0:k}) \tag{3}$$
Equation (3) expresses the required posterior PDF as a combination of the weights associated with each particle (i.e. with each damage propagation sample). After some mathematical transformations available in the literature (Arulampalam et al., 2002), one can express $w^{*i}_k$ as a recursive formula dependent on the weights calculated at the previous discrete time $k-1$, as reported in Eq. (4); the $w^i_k$ are called Bayesian Importance Weights and are normalized as in Eq. (5):

$$w^i_k = w^i_{k-1}\, \frac{p(y_k \mid x^i_k)\, p(x^i_k \mid x^i_{k-1})}{q(x^i_k \mid x^i_{0:k-1}, y_{1:k})} \tag{4}$$

$$w^{*i}_k = \frac{w^i_k}{\sum_{j=1}^{N_s} w^j_k} \tag{5}$$

In Eq. (4), $p(x^i_k \mid x^i_{k-1})$ is the TDF ($f$), indicating the statistical correlation between two consecutive steps of damage evolution. Moreover, $p(y_k \mid x^i_k)$ is the probability of obtaining a certain measurement at $k$, given a state sample among the particles propagated up to $k$. This is available once the measurement model ($h$) is statistically described, as discussed in Section 4. Finally, $q(x^i_k \mid x^i_{0:k-1}, y_{1:k})$ is the IDF from which one samples in order to generate particles, i.e. the random Markov process describing the damage evolution, which can be arbitrarily selected.
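The recursion in Eqs. (4)-(5) is straightforward to implement. Below is a minimal Python sketch (not the authors' code) of one SIS update step under the Bootstrap choice $q = f$, where the transition and likelihood functions are supplied by the caller; all names are illustrative.

```python
import numpy as np

def sis_update(particles, weights, y_k, transition, likelihood, rng):
    """One SIS step (Eqs. 4-5) with the Bootstrap choice q = TDF,
    so the weight update reduces to multiplying by the likelihood."""
    # Draw x_k^i from the transition density p(x_k | x_{k-1}^i).
    particles = np.array([transition(x, rng) for x in particles])
    # Eq. (4) with q = p(x_k | x_{k-1}): w_k^i = w_{k-1}^i * p(y_k | x_k^i).
    weights = weights * np.array([likelihood(y_k, x) for x in particles])
    # Eq. (5): normalize to obtain the w*_k used in Eqs. (3) and (6).
    return particles, weights / weights.sum()
```

A particle whose propagated state lies close to the measurement receives a larger normalized weight, which is exactly the filtering effect exploited in Section 5.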
The choice of the IDF is a crucial step in the design of a PF algorithm. The convergence of the algorithm is mathematically demonstrated to be independent of the choice of IDF, provided a sufficient number of samples is generated. If the allowed number of samples is limited by computational requirements, the performance of the algorithm depends on the choice of the importance density function. However, as a first approximation, it is often worth selecting the IDF equal to the TDF (the Bootstrap approximation (Haug, 2005)). This allows a strong simplification of Eq. (4), as the IDF and TDF cancel. It means generating particles according to the prior knowledge of the material properties (statistically defined), then updating the weights to identify the most suitable samples according to the measurement distribution and history. Nevertheless, the real propagation being measured may behave like an outlier with respect to the stochastic damage propagation, thus forcing almost all the particle weights to zero. When this happens, resampling of the particles is required, from a different IDF, somehow taking into account the history of measurements collected up to the resampling instant.
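When the weights degenerate, a resampling step replaces the particle set with an equally weighted one. The paper resamples from a modified importance density; purely as a generic illustration, the sketch below uses the common systematic-resampling scheme, triggered when the effective sample size $1/\sum_i (w^{*i})^2$ drops.

```python
import numpy as np

def effective_sample_size(weights):
    # N_eff = 1 / sum(w*^2): close to N_s for uniform weights, 1 under full degeneracy.
    return 1.0 / np.sum(np.asarray(weights) ** 2)

def systematic_resample(particles, weights, rng):
    """Draw N_s equally weighted particles, duplicating high-weight ones."""
    n = len(weights)
    # One uniform offset shared by n evenly spaced positions on the weight CDF.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.asarray(particles)[idx], np.full(n, 1.0 / n)
```

A typical rule resamples whenever the effective sample size falls below some fraction (e.g. half) of $N_s$.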
Finally, once the health state PDF is approximated by assigning an importance weight to each particle, the distribution of the Failure Cycle ($N_f$) can also be updated and refined, conditioned on the health state, as expressed in Eq. (6), thus allowing the estimation of the updated RUL distribution:

$$p(N_f \mid y_{1:k}) \approx \sum_{i=1}^{N_s} w^{*i}_k\, \delta(N_f - N^i_{f,k}) \tag{6}$$
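In practice, Eq. (6) is evaluated by propagating each particle forward until its failure cycle $N_f^i$ and treating the weighted set $\{N_f^i, w^{*i}\}$ as an empirical distribution. A hypothetical sketch (function and variable names are illustrative, not the paper's implementation):

```python
import numpy as np

def rul_quantile(failure_cycles, weights, current_cycle, q):
    """Weighted empirical quantile of the RUL implied by Eq. (6):
    particle i contributes RUL_i = N_f^i - k with probability mass w*_i."""
    rul = np.asarray(failure_cycles, dtype=float) - current_cycle
    order = np.argsort(rul)
    rul = rul[order]
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf = cdf / cdf[-1]
    # Smallest RUL whose cumulative weight reaches q (conservative for small q).
    return rul[np.searchsorted(cdf, q)]
```

A maintenance rule can then compare, for example, a low percentile of the RUL distribution against a predefined safety margin before deciding whether to keep the component in service.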
3. THE DISCRETE TIME STATE SPACE MODEL
The DSS is the model describing the a priori knowledge of the probabilistic damage evolutions (particles). In other words, it represents the possibilities for damage evolution (given the uncertainties in the material characterization as well as the noise inevitably present in the operating environment), from which the algorithm selects the samples that best fit the measurements. The model used in the current framework for damage propagation is based on the NASGRO equation, Eq. (7), although less complicated models such as the Forman law or the Paris equation (Budynas & Nisbett, 2006) have usually been adopted in the literature for crack propagation prognosis (Cadini et al., 2009). The NASGRO law describes not only stable crack propagation, but also damage initiation and unstable crack evolution. It also takes into account the load ratio ($R$) of the applied spectrum, defined as the ratio between the valley and peak values of the load cycle, as well as the crack closure effect induced by plasticity near the crack tips:

$$\frac{da}{dN} = C \left[ \left( \frac{1-f}{1-R} \right) \Delta K \right]^{m} \frac{\left( 1 - \dfrac{\Delta K_{th}}{\Delta K} \right)^{p}}{\left( 1 - \dfrac{K_{max}}{K_c} \right)^{q}} \tag{7}$$
In Eq. (7), $a$ is the crack dimension and $da/dN$ represents the crack growth rate per cycle ($N$). $\Delta K$ is the variation of the Stress Intensity Factor (SIF) within one load cycle, calculated as the difference between the SIFs evaluated at the maximum and minimum load. Moreover, $\Delta K_{th}$ is the threshold variation of the SIF (the crack should not propagate below $\Delta K_{th}$), $K_c$ is the critical value of the SIF (fracture toughness) and $f$ is the crack opening function. Finally, $C$, $m$, $p$ and $q$ are parameters defined for the material characterization. The interested reader may refer to the NASGRO reference manual (2005) for a deeper insight into the parameter definitions.
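For reference, Eq. (7) can be evaluated directly once the constants are known. The sketch below is a literal transcription of Eq. (7); the numerical values used to exercise it are arbitrary placeholders, not the paper's aluminum calibration.

```python
def nasgro_rate(dK, R, C, m, p, q, dK_th, K_c, f):
    """Crack growth rate da/dN from the NASGRO relation, Eq. (7).
    dK: SIF range over the cycle; f: crack opening function value;
    C, m, p, q: material parameters; dK_th, K_c: threshold and toughness."""
    K_max = dK / (1.0 - R)  # peak SIF for a constant-amplitude cycle with ratio R
    base = C * (((1.0 - f) / (1.0 - R)) * dK) ** m
    return base * (1.0 - dK_th / dK) ** p / (1.0 - K_max / K_c) ** q
```

Note that the rate diverges as $K_{max}$ approaches $K_c$ (unstable growth) and vanishes as $\Delta K$ approaches $\Delta K_{th}$, which is how the single equation covers initiation, stable growth and final failure.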
Equation (7) allows calculating the crack growth rate as a function of the applied load cycle, once the needed constants are defined. Some comments are in order concerning the work presented hereafter. First of all, to develop a methodology as general as possible, the SIFs have not been calculated with simple analytical formulas (usually valid for simple skins). A large database of FEM-simulated damages has been generated, collecting the SIF parameters for each case. An Artificial Neural Network has been trained to fit the function that relates the crack position and dimension to the SIF at the crack tips. The method allows evaluating crack propagation also for complex geometries, obviously provided a validated FEM is available (the subject of the current monitoring is an aluminum skin, stiffened by riveted stringers, with the crack propagating on the skin). Moreover, Eq. (7) has been described stochastically by means of some experimental data available in the literature (Giglio & Manes, 2008). In particular, the distributions of the $C$ and $m$ parameters have been derived from a crack propagation test campaign on aluminum structures. While simulating crack propagation with Eq. (7), $C$ and $m$ are randomly sampled at each step of the crack evolution, thus obtaining a model that relates the health state at discrete instant $k-1$ to the condition at $k$, i.e. the Transition Density Function shown in Eq. (8). A Gaussian noise has also been introduced, as described by Cadini et al. (2009):

$$p(x_k \mid x_{k-1}), \quad \forall k > 0 \tag{8}$$
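One draw from such a transition density can be sketched as follows. Everything numeric here is a placeholder: the paper uses an ANN fitted on FEM-computed SIFs and experimentally derived $C$, $m$ distributions with the full NASGRO kernel, whereas this illustration uses an invented SIF function, invented parameter distributions and a simplified Paris-type rate.

```python
import numpy as np

def tdf_step(a_k1, d_cycles, rng,
             C_mu=2e-11, C_sd=4e-12, m_mu=3.0, m_sd=0.1,
             sif=lambda a: 20.0 * np.sqrt(np.pi * a), noise_sd=1e-4):
    """One draw from the TDF p(x_k | x_{k-1}) of Eq. (8): C and m are
    resampled at every step and Gaussian process noise is added.
    All numeric values and the SIF function are illustrative placeholders."""
    C = rng.normal(C_mu, C_sd)
    m = rng.normal(m_mu, m_sd)
    da = C * sif(a_k1) ** m * d_cycles   # growth over d_cycles load cycles
    return a_k1 + max(da, 0.0) + rng.normal(0.0, noise_sd)
```

Repeated application of this step from the initial crack length yields one particle trajectory; the fan of many such trajectories is what Figures 2 and 3 display.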
Thus, the probabilistic a priori information on damage evolution is shown in Figure 2, where the real crack propagation (over the structure presented in Figure 1) is reported together with the random Markov process evolution of the simulated damage. In particular, the initial crack length has been set to 16 mm, corresponding to the artificial notch introduced to hasten crack nucleation and to control the crack position. As one can notice, the randomly simulated crack propagation covers a very wide range of possibilities, including the real case measured during the test. An efficient algorithm (based on probability theory) is thus needed to select the particles that best fit reality, given that some measurements (with noise and uncertainty) have been taken, thus reducing the uncertainty in the RUL estimation. The DSS model presented in Figure 2 is adopted when considering the application of the PF to the off-line PHM system (measurements are manually collected during maintenance stops). On the other hand, Figure 3 shows the stochastic simulation of crack propagation for the on-line case (measurements of crack length are estimated by a sensor network installed over the structure). The simulated crack propagation is initiated after anomaly detection is performed by the automatic diagnostic system (at about 60 mm for the sensor network and damage configuration shown in Figure 1). The first thing to be noticed is the reduced dispersion of the particles in Figure 3 with respect to Figure 2, the model being initiated at a longer crack length. Moreover, the random process of simulated crack propagation appears to be centered on the real damage evolution in Figure 3, where the randomness of the damage evolution from 16 mm to about 60 mm has not been considered.

Figure 2. NASGRO DSS model for the off-line PHM. Comparison of particles with the real crack propagation measured during the experiments. Particles have been generated starting from a 16 mm measurement, corresponding to the length of the artificially initiated crack.
4. THE MEASUREMENT SYSTEM
Two measurement systems have been adopted, in order to analyze the performance of the PF algorithm when off-line and on-line PHM are considered (Figure 4).

The off-line PHM simulates the case when the aircraft is stopped for maintenance and the structure is manually scanned by operators for crack identification. When a damage tolerant structure is considered, the aim is to identify whether it is possible to postpone dismounting and repair until the prognostic system declares a critical condition. In order to statistically characterize the off-line measurement, it has been decided, as a first approximation, to consider the measurement system PDF Gaussian, with mean value equal to the real crack length (measured with a caliper during the real test). A standard deviation ($\sigma_{off}$) has been selected so that the 95% confidence band lies within ±3% of the measurement.
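The stated requirement pins down $\sigma_{off}$: a Gaussian 95% band spans about $\pm 1.96\sigma$, so $\sigma_{off} \approx 0.03\,a / 1.96$ for a true length $a$. A small sketch (function names are illustrative):

```python
import numpy as np

def sigma_off(a_true, band=0.03, z95=1.96):
    """Sigma such that the +/-1.96*sigma (95%) band equals +/-band*a_true."""
    return band * a_true / z95

def offline_likelihood(y, a, band=0.03):
    """Gaussian measurement PDF p(y | a) for the simulated manual scan."""
    s = sigma_off(a, band)
    return np.exp(-0.5 * ((y - a) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
```

This likelihood is what enters the weight update of Eq. (4) when the off-line measurements drive the filter.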
On the other hand, the on-line PHM simulates the case when the structural health condition is automatically inferred by means of a diagnostic unit that processes data coming from a smart sensor network. The concept consists in keeping the aircraft operative until the PHM system declares further operation unsafe, given a predefined safety parameter. The diagnostic unit used in the current framework has been thoroughly described by Sbarufatti et al. (2012). It basically consists of two Artificial Neural Networks (ANN), trained with FEM simulations in order to learn the complex functions that relate the damage parameters (existence, position and length) to the strain field modifications due to damage. The first ANN (the anomaly detection algorithm) receives strain data as input and generates an alarm when the damage index (ranging from 0 to 1) rises above 0.5. The second algorithm (damage quantification), activated in series after anomaly detection, again receives strain data and gives the crack length distribution1 as output (a deeper explanation of the diagnostic unit output is provided by Sbarufatti et al. (2012)).
1 The quantification algorithm is composed of 50 ANNs, trained with randomly selected damage samples (with random position and length). Each one receives the strain pattern from the FBG acquisition system and returns an estimate of the crack length.
Figure 3. NASGRO DSS model for the on-line PHM. Comparison of particles with the real crack propagation measured during the experiments. Particles have been generated starting from a 60 mm measurement, corresponding to the crack length at the moment of anomaly detection by the automatic diagnostic unit.

Figure 4. Comparison between (a) the off-line PHM procedure and (b) the on-line PHM process. The on-line process is based upon the diagnosis performed through an on-board SHM system that detects and characterizes structural faults.
The PF algorithm is thus activated after the anomaly is detected and an estimate of the damage state distribution is provided by the diagnostic algorithm.
A comparison of the on-line vs. off-line measurement systems is provided in Figure 5. It can be noticed that the ±2σ band adopted to simulate the behavior of a generic system for manual surface scanning is far narrower than the uncertainty associated with the real-time automatic diagnostic system. For instance, considering a 70 mm target crack length, the ±2σ band ranges between 63 mm and 86 mm for the on-line diagnosis, while ranging between 67.5 mm and 72.5 mm for the off-line measurement. However, the average value of the quantification distribution correctly estimates the target crack length. The strong degeneracy of the σ band of the on-line measurement for longer cracks is due to the fact that the database of simulated experience used to train the ANN algorithms for diagnosis was limited to cracks up to 100 mm.
5. COMPARISON OF ON-LINE VERSUS OFF-LINE RESULTS
The performance of the PF algorithm when applied to the two maintenance approaches introduced above is now investigated in depth. The main output of the probabilistic PF calculation is the estimate of the health condition of the structure, as reported in Figure 6 for both the off-line and the on-line PHM. In short, the main advantage of the PF technique is that it allows updating the posterior PDF of the damage condition, taking into account the history of all the measurements taken up to the kth discrete time instant, as well as the analytical a priori knowledge given by the underlying model for damage evolution. This becomes particularly attractive when autonomous diagnostic systems are considered. Indeed, they can provide continuous information on damage existence and level; nevertheless, they are characterized by a robustness and precision inferior to classical NDT technologies (simulated here with the off-line measurements). In practice, the PF filters the most suitable states at the kth instant from the database of possible damage evolutions (particles) calculated a priori, independently of any measurement. The particles relative to the off-line and on-line PHM have been shown in Figure 2 and Figure 3 respectively. Once the actual state distribution is updated and refined, the distribution of the RUL can also be updated, becoming conditional on the whole history of the monitored component, and consistent with the analytical and empirical knowledge embedded in the TDF.
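The 95% σ bands plotted in Figure 6 can be extracted directly from the weighted particle set of Eq. (3) as weighted empirical quantiles; a minimal sketch (names illustrative):

```python
import numpy as np

def posterior_band(states, weights, lo=0.025, hi=0.975):
    """Central 95% band of the filtered health-state PDF of Eq. (3),
    computed as weighted empirical quantiles of the particle set."""
    order = np.argsort(states)
    s = np.asarray(states, dtype=float)[order]
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf = cdf / cdf[-1]
    return s[np.searchsorted(cdf, lo)], s[np.searchsorted(cdf, hi)]
```

As the weights concentrate on particles consistent with the measurements, the band returned here tightens around the most probable crack length, which is the narrowing visible in Figure 6 relative to Figure 5.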
The estimation of the state posterior PDF is shown in Figure 6 for the off-line (Figure 6(a)) and on-line (Figure 6(b)) cases. The PF has been applied to a real crack propagation test, with simultaneous manual acquisition of the crack length measurements (processed in Figure 6(a)) and automatic estimation of the crack measurement by means of an on-board smart sensor network based on the strain field (processed in Figure 6(b)). It is immediately clear that, while the manual structure scan allows detecting and measuring shorter cracks (the lower limit is imposed here by the length of the artificial damage for crack initialization, set to 16 mm), the anomaly detection threshold for the sensor network and damage configuration reported in Figure 1 is around 60 mm. On the other hand, the off-line measurements are available at predefined scheduled intervals, while the on-line health assessment is retrieved continuously, every 1000 load cycles, through the diagnostic unit developed by Sbarufatti et al. (2012). However, the on-line measurements are affected by a large uncertainty compared to the off-line case, as described in Figure 5.

Figure 5. Measurement system uncertainties. Comparison of the on-line diagnostic system performance with respect to the off-line manual structural scan methodology. The on-line diagnostic system has been trained with FEM damage simulations, with crack lengths up to 100 mm.

Figure 6. Filtering of the health state distribution. (a) Posterior PDF of the health state for the off-line measurement and (b) posterior PDF of the health state for the on-line structural diagnosis. The real crack propagation is shown, as well as the collected measurements. The posterior 95% σ band is also plotted, to be compared with the a priori σ band reported in Figure 5. The instants when the algorithm required particle resampling are also indicated.
Concerning the off-line PHM system, the health state estimate (Figure 6(a)) appears to characterize the damage evolution precisely, the 95% σ band being mostly centered on the real damage condition. However, it is clear from Figure 2 that the damage evolution that occurred during the test is not centered with respect to the stochastic model used to define the TDF. This made resampling necessary after a few updating iterations, as the available particles were no longer sufficient to describe the posterior PDF of the health state (only a few particles retained a weight significantly different from zero).
Regarding the on-line PHM system, it can be noticed that the posterior PDF of the health condition is far narrower than the output of the diagnostic algorithm shown in Figure 5. For instance, for a 70 mm crack, the 95% σ band of the quantification algorithm (Figure 5) ranges from 63 mm to 86 mm, while after the PF updating process it ranges from 68 mm to 72.5 mm (Figure 6(b)). However, the estimated σ band sometimes does not contain the real state evolution. This is mainly due to the fact that the measurements are affected by a higher error (with respect to the off-line system), which is in part confirmed by the evolution of some stochastic particles. This means that, if many measurements over- or underestimate the real damage condition and their indications are also confirmed by the DSS model, the PF precision will decrease. However, under the reasonable assumption (Figure 5) that the measurement PDF is centered on the target, the PF inference will converge toward the real damage evolution. In other words, the PF tends to interpolate the measurements, while taking into account the a priori knowledge embedded in the DSS model. Although the DSS model used for the a priori description of the damage evolution for the on-line PHM is centered on the real crack propagation (Figure 3), particle resampling was also required, because the updating process focused on a particular set of particles.
Some remarks are required concerning the adopted resampling technique. The DSS model used to initialize the algorithm was kept as general as possible (considering the distribution of material parameters inside the NASGRO law), in order to be representative of many experimental crack propagation tests on the same material (aluminium). The resulting DSS spread is high, which provokes premature particle degeneracy and the need for resampling. Nevertheless, once a sufficient number of iterations has been completed, it is possible to generate new particle samples from a different importance density q(x_k | x_{k-1}, z_{1:k}), now taking the history of measures into account but preventing the adoption of the Bootstrap approximation. In the work reported here, new particles are generated considering a TDF with deterministic material parameters (C and m are now obtained by fitting the measures taken on the specimen under monitoring) and random white noise.

Figure 7. Effect of NASGRO parameter dynamic fitting. A sudden (unpredicted) change in the slope of the crack propagation curve cannot be described before it has happened.

On the one hand, this allows the uncertainty related to prognosis to be reduced. On the other hand, as described in Figure 7 (where the noise has been removed for illustration purposes only), this method is less robust to unexpected changes in the system dynamics. It is clear from Figure 7 that, if C and m are considered deterministic, they cannot account for sudden changes in the curve slope (Figure 7(a)), unless a new resampling is executed, fitting the propagation curve with new measures (Figure 7(b)). The effect is visible in the RUL estimation for the off-line PHM case (Figure 9(a)): the error in the RUL estimated with the PF increases after resampling is executed at 250,000 load cycles, until a new resampling is executed at about 300,000 load cycles, taking into account the unexpected change in the crack evolution slope.
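The particle-regeneration step just described — refitting deterministic C and m to the measures collected up to the resampling instant, then propagating with additive white noise — can be sketched as follows. This is an illustrative sketch only: a simplified Paris-type growth law (da/dN = C·ΔK^m, with ΔK = Δσ·√(πa)) stands in for the full NASGRO equation, and the function names, stress range and noise level are our assumptions, not the authors' implementation.

```python
import numpy as np

def fit_paris_parameters(cycles, crack_lengths, stress_range=100.0):
    """Fit deterministic C and m of a simplified Paris-type law to the
    crack-length history available up to the resampling instant, via
    log-log least squares on finite-difference growth rates."""
    a = np.asarray(crack_lengths, dtype=float)
    N = np.asarray(cycles, dtype=float)
    dadN = np.diff(a) / np.diff(N)          # growth rate between measures
    a_mid = 0.5 * (a[1:] + a[:-1])          # midpoint crack length
    dK = stress_range * np.sqrt(np.pi * a_mid)  # stress intensity range
    # log(da/dN) = log(C) + m * log(dK): linear fit in log-log space
    m, logC = np.polyfit(np.log(dK), np.log(dadN), 1)
    return np.exp(logC), m

def regenerate_particles(a_now, n_particles, C, m, dN, noise_std,
                         stress_range=100.0):
    """Draw new particles from the refitted transition density:
    deterministic growth plus random white noise."""
    a = np.full(n_particles, float(a_now))
    dK = stress_range * np.sqrt(np.pi * a)
    return a + C * dK**m * dN + np.random.normal(0.0, noise_std, n_particles)
```

Once the fitted parameters become deterministic, the particle cloud spread is governed only by the white-noise term, which is why this scheme reduces prognosis uncertainty but, as noted above, cannot anticipate a sudden slope change.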
Once the PDF of the health state has been filtered by the PF algorithm, the RUL of the monitored component can also be updated according to Eq. (6). In order to appreciate the advantages and drawbacks of the PF algorithm, it has been compared with a second technique, which evaluates the RUL PDF by performing a stochastic crack propagation based on the NASGRO law. In short, given the PDFs of the material-related constants, 3000 crack propagations (particles) have been simulated, sampling the material constants from the available distributions at each step. Once the target crack length is reached (120 mm was selected as the limit crack length, due to the limits of the FEM database), the RUL can be stochastically defined with a PDF. The same procedure is repeated each time a new estimation of the crack length is provided, either by the on-line or by the off-line diagnostic system. Note that this method depends only on the last measure provided by the diagnostic system and does not take into account the trend of historical measures (which is, on the contrary, the advantage of the PF): each inference is thus completely uncorrelated with the previous ones. Moreover, it requires simulating many crack propagations every time a new RUL PDF is needed. The stochastic NASGRO (SN) and Particle Filter RUL evaluations are reported in Figure 8 and Figure 9, respectively, again for the on-line and off-line PHM.

Figure 8. 95% σ-band for RUL estimation with the stochastic NASGRO law. Comparison of off-line PHM (a) versus on-line PHM (b).

Figure 9. 95% σ-band for RUL estimation with the Particle Filtering algorithm. Comparison of off-line PHM (a) versus on-line PHM (b).

The estimated RUL (intended hereafter as the remaining number of cycles before reaching the 120 mm crack length) is reported over the component life, as a function of load cycles. The real RUL is shown together with its estimation calculated with the SN law (Figure 8) and the PF (Figure 9); in particular, the expected value of the RUL PDF is reported, as well as the 95% σ-band. The first thing to be pointed out is that SN depends only on the knowledge of the material properties (and applied load); for this reason, if a discrepancy between the DSS and reality is present at the beginning, there is no updating process based on the collected measures, and the same error is maintained throughout the life, as is clearly appreciable in Figure 8(b).
Moreover, the SN prognosis is very sensitive to the quality of the measure; this is an issue especially when the on-line PHM is considered, where the inevitable fluctuations in the inference on the structural condition (due to the high level of uncertainty) are reflected in an unstable prognosis (Figure 8(b)). On the other hand, the PF technique is able to filter these uncertainties (Figure 9(b)), thus estimating a RUL that depends on the entire trend of measures collected since the anomaly was identified. The variance of the RUL PDF evaluated with the two prognosis methods appears to be of the same order, unless resampling is performed in the PF algorithm. As explained above, the information retrieved from the collected measures allows the uncertainty in the prognosis to be decreased significantly (as at least the uncertainty related to the material properties can be greatly reduced). This is well reflected in Figure 9(b), where an important reduction in the variance of the PF estimation of the RUL is obtained. After 275,000 load cycles, only a few particles remained with a non-negligible weight, thus provoking degeneracy of the algorithm. New particles have therefore been generated, though without considering the material uncertainty inside the DSS (the C and m parameters inside the NASGRO equation are deterministic, obtained through a non-linear fitting of the historical data available up to the resampling instant). Nevertheless, the resampling technique has to be improved in order to avoid focusing on a too narrow region inside the DSS: this is in fact the reason for the deviation of the estimated RUL PDF from the real RUL in Figure 9(a), as described in Figure 7.
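The stochastic NASGRO comparison procedure described above — propagating a population of cracks from the last diagnosed length, re-sampling the material constants at each step, until the 120 mm limit length — can be sketched as below. This is a sketch under stated assumptions: a simplified Paris-type law replaces the full NASGRO equation, the distributions are taken as Gaussian, and all parameter values are illustrative, not those of the paper.

```python
import numpy as np

def stochastic_rul(a0, C_mean, C_std, m_mean, m_std, dN=1000.0,
                   a_limit=120.0, n_particles=3000, stress_range=100.0,
                   max_steps=100000, rng=None):
    """Monte Carlo RUL estimate: propagate n_particles cracks from the
    last diagnosed length a0, sampling the material constants at each
    step, until the limit crack length; the sample of cycles-to-limit
    approximates the RUL PDF."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.full(n_particles, float(a0))
    rul = np.full(n_particles, np.nan)        # NaN = not yet at limit
    for step in range(1, max_steps + 1):
        alive = np.isnan(rul)
        if not alive.any():
            break
        # re-sample material constants for the still-propagating cracks
        C = rng.normal(C_mean, C_std, alive.sum())
        m = rng.normal(m_mean, m_std, alive.sum())
        dK = stress_range * np.sqrt(np.pi * a[alive])
        a[alive] += np.clip(C, 0.0, None) * dK**m * dN
        crossed = alive.copy()
        crossed[alive] = a[alive] >= a_limit  # reached the limit length
        rul[crossed] = step * dN
    return rul  # mean and percentiles of this sample give the sigma-band
```

Because each call starts afresh from the latest crack-length estimate, successive inferences are uncorrelated, which mirrors the limitation of the SN method discussed in the text.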
Finally, two comments arise when comparing the off-line and on-line PHM. Firstly, the 95% σ-band of the RUL based on the off-line measure is narrower, owing to the more precise measuring system. Nevertheless, the availability of a real-time diagnostic tool increases the availability of data on the health state, thus reducing the time needed by the PF algorithm to converge to the correct estimation.
6. CONCLUSIONS
A Particle Filtering (PF) Bayesian updating technique has been used in this framework for the dynamic estimation of the component Residual Useful Life. Two applications have been compared. The first consists in applying particle filters to a Condition Based Maintenance scheme where structural health monitoring (SHM) is performed off-line by maintenance operators. The second consists in an automatic SHM performed on-board by a diagnostic unit, trained with Finite Element damage simulations to recognize crack existence and length based on strain field measures. The methodology has been tested in the laboratory on a specimen representative of a typical aeronautical structure, constituted by a skin stiffened with riveted stringers. Although the uncertainty related to the on-line structural diagnosis is far larger than that associated with the off-line measure, the PF algorithm proved to correctly describe the posterior RUL distribution (conditional on the measures) in both cases. The additional uncertainty in the on-line measures turned out to be compensated by the availability of a continuous measure, thus allowing the algorithm to reach convergence in a shorter time. The PF algorithm has also been compared with a simpler technique based on stochastic NASGRO (SN) law propagation. The advantage of the PF with respect to SN is that it takes into account the whole history of measures taken on the monitored component, as well as the prior knowledge coming from the propagation model. This results in a more robust and precise estimation of the health state as well as of the RUL PDF. Finally, the adoption of a robust filtering methodology that merges the information coming from a wide sensor network with the numerical or analytical knowledge about the monitored phenomenon appears to be a suitable technique for increasing the performance of automatic SHM systems, thus leading toward real on-board PHM.
NOMENCLATURE
ANN Artificial Neural Network
DMS Diagnostic Monitoring System
DSS Discrete State-Space
FBG Fiber Bragg Grating
FEM Finite Element Model
IDF Importance Density Function
MCS Monte-Carlo Sampling
NDT Non-Destructive Testing
PDF Probability Density Function
PF Particle Filter
PHM Prognostics and Health Management
RUL Residual Useful Life
SHM Structural Health Monitoring
SIF Stress Intensity Factor
SIR Sequential Importance Resampling
SIS Sequential Importance Sampling
SN Stochastic NASGRO
TDF Transition Density Function
REFERENCES
Arulampalam, S., Maskell, S., Gordon, N. & Clapp, T.
(2002), A tutorial on particle filters for online
nonlinear/non-Gaussian Bayesian tracking, IEEE
Transactions on Signal Processing, 50(2): 174-188.
Boller, C. (2001), Ways and options for aircraft structural
health management, Smart Materials & Structures, 10:
432-440.
Budynas, R. & Nisbett, J. (2006), Shigley's Mechanical
Engineering Design, 8th edition, McGraw-Hill.
Cadini, F., Zio, E. & Avram, D. (2009), Monte Carlo-based
filtering for fatigue crack growth estimation,
Probabilistic Engineering Mechanics, 24: 367-373.
Giglio, M. & Manes, A. (2008), Crack propagation on
helicopter panel: experimental test and analysis,
Engineering fracture mechanics, 75:866-879.
Haug A.J. (2005), A tutorial on Bayesian Estimation and
Tracking Techniques Applicable to Nonlinear and Non-
Gaussian Processes, MITRE technical report, Virginia.
JSSG-2006, Joint Service Specification Guide – Aircraft
Structures, U.S. Department of Defense.
Lazzeri, L. & Mariani, U. (2009), Application of Damage
Tolerance principles to the design of helicopters,
International Journal of Fatigue, 31(6): 1039-1045.
NASGRO reference manual (2005), Fracture Mechanics and
Fatigue Crack Growth Analysis Software, version 4.2.
Sbarufatti, C., Manes, A. and Giglio, M. (2010), Probability
of detection and false alarms for metallic aerospace
panel health monitoring, Proc. 7th Int. Conf. on CM &
MFPT, Stratford Upon Avon, England.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), HECTOR:
one more step toward the embedded Structural Health
Monitoring system, Proc. CEAS 2011, Venice, Italy.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), Advanced
Stochastic FEM-Based Artificial Neural Network for
Crack Damage Detection, Proc. Coupled 2011, Kos,
Greece.
Sbarufatti, C., Manes, A. & Giglio, M. (2011), Sensor
network optimization for damage detection on
aluminum stiffened helicopter panels, Proc. Coupled
2011, Kos, Greece.
Sbarufatti, C., Manes, A. & Giglio, M. (2012), Diagnostic
System for Damage Monitoring of Helicopter Fuselage,
Proc. EWSHM 2012, Dresden, Germany.
Schmidt, H.J. & Schmidt-Brandecker, B. (2009), Design
Benefits in Aeronautics Resulting from SHM,
Encyclopedia of Structural Health Monitoring.
BIOGRAPHIES
Claudio Sbarufatti was born in Milan, Italy, on May 15, 1984. He received his Master of Science degree in Mechanical Engineering in 2009 at Politecnico di Milano, Italy, with a Master's thesis on rotor dynamics and vibration control developed at Rolls-Royce plc (Derby, UK). He currently works in the Department of Mechanical Engineering of Politecnico di Milano, where he is going to conclude his Ph.D. in 2012. The title of his Ph.D. thesis is "Fatigue crack propagation on helicopter fuselages and life evaluation through sensor network". His research fields are the development of structural health monitoring systems for diagnosis and prognosis, Finite Element modeling, design and analysis of helicopter components subject to fatigue damage propagation, artificial intelligence applied to structural diagnosis, Bayesian statistics, Monte Carlo methods, and sensor network design.
Matteo Corbetta was born in Cantù, Italy, on April 11, 1986. He received the Bachelor of Science degree in Mechanical Engineering from Politecnico di Milano in 2009 and is going to receive the Master of Science in Mechanical Engineering in 2012 at Politecnico di Milano. He currently works in the Department of Mechanical Engineering of Politecnico di Milano in the field of Structural Health Monitoring. His current research interests are fracture mechanics and probabilistic approaches for prognostic algorithms.
Andrea Manes was born in La Spezia, Italy, on August 11, 1976. He holds a Ph.D. and is an Assistant Professor of Mechanical Design and Strength of Materials in the Department of Mechanical Engineering at Politecnico di Milano, Italy. His research mainly focuses on the structural reliability of aerospace components, using a complete research strategy based on experimental tests, numerical models and material characterization. Within this framework several topics have been investigated: novel methods for SHM application, methods of fatigue strength assessment in mechanical components subject to multiaxial states of stress, design and analysis of helicopter components with defects, ballistic damage and evaluation of the residual strength, and assessment of sandwich structures subjected to low-velocity impacts. He is the author of over 70 scientific papers in international journals and conferences and is a member of several scientific associations (AIAS, the Italian Association for Stress Analysis; IGF, the Italian Group on Fracture; CSMT, the Italian safety commission for mountaineering).
Marco Giglio was born in Milan, Italy, on November 1, 1961. He is an Associate Professor of Mechanical Design and Strength of Materials in the Department of Mechanical Engineering at Politecnico di Milano, Italy. His research fields are novel methods for SHM application, methods of fatigue strength assessment in mechanical components subject to multiaxial states of stress, design and analysis of helicopter components with defects, and ballistic damage and evaluation of the residual strength. He is the author of over 100 scientific papers in international journals and conferences and is a member of several scientific associations (AIAS, the Italian Association for Stress Analysis; IGF, the Italian Group on Fracture).
Health Assessment and Prognostics of Automotive Clutches
Agusmian Partogi Ompusunggu1,2, Steve Vandenplas1, Paul Sas2, and Hendrik Van Brussel2
1 Flanders’ MECHATRONICS Technology Centre (FMTC), 3001 Heverlee, [email protected]
2 Katholieke Universiteit Leuven, Department of Mechanical Engineering, Division PMA, 3001 Heverlee, [email protected]
ABSTRACT
Despite being critical components, wet friction clutches have received very little attention in the monitoring and prognostics research field. This paper presents and discusses an overall methodology for assessing the health (performance) and predicting the remaining useful life (RUL) of wet friction clutches. Three principal features extracted from the relative velocity signal measured between the input and output shaft of the clutch, namely (i) the normalized engagement duration, (ii) the normalized Euclidean distance and (iii) the Spectral Angle Mapper (SAM) distance, are fused with a logistic regression technique into a single value called the health index. In logistic regression analysis, the output of the logistic model (i.e. the health index) is restricted between 0 and 1. Accordingly, the logistic model can guide users in assessing the state of a wet friction clutch as either healthy (health index value (close to) 1) or failed (health index value (close to) 0). In terms of prognostics, the logarithm of the odds-of-success g, defined as g = log[h/(1−h)], where h denotes the health index, is used as the predicted variable. Once the history data are sufficient for prediction, the weighted mean slope (WMS) method is implemented in this study to adaptively build a prognostics model and to predict the trajectory of g until it crosses a predetermined threshold. In this way, the remaining useful life (RUL) of a clutch can be determined. Furthermore, an experimental verification of the proposed methodology has been performed on two history datasets obtained by performing accelerated life tests (ALTs) on two clutch packs with different friction materials but the same lubricant. The experimental results confirm that the proposed methodology is promising and has the potential to be implemented in real-life applications. As expected, the estimated RUL converges to the actual RUL and the uncertainty
Agusmian Partogi Ompusunggu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
interval decreases over time, which may indicate that the prognostics model improves as more evidence becomes available.
1. INTRODUCTION
Wet friction clutches are mechanical components enabling power transmission from an input shaft (connected to the engine) to an output shaft (connected to the wheels), based on the friction occurring between lubricated contacting surfaces. The clutch is lubricated by an automatic transmission fluid (ATF), which acts as a cooling lubricant that cleans the contacting surfaces and gives smoother performance and longer life. However, the presence of the ATF in the clutch reduces the coefficient of friction (COF). In applications where high power is necessary, the clutch is therefore designed with multiple friction and separator discs. This configuration is known as a multi-disc wet friction clutch, as shown in Figure 1, in which the friction discs are typically mounted to the hub by splines, and the separator discs are mounted to the drum by lugs.
Figure 1. Exploded view of a wet friction clutch.
Today's vehicles have become widely equipped with automatic transmission (AT) systems, where wet friction clutches
are among the critical components that play a major role in the transmission performance. At the beginning of its life, a clutch is designed to transmit a certain power under a smooth and fast engagement with minimal shudder. However, due to unavoidable degradation, the clutch frictional characteristics change, thus altering its initial performance and consequently affecting the performance of the vehicle. As the degradation proceeds, failure can occur unexpectedly, eventually leading to the total breakdown of the vehicle. An unexpected breakdown can put human safety at risk, possibly cause long vehicle down times, and result in high maintenance costs. Hence, integrating a maintenance strategy into AT systems can significantly increase safety and availability/reliability, as well as reduce the maintenance cost of the vehicles.
The maintenance strategy should be performed in an optimal way, in the sense that degrading clutches need to be replaced with new ones at the right time. Here, the right time can be taken as the "optimal" end of life of the clutch, at which the clutch no longer functions as it should. Notice that the end of the clutch lifetime does not necessarily mean the condition at which catastrophic failure occurs. For an optimal maintenance strategy, information concerning the end of the clutch lifetime (or the remaining useful clutch life) therefore becomes an important aspect in minimizing vehicle downtime. Condition Based Maintenance (CBM), also known as Predictive Maintenance (PdM), is a right-on-time maintenance strategy driven by the actual condition of the critical components of systems. This concept requires technologies and experts, in which all relevant information, such as performance data, maintenance histories, operator logs and design data, is combined to make optimal maintenance decisions (Mobley, 2002). In general, the key technologies for realizing the PdM strategy rely on three basic elements, namely (i) condition monitoring, (ii) diagnostics and (iii) prognostics. PdM has been in use since the 1980s and has been successfully implemented in various applications, such as oil platforms, manufacturing machines, wind turbines, automobiles and electronic systems (Basseville et al., 1993; Bansal, Evans, & Jones, 2004; Garcia, Sanz-Bobi, & Pico, 2006; Srinivas, Murthy, & Yang, 2007; Bey-Temsamani, Engels, Motten, Vandenplas, & Ompusunggu, 2009b, 2009a).
Despite their criticality, to the authors' knowledge, very little attention has been paid to wet friction clutches in the area of PdM research. Several methods have been proposed in the literature for assessing the condition of wet friction clutches based on the quality of the friction material, namely (i) Scanning Electron Microscope (SEM) micrographs, (ii) surface topography, (iii) Pressure Differential Scanning Calorimetry (PDSC) and (iv) Attenuated Total Reflectance Infrared spectroscopy (ATR-IR) (Jullien, Meurisse, & Berthier, 1996; Guan, Willermet, Carter, & Melotik, 1998; Li et al., 2003; Maeda & Murakami, 2003; Nyman, Maki, Olsson, & Ganemi, 2006). Generally, the implementation of these existing methods is very time-consuming and possibly not pragmatic for real-life applications, owing to the fact that the friction discs have to be taken out of the clutch pack and then prepared for assessing the degradation level. In other words, an online monitoring and prognostics system cannot be realized with these existing methods.
As the central role of wet friction clutches relies on friction, a natural way to monitor and assess the condition of these components is by monitoring and quantifying the frictional characteristics. The use of the mean (averaged) coefficient of friction (COF) over a given duty cycle as a principal feature for condition monitoring of wet friction clutches has been popular for many years (Matsuo & Saeki, 1997; Ost, Baets, & Degrieck, 2001; Maeda & Murakami, 2003; Li et al., 2003; Fei, Li, Qi, Fu, & Li, 2008). However, this is normally performed in laboratory tests, namely durability tests of clutch friction materials and ATF, where the test setup used (i.e. the SAE#2 test setup) is fully instrumented. For real-life applications, the use of the mean COF for clutch monitoring is possibly expensive and not easily implementable, due to the fact that at least two sensors are required to extract it, namely a torque and a force sensor, which are commonly difficult to install (typically not available) in today's transmissions.
Regarding clutch health assessment and prognostics, only a few publications were found in the literature. Yang et al. (Yang, Twaddell, Chen, & Lam, 1997; Yang & Lam, 1998) developed a physics-based prognostics model by considering that the degradation occurring in a clutch is only due to the thermal effect taking place in the friction materials. The model was developed based on the cellulose fiber concentration, where the change of the fiber concentration is assumed to be likened to a simple chemical reaction. They found that the ratio between the instantaneous concentration of cellulose fibers W and the initial concentration of cellulose fibers W0, i.e. the weight loss ratio W/W0, likely follows a zeroth-order reaction in isothermal conditions. The degradation rate constants are obtained by performing dedicated (separate) Thermal Gravimetric Analysis (TGA) experiments on friction material samples taken out of clutch packs at different interface temperatures. To predict the degradation level and RUL of the friction material under dynamic engagement, the temperature history of the friction material as a function of time and axial location is of importance in this approach. Since the interface temperature of the friction material during a clutch engagement is difficult to measure, they (Yang, Lam, Chen, & Yabe, 1995; Yang & Lam, 1998) developed a comprehensive and detailed mathematical model to predict the temperature at the interface, as well as the temperature distribution as a function of time and location. Hence, the accuracy of the existing clutch prognostics method is strongly determined by the accuracy of the temperature prediction. Since the degradation mechanism occurring in the clutch friction material is due not only to the thermal effect but also to another major mechanism, namely adhesive wear (Gao, Barber, & Chu, 2002; Gao & Barber, 2002), the assumption made in this prognostics method is oversimplified. When the complete design data of a wet friction clutch are not available, this prognostics method would possibly be difficult to implement.
Considering the above literature survey, one may conclude that the available clutch monitoring and prognostics methods are not pragmatic and flexible enough to implement in real-life applications. Nowadays, there is still a need for automotive industries (e.g. our industrial partner) to realize a clutch monitoring and prognostics system which is easy to implement and flexible to adapt. In addition, the development of such a system must be based on sensors typically available in AT systems, such as rotational velocity, pressure and temperature sensors. Hence, research in this direction is still of great interest.
Recently, some potential monitoring methods which can serve as bases for clutch prognostics have been investigated and reported in previous publications. Preliminary evaluations of a clutch monitoring method based on the post-lockup torsional vibration are discussed in (Ompusunggu, Papy, Vandenplas, Sas, & VanBrussel, 2009; Ompusunggu, Sas, VanBrussel, Al-Bender, Papy, & Vandenplas, 2010). Since it is reasonable to assume that the ATF has no significant effect on the clutch post-lockup torsional vibration, this method is suitable for monitoring only the clutch friction material degradation. A more complete description of the feasibility and practical implementation of this method will be discussed in another communication. Furthermore, a clutch monitoring method based on the pre-lockup torsional vibration is evaluated in (Ompusunggu, Sas, VanBrussel, Al-Bender, & Vandenplas, 2010), where a high-resolution rotational velocity sensor is required in order to capture the high-frequency torsional vibration. Another clutch monitoring method, based on tracking the change of the relative rotational velocity between the input and output shaft of a clutch, is proposed in (Ompusunggu, Papy, Vandenplas, Sas, & VanBrussel, 2012). Since the relative velocity can be seen as representative of the clutch dynamic behavior during the engagement phase, which is strongly influenced by the combination of friction material and ATF, the latter method can be used to monitor the global state of a clutch. Nevertheless, the prognostics aspect was not yet tackled in those publications. An attempt to develop a systematic methodology for health assessment and prognostics of wet friction clutches is the main objective of this paper. For this purpose, the condition monitoring method described in (Ompusunggu et al., 2012) is extended in this paper towards a prognostics methodology.
The remainder of this paper is organized as follows. After introducing the objective and motivation, the methodology proposed in this paper is presented and discussed in Section 2. To verify the proposed method, life data of some commercially available clutches, obtained from accelerated life tests (ALTs) carried out on a fully instrumented SAE#2 test setup, are employed. The details of the experimental aspects are described in Section 3. The results obtained after applying the proposed method to the clutches' life data are presented and discussed in Section 4. Finally, some conclusions drawn from the study, which can be a basis for future work, are presented in Section 5.
2. METHODOLOGY
The overall methodology proposed in this paper is described in the flowchart depicted in Figure 2. As can be seen in the figure, the methodology consists of four steps. In the first step, capturing the signal of interest from the raw pressure and relative velocity signals is discussed. In the second step, three primary features are computed once the signal of interest has been captured; the verification of the three features has been addressed in another publication (Ompusunggu et al., 2012). In the third step, the features are fused into a single value, namely the health index, using a logistic regression technique. The output of the logistic model is restricted between 0 and 1, such that the health or performance of a wet friction clutch can be easily assessed. Finally, the algorithm to predict the remaining useful life (RUL), using the fused features as the predicted variable, is presented and discussed. Since knowledge of the evolution of the proposed features during the clutches' lifetime is still limited, a data-driven prognostics approach is investigated in this paper.
Preprocessing Input Signals: capturing the relative velocity signal of interest.
Feature Extraction: computing the principal features, namely the normalized engagement duration τe, the normalized Euclidean distance DE and the SAM distance DSAM.
Performance Assessment: fusing the features (τe, DE and DSAM) into a Health Index (HI) constrained between 0 and 1.
Prognostics: predicting the remaining useful life (RUL).
Figure 2. Flow chart of the proposed methodology.
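The performance-assessment and prognostics steps of the flowchart can be illustrated with a minimal sketch of the logistic fusion and of the predicted variable g = log[h/(1−h)]. The weights and bias below are illustrative placeholders, not the coefficients fitted in the paper; in practice they would come from a logistic-regression fit on run-to-failure data.

```python
import numpy as np

def health_index(features, weights, bias):
    """Fuse the three features (tau_e, D_E, D_SAM) into a health index
    restricted to (0, 1) with a logistic model."""
    z = bias + np.dot(weights, features)
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(h):
    """Predicted variable for prognostics: g = log[h / (1 - h)]."""
    return np.log(h / (1.0 - h))
```

Because g is the inverse of the logistic transform, it is unbounded and its trajectory can be extrapolated until it crosses a predetermined threshold, which is what makes it a convenient predicted variable.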
2.1. Relevant Signals Measurement
Prior to computing the principal features, as discussed in the next subsection, the raw signals obtained from measurements first need to be preprocessed. Figure 3 graphically illustrates the signal preprocessing step, namely the procedure to capture the relative velocity signal of interest based on two raw measurement signals: (i) the relative rotational velocity and (ii) the applied pressure. In the following paragraphs, the procedure is briefly discussed.
Let the signal of interest be captured at a given (arbitrary) duty cycle with a predetermined time record length $\tau$, and suppose that the time record length is kept the same for all duty cycles. For the sake of consistency, the signal must be captured at the same reference time instant. It is reasonable to consider the time instant $t_f$ at which the ATF pressure $p(t)$ applied to the clutch pack starts to increase from zero as the reference time, which can be mathematically formulated as:
$t_f = \min\{t \in \mathbb{R} : p(t) > 0\}$. (1)
While the applied pressure is increasing, contact is gradually established between the separator and friction discs. As a result, the transmitted torque increases, which consequently reduces the relative velocity $n_{rel}(t)$. Eventually, the clutch is fully engaged when the relative velocity reaches zero for the first time at the lockup time instant $t_l$, which can be formulated in a similar way to Equation (1) as:
$t_l = \min\{t \in \mathbb{R} : n_{rel}(t) = 0\}$. (2)
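The two trigger instants above reduce to simple first-crossing searches on the sampled signals. The sketch below (function and variable names are ours, not from the paper) captures the signal of interest from uniformly sampled pressure and relative velocity arrays; in practice the zero comparisons may need small tolerances to cope with sensor noise:

```python
import numpy as np

def capture_signal_of_interest(t, p, n_rel, record_length):
    """Capture the relative-velocity signal of interest, Eqs. (1)-(2).

    t             : uniformly sampled time vector [s]
    p             : applied ATF pressure signal
    n_rel         : relative rotational velocity signal
    record_length : fixed time record length tau [s]
    """
    dt = t[1] - t[0]
    # Eq. (1): reference time t_f, first instant at which p(t) exceeds zero
    i_f = int(np.argmax(p > 0))
    t_f = t[i_f]
    # Eq. (2): lockup time t_l, first instant after t_f at which n_rel reaches zero
    i_l = i_f + int(np.argmax(n_rel[i_f:] <= 0))
    t_l = t[i_l]
    # fixed-length window of record_length seconds starting at the reference time
    n_samples = int(round(record_length / dt))
    return t_f, t_l, n_rel[i_f:i_f + n_samples]
```

With this window anchored at $t_f$, every duty cycle yields a vector of equal length, which is what the dissimilarity measures of the next subsection require.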
Figure 3. A graphical illustration of how to capture the relative velocity signal of interest. The upper and lower plots respectively show the typical applied pressure $p$ and the raw relative velocity signal $n_{rel}$, with the time instants $t_f$ and $t_l$ and the durations $\tau$ and $\tau_e$ marked. Note that a.u. is the abbreviation of arbitrary unit.
2.2. Feature Extraction
Formal definitions of the developed features (engagement duration, Euclidean distance and Spectral Angle Mapper distance) and the mathematical expressions to compute them are discussed in this subsection. The first two features are dimensional quantities, while the third one is dimensionless. The first two features are normalized such that they become dimensionless quantities of the same order of magnitude as the third feature.
2.2.1. Engagement Duration
The engagement duration $\tau_e$ is defined as the time interval between the lockup time instant $t_l$ and the reference time instant $t_f$, as graphically illustrated in Figure 3. Once both time instants $t_f$ and $t_l$ have been determined, the engagement duration $\tau_e$ can be simply computed as follows:
$\tau_e = t_l - t_f$. (3)
Without loss of generality, $\tau_e$ can be normalized with respect to the engagement duration measured at the initial condition (healthy state) $\tau_e^r$, according to the following equation:

$\bar{\tau}_e = (\tau_e - \tau_e^r)/\tau_e^r$, (4)

where $\bar{\tau}_e$ denotes the dimensionless engagement duration.
2.2.2. Dissimilarity Measures
A dissimilarity measure is a metric that quantifies the dissimilarity between objects. For the sake of condition monitoring, the dissimilarity measure between an object representing an arbitrary condition and the reference object representing a healthy condition can be treated as a feature. Thus, the dissimilarity measure between two identical objects is (close to) zero; the dissimilarity measure between two non-identical objects, on the other hand, is not zero. Here, the object is the relative velocity signal $n_{rel}$. Two dissimilarity measures, namely the Euclidean distance and the Spectral Angle Mapper (SAM) distance, are considered in this paper because of their computational simplicity (Kruse et al., 1993; Paclik & Duin, 2003).
The main motivation behind the dissimilarity approach is that the measured signals of interest are treated as vectors. Let $X$ be a $K$-dimensional vector, $x_i$, $i = 1, 2, \ldots, K$, denoting the discrete signal of the relative velocity measured in a normal (healthy) condition, and $Y$ be a $K$-dimensional vector, $y_i$, $i = 1, 2, \ldots, K$, denoting the discrete signal of the relative velocity measured in an arbitrary condition. The vector $X$ representing a healthy condition is referred to as the "baseline".
The dimensional Euclidean distance $D_E$ between the vectors $X$ and $Y$ is defined as:

$D_E(X, Y) = \sqrt{\sum_{i=1}^{K} (x_i - y_i)^2}$. (5)
For convenience, $D_E$ can also be normalized in accordance with the following equation:

$\bar{D}_E(X, Y) = \dfrac{D_E(X, Y)}{x_1 \sqrt{K}}$, (6)

where $\bar{D}_E$ denotes the dimensionless Euclidean distance and $x_1 > 0$ denotes the initial value of the baseline.
By definition, the SAM distance is a measure of the angle between two vectors and is therefore dimensionless. The SAM distance $D_{SAM}$ between the vectors $X$ and $Y$ is mathematically expressed as:

$D_{SAM}(X, Y) = \cos^{-1}\left(\dfrac{\sum_{i=1}^{K} x_i y_i}{\sqrt{\sum_{i=1}^{K} x_i^2}\,\sqrt{\sum_{i=1}^{K} y_i^2}}\right)$. (7)
Recall that the distance from an object to itself is zero and that a distance is always non-negative.
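The three features follow directly from these definitions. The sketch below (a hypothetical helper of our own naming, assuming NumPy arrays of equal length with a nonzero baseline start value) implements Equations (4), (6) and (7):

```python
import numpy as np

def clutch_features(tau_e, tau_e_ref, x, y):
    """Compute the three dimensionless features of Eqs. (4), (6) and (7).

    tau_e     : engagement duration of the current duty cycle
    tau_e_ref : engagement duration at the healthy (reference) state
    x         : baseline relative-velocity vector (healthy condition)
    y         : relative-velocity vector of the current condition
    """
    K = len(x)
    tau_bar = (tau_e - tau_e_ref) / tau_e_ref            # Eq. (4)
    d_e = np.linalg.norm(x - y) / (x[0] * np.sqrt(K))    # Eqs. (5)-(6)
    # Eq. (7): angle between the two vectors; clip guards against rounding
    cos_angle = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    d_sam = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return tau_bar, d_e, d_sam
```

Note a design consequence of Equation (7): the SAM distance is insensitive to a uniform scaling of the signal, whereas the normalized Euclidean distance is not, so the two measures capture complementary aspects of the signal deviation.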
2.3. Health Assessment
Health assessment constitutes a dichotomous problem, namely determining whether a unit (system) of interest (UOI) is in a healthy or a failure state. Intuitively, health can be represented by a binary value, e.g. 0 or 1, where this categorical value may be seen as a health index. For health assessment purposes, it is natural to assume that a health index of (close to) 1 represents a healthy state, while a health index of (close to) 0 represents a failure state. This formulation implies that the degradation occurring in a UOI is indicated by the progressive change of the health index from 1 to 0. It should be noted that the health index is sometimes called a "confidence value" in the literature.
In practice, feature values are not necessarily restricted between 0 and 1, which does not allow a direct judgment of the health of a UOI. Despite reflecting the actual condition of a UOI, the principal features extracted from measurement data cannot be directly used to assess the health of the UOI unless the relative distances to the corresponding values representing the end of life of the UOI (i.e. thresholds) are known. To this end, the feature values evolving from the healthy to the failure state need to be transformed into health indices.
In this study, health assessment based on a logistic regression technique is investigated. As will be shown later, logistic regression can be seen as a process with a two-fold objective: (i) fusing multiple features (independent variables) into a single value (i.e. the health index) and (ii) restricting the health index between 0 and 1. As discussed in (Lemeshow & Hosmer, 2000), logistic regression is an appropriate technique for dichotomous problems, where the predicted variable (i.e. the health index) must be greater than or equal to zero and less than or equal to one. Unlike linear regression, which is inappropriate for dichotomous problems (Lemeshow & Hosmer, 2000), logistic regression requires only data representing the healthy and failure states to estimate the regression coefficients. Thus, the logistic regression technique is suitable for problems with a limited amount of history data. Moreover, it has been reported in the literature that the logistic regression technique is a powerful tool for health assessment modeling of some systems based on extracted high dimensional features (Yan, Koc, & Lee, 2004; Yan & Lee, 2005).
Let us consider a simple logistic function $P(F)$ defined as:

$P(F) = h = \dfrac{1}{1 + e^{-g(F)}} = \dfrac{e^{g(F)}}{1 + e^{g(F)}}$, (8)
where $F = \{F_1, F_2, \ldots, F_L\}$ denotes a set of $L$ extracted features, $h$ denotes the health index of an event (i.e. healthy or failure) given a set of features $F$, and $g(F)$ is the logit function, which is mathematically expressed as:

$g(F) = g = \log\left(\dfrac{P(F)}{1 - P(F)}\right) = \sum_{i=0}^{L} \beta_i F_i$, (9)
where $F_0 = 1$, $\beta_i$ denotes the logistic model parameters to be identified and $g$ denotes the logarithm of the "odds-of-success". In a more compact way, Equation (9) can be rewritten as:

$g = \beta^T F$, (10)

with $\beta = [\beta_0\ \beta_1\ \beta_2\ \ldots\ \beta_L]^T$ and $F = [1\ F_1\ F_2\ \ldots\ F_L]^T$, where the superscript $T$ denotes the transpose operation.
Note that the logistic function expressed in Equation (8) can be seen as a kind of probability function (cumulative distribution function) because it ranges between 1 (healthy) and 0 (failure). In addition, the logit function expressed in Equation (10) constitutes a linear combination of the features extracted from measurement data, $F_1, F_2, \ldots, F_L$. This implies that the logarithm of the odds-of-success $g$ preserves the nature of the features extracted from the measurement signals.
Here, the main objective of the logistic regression is to identify the $(L + 1)$ parameters $\beta$ in Equation (10) such that the logistic model is readily implementable for the health assessment of a UOI. In this context, the parameter identification is normally performed using the maximum-likelihood estimator, which entails finding the set of parameters for which the probability of the observed data is maximal (Czepiel, n.d.). This is done off-line, where two sets of features, $F_{health}$ and $F_{failure}$, respectively representing the healthy and failure states, are used as training data.
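As a concrete illustration of this off-line identification step, the sketch below maximizes the likelihood with plain gradient ascent on the cross-entropy objective; the paper does not prescribe a particular optimizer, so this choice, and all function and variable names, are our assumptions:

```python
import numpy as np

def fit_logistic(F, h, iters=5000, lr=0.5):
    """Estimate the logistic-model parameters beta by maximum likelihood.

    F : (N, L) matrix of training feature sets (healthy and failure rows)
    h : (N,) target health indices (e.g. 0.95 for healthy, 0.05 for failure)
    """
    X = np.hstack([np.ones((F.shape[0], 1)), F])   # prepend F0 = 1, Eq. (9)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # Eq. (8)
        beta += lr * X.T @ (h - p)                 # cross-entropy gradient ascent
    return beta

def health_index(beta, f):
    """Health index h for one feature set f = (tau_e, D_E, D_SAM), Eq. (8)."""
    g = beta[0] + beta[1:] @ np.asarray(f)         # Eqs. (9)-(10)
    return 1.0 / (1.0 + np.exp(-g))
```

Because the targets are set to 0.95 and 0.05 rather than exactly 1 and 0, the likelihood has a finite maximizer even when the two classes are perfectly separable, which keeps the iteration well behaved.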
2.4. Prognostics Algorithm - Data Driven Approach
The prognostics algorithm proposed in this paper is based on a data-driven approach. The variable to be predicted is the logarithm of the odds-of-success $g$. The main reason for this choice is that the predicted variable $g$ preserves the nature of the features extracted from the measurements (it is a linear combination of the features). Basically, the algorithm consists of four main steps, namely (i) determining the first time instant to start prediction $t_{1p}$ such that the history data available at $t_{1p}$ are sufficient, (ii) building a prognostics model from the available data, (iii) predicting the future trajectory of the predicted variable $g$ based on the built prognostics model and (iv) estimating the remaining useful life (RUL). When new data become available, steps (ii)-(iv) are performed again; this procedure is repeated periodically for as long as prediction remains useful. Thus, the prognostics model is updated whenever new data are provided, and the model is expected to converge as more evidence accumulates over time. These steps are discussed in more detail in the subsequent paragraphs.
In this paper, $t_{1p}$ is proposed as the time instant when the health index $h$ is equal to 0.75. At this value ($h = 0.75$), it is theoretically reasonable to assume that a UOI has passed about 25% of its total lifetime, so that sufficient history data to build a prognostics model are practically available. In the domain of the predicted variable $g$, the aforementioned health index $h = 0.75$ corresponds to $g = 1.098$.
The weighted mean slope (WMS) method proposed in (Bey-Temsamani et al., 2009b) is used in this paper to adaptively build a prognostics model from the given history data. This method is easy to implement and is based on a data-driven approach where the model is updated periodically as new data come in. In this method, all the local slopes of a time series are first computed. Afterwards, the slope at the end of the time series (i.e. at the arbitrary time instant $t_p$ at which prediction is done) is computed by summing up all the local slopes weighted by a certain function, where the weighting factor of the most recent data is the greatest. Let $g = \{g_1, g_2, \ldots, g_N\}$ and $t = \{t_1, t_2, \ldots, t_N\}$ respectively be the history of the logarithm of the odds-of-success and the corresponding time sequence at $t_p$. The WMS $b_w$ at this particular time instant $t_p$ is calculated according to the following equation:
$b_w = \sum_{n=2}^{N} \omega_n b_n$, (11)

with

$\omega_n = \dfrac{n}{\sum_{n=2}^{N} n}$, (12)

and

$b_n = \dfrac{g_n - g_{n-1}}{t_n - t_{n-1}}$, $n = 2, 3, \ldots, N$, (13)
where $b_n$ and $\omega_n$ respectively denote the local slope and the corresponding weighting factor. The standard deviation $\sigma_b$ of the WMS at time instant $t_p$ is calculated according to the following equation:

$\sigma_b = \sqrt{\sum_{n=2}^{N} \omega_n (b_n - b_w)^2}$. (14)
For a 95% confidence interval, the lower bound $b_w^{lower}$ and upper bound $b_w^{upper}$ of the WMS can be calculated as follows (Meeker & Escobar, 1998):

$b_w^{lower} = b_w - 1.96\,\dfrac{\sigma_b}{\sqrt{N-1}}$, (15)

$b_w^{upper} = b_w + 1.96\,\dfrac{\sigma_b}{\sqrt{N-1}}$. (16)
As will be shown later in Section 4, the three features ($\bar{D}_E$, $D_{SAM}$, $\bar{\tau}_e$) evolve linearly during the lifetime of the tested clutches. It is therefore reasonable to assume that the trend of the predicted variable $g$ is also linear, since the nature of the features is preserved. Hence, the value of the predicted variable at time instant $t_p + t_h$, namely $g_{t_p+t_h}$, is given by:

$g_{t_p+t_h} = g_{t_p} + b_w t_h$, (17)

where $g_{t_p} = g_N$.
Suppose that the failure threshold (RUL threshold) $g_{limit}$ is known in advance. The expected RUL $r$ at an arbitrary time instant $t_p$ can be computed as:

$r = \dfrac{g_{limit} - g_{t_p}}{b_w}$. (18)
Based on the lower and upper bounds of the WMS expressed in Equations (15) and (16), the uncertainty of the prognostics (the lower bound $\Delta r^{lower}$ and the upper bound $\Delta r^{upper}$ of the RUL) can be estimated according to the following equations:

$\Delta r^{lower} = r - r^{lower}$, (19)

$\Delta r^{upper} = r^{upper} - r$, (20)

with

$r^{lower} = \dfrac{g_{limit} - g_{t_p}}{b_w^{upper}}$, (21)

$r^{upper} = \dfrac{g_{limit} - g_{t_p}}{b_w^{lower}}$. (22)
3. EXPERIMENT
Service life data of wet friction clutches are required for the evaluation of the developed health assessment and prognostics method. In order to obtain the clutch service life data in a reasonable period of time, the concept of an accelerated life test (ALT) is applied in this study. For this purpose, a fully instrumented SAE#2 test setup designed and built by our industrial partner, Dana Spicer Off Highway Belgium, was made available. In this respect, an ALT can be realized by means of applying a higher mechanical energy to a tested clutch compared to the amount of energy transmitted by a clutch in normal operation. The energy level is normally adjusted by changing the initial relative velocity and/or the inertia of the input and output flywheels. In this study, the ALTs were conducted on different commercial wet friction clutches using the fully instrumented SAE#2 test setup. During the tests, all the clutches were lubricated with the same Automatic Transmission Fluid (ATF). The test setup and the ALT procedure are discussed in the following subsections.
3.1. SAE#2 test setup
The SAE#2 test setup used in the experiments, as depicted in Figure 4, basically consists of three main systems, namely the driveline, the control system and the measurement system. The driveline comprises several components: an AC motor for driving the input shaft (1), an input velocity sensor (2), an input flywheel (3), a clutch pack (4), a torque sensor (5), an output flywheel (6), an output velocity sensor (7), an AC motor for driving the output shaft (8), a hydraulic system (11-20) and a heat exchanger (21) for cooling the outlet ATF. An integrated control and measurement system (22) is used for controlling the ATF pressure (both for lubrication and actuation) applied to the clutch and the initial velocity of both the input and output flywheels, as well as for measuring all relevant dynamic signals.
Figure 4. The SAE#2 test setup used in the study: (a) photograph and (b) scheme, courtesy of Dana Spicer Off Highway Belgium.
3.2. Test specification
The general specification of the test scenario is given in Table 1. Two clutch packs with different lining materials of the friction discs were tested. It should be noted that all the friction discs, separator discs and ATF used are commercially available. In all the tests, the inlet temperature and flow of the ATF were kept constant, see Table 1. Additionally, one can see in the table that the inertia of the input flywheel (drum-side) is lower than that of the output flywheel (hub-side).
Table 1. General test specification.

Number of clutch packs to be tested               2
Number of friction discs in the clutch assembly   8
Inner diameter of friction disc (di) [mm]         115
Outer diameter of friction disc (do) [mm]         160
ATF                                               John Deere J20C
Lubrication flow [liter/minute]                   18
Inlet temperature of ATF [°C]                     85
Output flywheel inertia [kg m²]                   3.99
Input flywheel inertia [kg m²]                    3.38
Sampling frequency [kHz]                          1
3.3. Test procedure
Before an ALT is carried out on a wet friction clutch, a run-in test (at a lower energy level) is first conducted for 100 duty cycles in order to stabilize the contact surface. The run-in test procedure is in principle the same as the ALT procedure, but the initial relative rotational velocity of the run-in tests is lower than that of the ALTs. Figure 5 illustrates a duty cycle of the ALT, which is carried out as follows. Initially, while both the input flywheel (drum-side) and the output flywheel (hub-side) are rotating at predefined respective speeds in opposite directions, the two motors are powered off and the pressurized ATF is simultaneously applied to the clutch pack at time instant $t_f$. The oil thus actuates the clutch piston, pushing the friction and separator discs towards each other. This occurs during the filling phase, between the time instants $t_f$ and $t_a$. While the applied pressure is increasing, contact is gradually established between the separator and friction discs, which results in an increase of the transmitted torque on the one hand and a decrease of the relative velocity on the other hand. Finally, the clutch is completely engaged when the relative velocity reaches zero at the lockup time instant $t_l$. As the inertia and the respective initial speed of the output flywheel (hub-side) are higher than those of the input flywheel, after $t_l$ both flywheels rotate together in the same direction as the output flywheel, see Figure 5. In order to prepare for the forthcoming duty cycle, both driving motors are braked at the time instant $t_b$, such that the driveline stands still for a while.
Figure 5. A representative duty cycle of wet friction clutches, showing the ATF temperature, applied pressure, drum velocity, hub velocity and transmitted torque (scaled units) around the time instants $t_f$, $t_a$, $t_l$ and $t_b$. Note that the transmitted torque drops to zero after the lockup time instant $t_l$ because there is no external load applied during the test.

The ALT procedure discussed above is continuously repeated until a given total number of duty cycles is attained. For the sake of time efficiency in measurement, all the ALTs are performed for 10000 duty cycles. The pressure applied to the clutches is kept constant during the tests and the ATF is continuously filtered, such that it is reasonable to assume that the used ATF has not degraded during the tests.
4. RESULTS AND DISCUSSION
Figure 6 shows the optical images and the surface profiles of the friction material before and after the ALT, taken from the first clutch pack. The images are captured using a Zeiss microscope and the surface profiles are measured along the sliding direction using a Taylor Hobson Talysurf profilometer. It can be seen in the figure that the surface of the friction material has become smooth and glossy, and the clutch is therefore considered to have failed. The change of the color and the surface topography of the friction material is known as a result of the glazing phenomenon, which is believed to be caused by a combination of adhesive wear and thermal degradation (Gao et al., 2002; Gao & Barber, 2002).
4.1. Capturing the Signal of Interest
Figure 7 shows 3D plots of the relative velocity signals of interest obtained from the first ALT (clutch pack #1) and the second ALT (clutch pack #2). All the signals are captured at the same reference time instant $t_f$ with the same time record length $\tau$ of 2.5 s. As can be seen in the figure, the reference time instant $t_f$ is set to zero. Furthermore, it is evident from the figure that the profile of the relative velocity signal deviates from its initial profile as the clutch degradation progresses (pointed out by the arrow). This deviation is indicated by two major patterns, namely (i) the changing shape and (ii) the shifting of the lockup time instant $t_l$ to the right-hand side with respect to the reference time instant $t_f$. This observation confirms the experimental results reported in the literature (Fei et al., 2008).
Figure 6. Comparison of the friction material before and after the ALT of 10000 duty cycles: (a) optical image (left) and the corresponding surface profile (right) of the friction material before the test, (b) optical image (left) and the corresponding surface profile (right) of the friction material after the test. Note that $z$ denotes the displacement of the profilometer stylus in the Z-axis (perpendicular to the surface), $x$ denotes the displacement of the profilometer stylus in the X-axis (along the sliding direction) and $\phi(z)$ denotes the probability distribution function of the surface profile.
Figure 7. Evolution of the relative velocity signals of interest obtained from (a) the first ALT and (b) the second ALT.
4.2. Extracted Features
The features introduced in Section 2 are extracted from the relative velocity signals of interest shown in Figure 7, based on Equations (4), (6) and (7). Figure 8 shows the evolution of the features as a function of the clutches' service life. Remarkably, the trends of all the features are linearly increasing, with relatively small variations.
Figure 8. Evolution of the features ($\bar{\tau}_e$, $\bar{D}_E$ and $D_{SAM}$) obtained from the first and second ALT.
4.3. Logistic Model
In order to build a logistic model for clutch health assessment, a number of sets of the features, $F_{health} = \{\bar{\tau}_e^i, \bar{D}_E^i, D_{SAM}^i\}$ and $F_{failure} = \{\bar{\tau}_e^f, \bar{D}_E^f, D_{SAM}^f\}$, respectively representing the healthy and failure states, are required. Note that the superscripts i and f respectively denote the healthy and failure states. Table 2 lists the sets of features used for logistic regression, taken from different observations on the features extracted from the measurement data as shown in Figure 8. The health indices $h$ assigned to the healthy and failure states are 0.95 and 0.05, respectively. It should be mentioned here that these two values are heuristically chosen since not enough history data are available.
Using the training data listed in Table 2, the parameters obtained from the logistic regression can be written as:

$\beta = [3.09\ \ 2.07\ \ {-35.96}\ \ 3.57]^T$.

Based on the identified parameters, the logistic model, represented as the health index $h$ as a function of the clutch duty cycles $N_{cycle}$, can be expressed by the following equation:

$h(N_{cycle}) = \dfrac{e^{g(N_{cycle})}}{1 + e^{g(N_{cycle})}}$, (23)

with

$g(N_{cycle}) = 3.09 + 2.07\,\bar{\tau}_e - 35.96\,\bar{D}_E + 3.57\,D_{SAM}$. (24)
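For illustration, the identified model of Equations (23) and (24) can be evaluated for a given feature set as follows (a minimal sketch; the function name is ours, and the inputs are the normalized features of Section 2):

```python
import math

def clutch_health_index(tau_e_bar, d_e_bar, d_sam):
    """Health index of the identified logistic model, Eqs. (23)-(24)."""
    g = 3.09 + 2.07 * tau_e_bar - 35.96 * d_e_bar + 3.57 * d_sam  # Eq. (24)
    return 1.0 / (1.0 + math.exp(-g))                             # Eq. (23)
```

For a fresh clutch (all features zero) this gives h close to 0.96, while for feature values near the failure rows of Table 2 it drops below 0.1, consistent with the training targets of 0.95 and 0.05.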
Figure 9 shows the evolution of the health index $h$ of the two tested clutches. As expected, the health index decreases progressively during the service life of the clutches. Since the index is restricted between 0 and 1, the figure makes it easy for users to judge the health status of the clutches: when the index value is close to 1, one can directly conclude that the clutches are healthy, while when the index approaches 0, one can conclude that the clutches are about to fail.
Figure 9. The health index $h$ evolution of both tested clutches.
4.4. Prognostics Performance
In this subsection, the performance of the proposed prognostics algorithm is demonstrated. Figure 10 shows the evolution of the logarithm of the odds-of-success $g$, which has been specified as the predicted variable. When $g = 1.098$ (i.e. crossing the upper horizontal line) the algorithm is triggered for the first time to build a prognostics model (at the 3000th cycle), and the trajectory of $g$ indicated by the gray dashed line is consecutively predicted until it crosses the predefined threshold (the lower horizontal line). The RUL threshold $g_{limit}$ is set at the value of $-2.197$, which corresponds to the health index of 0.1. At this particular value ($g_{limit} = -2.197$), it is reasonable to assume that the tested clutches have passed about 90% of their expected total lifetime. For comparison, the prediction at the 7000th cycle is also shown in the figure. As can be seen, the predicted trajectory of $g$ at the latter cycle (shown by the solid black line) gets closer to the measurement data, indicating that the model has been updated.

Table 2. Sets of features used for logistic regression analysis.

State                 Observation   $\bar{\tau}_e$   $\bar{D}_E$   $D_{SAM}$
Healthy (h = 0.95)    1             0                0             0
                      2             0.0033           0.0034        0.0051
                      3             0.017            0.0122        0.017
                      4             0.0109           0.0069        0.0109
Failure (h = 0.05)    1             0.3              0.2           0.27
                      2             0.32             0.22          0.29
                      3             0.3205           0.2186        0.2995
                      4             0.3593           0.2216        0.3062
Figure 10. Representative evolution of the logarithm of the odds-of-success $g$, with the measurement data, the predictions made at the 3000th and 7000th cycles and the RUL threshold.
Figure 11. Comparison of the estimated and actual RULs of (a) the first clutch pack and (b) the second clutch pack.
The RUL estimations of both clutches are depicted in Figure 11. As can be seen in the figure, the error between the estimated and actual RULs, and the corresponding uncertainty interval, are quite large at the beginning of the prediction because a limited amount of data is available to build the prognostics model. As more evidence becomes available, the estimated RULs tend to converge to the actual RULs and the uncertainties tend to decrease over time, implying that the prognostics model improves over time.
5. CONCLUSION AND FUTURE WORK
In this paper, an attempt to develop a health (performance) assessment and prognostics methodology for wet friction clutches has been presented and discussed. For health assessment purposes, all the extracted features are fused into a single variable called the health index $h$, which is restricted between 0 and 1, based on a logistic regression solved with the maximum likelihood estimation technique. In this way, a logistic model can be built that allows a direct judgment of the health of wet friction clutches. In terms of prognostics, the logarithm of the odds-of-success, i.e. $\log(h/(1 - h))$, is assigned as the predicted variable. The weighted mean slope (WMS) method, which is simple and easy to implement, is used to predict the trajectory of the predicted variable and consecutively to predict the remaining useful life (RUL) of the clutches. The proposed methodology has been experimentally evaluated on two commercially available clutch packs with different friction materials. The experimental results confirm that the methodology proposed in this paper is promising for aiding the development of a maintenance strategy for wet friction clutches.
The experiments carried out in this study were performed under a controlled environment. More data will be collected in the future under conditions in which the loading, operational temperature and applied pressure vary during the duty cycles. Furthermore, several candidate algorithms need to be evaluated in future work in order to determine the optimal one with regard to accuracy, convergence rate and practical implementation.
ACKNOWLEDGMENT
All the authors are grateful for the experimental support byDr. Mark Versteyhe of Dana Spicer Off Highway Belgium.
NOMENCLATURE
t          time
W          instantaneous concentration of cellulose fibers
W0         initial concentration of cellulose fibers
nrel       relative velocity
p          pressure
tf         reference time instant
tl         lockup time instant
t1p        time instant for first prediction
tp         arbitrary time instant for prediction
τ          time record length
X          vector denoting a discrete relative velocity signal measured in an initial (healthy) condition
Y          vector denoting a discrete relative velocity signal measured in an arbitrary condition
τ̄e         normalized engagement duration
D̄E         normalized Euclidean distance
DSAM       normalized SAM distance
F          a set of features
h          health index
g          logarithm of the odds-of-success
glimit     RUL threshold
bn         local slope
ωn         weighting factor
bw         weighted mean slope
σb         weighted standard deviation
r          remaining useful life (RUL)
Ncycle     number of duty (engagement) cycles
REFERENCES
Bansal, D., Evans, D. J., & Jones, B. (2004). A real-timepredictive maintenance system for machine systems.International Journal of Machine Tools and Manufac-ture, 44(7-8), 759 - 766.
Basseville, M., Benveniste, A., Gach-Devauchelle, B., Gour-sat, M., Bonnecase, D., Dorey, P., et al. (1993). In situdamage monitoring in vibration mechanics: diagnos-tics and predictive maintenance.Mechanical Systemsand Signal Processing, 7(5), 401 - 423.
Bey-Temsamani, A., Engels, M., Motten, A., Vandenplas,S., & Ompusunggu, A. P. (2009a). Condition-BasedMaintenance for OEM’s by application of data miningand prediction techniques. InProceedings of the 4thWorld Congress on Engineering Asset Management.
Bey-Temsamani, A., Engels, M., Motten, A., Vandenplas, S.,& Ompusunggu, A. P. (2009b). A Practical Approachto Combine Data Mining and Prognostics for ImprovedPredictive Maintenance. InThe 15th ACM SIGKDDConference on Knowledge Discovery and Data Min-ing.
Czepiel, S. (n.d.). Maximum likelihood esti-mation of logistic regression models: the-ory and implementation. Available from
http://czep.net/stat/mlelr.pdfFei, J., Li, H.-J., Qi, L.-H., Fu, Y.-W., & Li, X.-T. (2008).
Carbon-Fiber Reinforced Paper-Based Friction Mate-rial: Study on Friction Stability as a Function of Oper-ating Variables.Journal of Tribology, 130(4), 041605.
Gao, H., & Barber, G. C. (2002). Microcontact Model forPaper-Based Wet Friction Materials.Journal of Tribol-ogy, 124(2), 414 - 419.
First European Conference of the Prognostics and Health Management Society, 2012
BIOGRAPHIES
Agusmian Partogi Ompusunggu is a project engineer at Flanders' MECHATRONICS Technology Centre (FMTC), Belgium. His research focuses on condition monitoring, prognostics, vibration testing and analysis, and tribology. He earned his bachelor degree in mechanical engineering (B.Eng) in 2004 from Institut Teknologi Bandung (ITB), Indonesia, and his master degree in mechanical engineering (M.Eng) in 2006 from the same institute. He is currently pursuing his PhD degree in mechanical engineering at Katholieke Universiteit Leuven (K.U.Leuven), Belgium.
Steve Vandenplas is a program leader at Flanders' MECHATRONICS Technology Centre (FMTC), Belgium. He received his Master's Degree in Electrotechnical Engineering in 1996 from the Vrije Universiteit Brussel (VUB), Belgium. In 2001, he received a PhD in Applied Science and started to work as an R&D Engineer at Agilent Technologies for one year. Thereafter, he worked as a Postdoctoral Fellow at the K.U.Leuven Department of Metallurgy and Materials Engineering, in the research group on material performance and non-destructive testing (NDT). He has been working at Flanders' MECHATRONICS Technology Centre (FMTC) since 2005, where he is currently leading FMTC's research program on "Monitoring and Diagnostics". His main interests are machine diagnostics and condition based maintenance (CBM).
Paul Sas is a full professor at the Department of Mechanical Engineering of Katholieke Universiteit Leuven (K.U.Leuven), Belgium. He received his master and doctoral degrees in mechanical engineering from K.U.Leuven. His research interests comprise numerical and experimental techniques in vibro-acoustics, active noise and vibration control, noise control of machinery and vehicles, structural dynamics, and vehicle dynamics. He is currently leading the noise and vibration research group of the Department of Mechanical Engineering at K.U.Leuven.
Hendrik Van Brussel is an emeritus professor at the Department of Mechanical Engineering of Katholieke Universiteit Leuven (K.U.Leuven), Belgium. He was born in Ieper, Belgium, on 24 October 1944, obtained the degree of Technical Engineer in mechanical engineering from the Hoger Technisch Instituut in Ostend, Belgium, in 1965, and an engineering degree in electrical engineering at M.Sc level from K.U.Leuven. In 1971 he got his PhD degree in mechanical engineering, also from K.U.Leuven. From 1971 until 1973 he established a Metal Industries Development Center in Bandung, Indonesia, and was an associate professor at Institut Teknologi Bandung (ITB), Indonesia. He was a pioneer in robotics research in Europe and an active promoter of the mechatronics idea as a new paradigm in machine design. He has published more than 200 papers on different aspects of robotics, mechatronics and flexible automation. His research interests have shifted towards holonic manufacturing systems and precision engineering, including microrobotics. He is a Fellow of SME and IEEE, and in 1994 he received an honorary doctoral degree from the 'Politehnica' University in Bucharest, Romania, and from RWTH Aachen, Germany. He is also a Member of the Royal Academy of Sciences, Literature and Fine Arts of Belgium and an Active Member of CIRP (International Institution for Production Engineering Research).
Health management system for the pantographs of tilting trains
Giovanni Jacazio, Massimo Sorli, Danilo Bolognese, Davide Ferrara

Politecnico di Torino, Department of Mechanical and Aerospace Engineering, Turin, 10129, Italy
ABSTRACT
Tilting trains can rotate their carbodies by several degrees with respect to the bogies, about the longitudinal axis of the train. This permits a train to travel at high speed while maintaining an acceptable passenger ride quality, by limiting the lateral acceleration, and the consequent lateral force, experienced by the passengers when the train runs on a curved track at a speed in excess of the balance speed built into the curve geometry. When the carbody is tilted with respect to the bogie, the train pantograph needs to remain centered with respect to the overhead catenary, which is aligned with the track. The conventional solution is to mechanically link the pantograph to the bogie, but recent tilting trains have the pantograph connected to the carbody roof, while a position servoloop continuously controls the pantograph position so as to keep it centered with the catenary. The merit of this design is that it increases the useful volume inside the carbody. The pantograph position servoloop uses two position sensors, which provide redundant position information to close the pantograph feedback loop and to perform system monitoring.
The monitoring functions presently implemented in pantograph position controls are able to detect servocontrol failures, but in case of conflicting information from the two position transducers they are not always able to sort out which of the two has failed, because some transducer failures cannot be detected by simply looking at the transducer output signals. As a result, if a difference between the output signals of the two position transducers is detected, the tilting function is disabled and the train speed is reduced. Moreover, the entire pantograph is then removed and replaced, because the functionality of each individual transducer can only be checked at shop level.
Train operating companies have encouraged the development of better diagnostic techniques for the pantograph position control system, but no work on this subject had so far been performed. The authors therefore conducted a research activity aimed at developing an advanced diagnostic system that can both identify the presence of a failure and recognize which of the two position transducers is the failed one. In case of a transducer failure it is thus possible to isolate the failed transducer and keep the pantograph position control operational, thereby retaining the train tilting function. A further merit of the advanced diagnostic system is a reduction of maintenance time and cost, because the failed transducer can be replaced without removing the entire pantograph from the train.
The general architecture of this innovative diagnostic
system, the associated algorithms, the mathematical models
for the system simulation and validation, the simulation
results and the possible future developments of this health
management system are presented in the paper.
1. THE PANTOGRAPHS OF TILTING TRAINS
Tilting trains tilt their carbodies towards the inner side of a curve to reduce the centrifugal force perceived at passenger level and thus maintain equivalent or better passenger comfort, in terms of lateral acceleration (and the consequent lateral force), on the same curve geometry at an enhanced service speed (Figure 1). By tilting the carbody of a rail passenger vehicle relative to the track plane during curve negotiation, it is therefore possible to operate at speeds higher than would be acceptable to passengers in a non-tilting vehicle, and thus reduce the overall trip time.
_____________________
G. Jacazio, M. Sorli, D. Bolognese and D. Ferrara. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0
United States License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are credited.
Figure 1. Tilting train concept
The recognized advantage of tilting trains is that they increase the achievable service speed of passenger trains on existing tracks without the very large investment needed to build a dedicated new track or to alter the geometry of the existing curves (Boon & Hayes, 1992).
Both hydraulic and electromechanical actuation systems
have been used to provide the controlled force necessary to
tilt the carbodies of the train vehicles, though the majority of
tilting trains in revenue service use hydraulic actuation
systems.
A critical design issue associated with carbody tilting is the need to maintain the train pantograph centered with respect to the overhead catenary, which runs at the midpoint between the two track rails. Most tilting trains have implemented the solution of rigidly connecting the pantograph structure to the bogie by means of a truss passing through the carbody. This is a simple design concept, but it reduces the useful volume within the carbody because enough empty space must be left around the vertical beams of the truss to accommodate carbody tilting. Most of the tilting trains developed in the last 10 years, however, use a different design in which the pantograph supporting structure is directly connected to the carbody roof, while the pantograph itself can be moved relative to its supporting structure in a direction opposite to the carbody tilting. By appropriately controlling the pantograph lateral position with respect to the carbody roof it is then possible to keep the pantograph aligned with the catenary even when the carbody tilts. This is accomplished by an actuation system receiving its commands from the train electronics, as outlined in the next section. The advantage of this solution is that it increases the useful space within the carbody.
The actuation technology used for the pantograph control of tilting trains following this design concept is the same as that of the carbody tilting system. The research activity presented in this paper focused on the latest tilting train developed by Alstom (the so-called "Nuovo Pendolino"), which makes use of hydraulic actuation, and the health management system developed in this research specifically refers to a hydraulically actuated pantograph control system. However, the same health management philosophy can be followed to develop effective diagnostic algorithms for an electrically actuated pantograph control system.
2. PANTOGRAPH POSITION CONTROL SYSTEM
The control of the lateral position of the pantograph with
respect to the tilting carbody is performed by a closed loop
system using two single-acting hydraulic actuators mounted
as an opposite pair and controlled by an electrohydraulic
servovalve (Figure 2). The pantograph is mounted on a
carriage that can be moved along two tracks perpendicular
to the longitudinal axis of the vehicle. The rod end of each
of the two hydraulic actuators is connected to the carriage,
while the head end is connected to a structure fixed to the
carbody roof. Two springs mounted between the carriage
and the frame maintain the pantograph centered in its mid
position when the pantograph control system is not active.
Each of the two single-acting hydraulic actuators accepts the controlled flow from one of the two control ports of an electrohydraulic servovalve; the combination of the servovalve and the two single-acting actuators is therefore equivalent to a hydraulic servocontrol comprising a servovalve and a double-acting hydraulic actuator. The hydraulic power
supply is provided by a constant pressure hydraulic power
generation and control unit (HPGCU) located in the train
vehicle undercarriage. The pantograph position command is generated by the train electronics simultaneously with the carbody tilt command, as a function of the lateral acceleration, and a position servoloop is created for the pantograph in which the command is compared to the actual lateral position in order to close the position feedback loop.
The servoloop position errors are processed by an
appropriate control law that eventually generates the input
signal to the flow control servovalve. The pantograph
lateral position is measured by two position sensors, with
each sensor placed inside one of the two hydraulic actuators.
The pantograph position control loop is single-hydraulic,
dual-electrical and uses a single electrohydraulic servovalve
with independent electrical coils accepting the control
currents from the two independent control computers. Each
computer interfaces with one of the two position sensors and
mutually exchanges with the other computer the information
of pantograph lateral position and servovalve current as well
as the computer health status. Each computer thus generates the same consolidated position feedback, based on the average of the two pantograph position sensor signals.
The control law (Figure 3) is based on a PID controller with a relatively low value of the integrator gain and a saturation on the integrator output. The function of the integrator is in fact to compensate for the steady state, or slowly varying, servovalve offsets, while the dynamic performance depends on the proportional and derivative gains of the control law.
A comparison of the signals of the two sensors is
continuously performed during the train ride and if the
difference between these signals is greater than a given
threshold, an alert is generated and the tilting system
operation is disabled. Both the carbody and the pantograph
actuators are set in a bypass mode connecting the actuators
lines to return. The tilting carbody recenters under its own
weight while the pantograph recenters under the action of its
springs. As the train tilting is disabled, the train speed is
reduced to maintain an acceptable comfort level for the
passengers and train safety, but with the consequence of a
travel delay.
The rationale for disabling the train tilting in case of a discrepancy between the signals of the two pantograph position sensors is the concern of not always being able to detect the failure of each individual sensor. Failures such as a broken wire or a short circuit lead to an out-of-scale signal that can be easily detected, but other failures, such as degradations causing variations of the sensor scale factor or increased offsets, are failure cases that cannot be detected by the existing monitoring logic. It may therefore well happen that a difference between the signals of the two sensors is detected, but it is not possible to understand which of the two has failed. Moreover, even if the existing monitoring system recognized and isolated the failed sensor, a risk would exist that a subsequent failure of the remaining active sensor might go undetected, which could lead to a safety-critical condition. The end result is that a single transducer failure leads to a reduction of the train speed even though the remaining transducer could still be able to control the pantograph position.
Figure 2. Concept schematics of the pantograph position control system
Figure 3. Block diagram of the pantograph control law
A research activity was then conducted to develop a more sophisticated diagnostic procedure allowing the degradation of each individual transducer to be detected by appropriately processing all available signals by means of dedicated algorithms. This new diagnostic procedure brings two benefits: it sorts out which of the two transducers has failed in case of a discrepancy between the sensor signals, and it allows a failure of the remaining active sensor to be detected after the other sensor has already failed. This allows the train to keep the tilting system active, and thus a high train speed, after the loss of one of the two pantograph sensors, thereby improving the tilting system availability.
A further advantage brought about by the improved diagnostics is a simpler maintenance operation. Presently, when a difference between the sensor outputs is signalled, the maintenance crew removes and replaces the entire pantograph, which is a time-consuming and costly operation. The implementation of a health monitoring system able to specifically detect the failed transducer not only improves the tilting system availability but also reduces the maintenance costs.
3. ADVANCED HEALTH MANAGEMENT SYSTEM
The health management system presented herein was devised to be applied to legacy systems. It does not require any hardware modification, but makes better use of the available signals to enhance the ability to detect an anomalous behaviour of the pantograph position control system, allowing the tilting operation to continue also after a sensor failure.
The health management system is based on real-time
modeling of the pantograph control system and consists of
three separate functions:
- Coherence check
- Learning process
- Monitoring process
These three functions are continuously performed during the train ride; however, when a sensor failure is detected the learning process is permanently stopped. If a failure of the servovalve electrical section, or of its servoamplifier, is detected, the learning process is temporarily stopped, and it resumes after the train electronics has switched the servovalve control from the failed lane to the previously standby lane. The purpose of the learning process is in fact to continuously tune the values of the parameters used by the pantograph real-time model, so it can be effective only as long as all system components are operating correctly. If any component fails, the learning process loses its significance, and the monitoring process continues using the last values of the system parameters determined by the learning process before the failure occurred.
The outputs generated by the coherence check and the monitoring process are then routed to a decision maker, which fuses all information and provides the train electronics with an indication of the health of the pantograph position control system. Figure 4 shows the flow chart of the processes performed by the health management system. The three functions are described in the following sections.
4. COHERENCE CHECK
The coherence check is performed on the signals of the two position sensors and on the servovalve current. The coherence check for the signals of the two position sensors consists of two operations:

- Verification that the output signal of each sensor is within a valid range
- Comparison between the output signals of the two redundant sensors
The signals A and B provided by the two position sensors are first checked to verify that they are within their valid range of 4 to 20 mA. If the electrical output signal is outside this range, a failure of that sensor is recognized, its signal is discarded, and the pantograph control continues using the remaining sensor to close the position feedback loop. If both signals A and B pass the valid range check, they are compared to each other. If their difference is below an
acceptable threshold, the signals are coherent and a good health status is recognized; however, if a difference above the threshold appears and lasts more than a given time, a lack of signal coherence is detected. In this case the position feedback, which is obtained by averaging the two sensor output signals, is obviously corrupted. When such a condition occurs, the ensuing monitoring process sorts out which sensor is good and which has failed, thereby allowing the pantograph position control system to continue to operate. Based on an analysis of operational data, the threshold for signaling a lack of coherence was set at a value corresponding to 6 % of the full actuator travel.
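The two checks above can be sketched as follows. The 4 to 20 mA range and the 6 % threshold are the values quoted in the text; the full-travel value and the function names are hypothetical placeholders:

```python
# Thresholds: 4-20 mA valid range and 6 % of full travel (from the text);
# the full-travel value itself is a hypothetical placeholder.
FULL_TRAVEL_MM = 100.0
COHERENCE_THRESHOLD_MM = 0.06 * FULL_TRAVEL_MM

def valid_range(signal_ma):
    """A sensor output is valid when it lies within 4 to 20 mA."""
    return 4.0 <= signal_ma <= 20.0

def sensors_coherent(pos_a_mm, pos_b_mm):
    """The two redundant position readings are coherent when they agree
    within 6 % of the full actuator travel."""
    return abs(pos_a_mm - pos_b_mm) <= COHERENCE_THRESHOLD_MM
```

In the real system the coherence decision additionally requires the discrepancy to persist for a given time before a lack of coherence is declared.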
The servovalve coherence check is a monitor that is already performed in the pantograph actuation systems of current tilting trains. It is implemented as a current wrap-around, which consists of measuring the actual current circulating through the servovalve coils and comparing it with the current command. Each of the two servovalve coils interfaces with one of the two sections of the control electronics, with the two coils operated in an active/standby mode: only one coil is active, and the other coil is activated after a failure of the first coil is detected. When the coherence check detects a discrepancy of more than 15 % of the rated current, and such discrepancy lasts more than 100 ms, a failure of the electrical section of the servovalve is recognized. That section is then switched off and the previously standby section is activated. If a second failure occurs, the entire system is shut down and the train tilting is disabled.
It must also be noted that the servovalve coherence check detects not only the failures of the electrical section of the servovalve, but also those of its electrical driver.
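A minimal sketch of such a current wrap-around monitor, using the 15 % and 100 ms thresholds from the text (the class and parameter names are illustrative):

```python
class CurrentWrapAround:
    """Flags a servovalve electrical-section failure when the measured coil
    current deviates from the commanded current by more than 15 % of the
    rated current for longer than 100 ms (thresholds from the text)."""

    def __init__(self, rated_current_ma, dt_s):
        self.threshold = 0.15 * rated_current_ma
        self.persistence_s = 0.100
        self.dt = dt_s
        self.timer = 0.0

    def step(self, commanded_ma, measured_ma):
        if abs(commanded_ma - measured_ma) > self.threshold:
            self.timer += self.dt            # discrepancy persists
        else:
            self.timer = 0.0                 # must persist continuously
        return self.timer > self.persistence_s   # True = failure detected
```

Resetting the timer whenever the discrepancy falls back below threshold makes the monitor insensitive to short transients, e.g. during fast servovalve commands.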
5. LEARNING PROCESS
The learning process and the monitoring process both make use of a mathematical model of the pantograph position control system to perform their tasks. The basic concept for the learning and monitoring processes is that, for a servovalve-controlled hydraulic actuator, servovalve current, flow rate and pressure differential across the servovalve control ports are three mutually related variables: for a given servovalve, if two of these variables are defined, the third one can be derived. Models of servovalve-controlled electrohydraulic systems are available in the literature (Borello, Dalla Vedova, Jacazio & Sorli, 2009; Byington, Watson, Edwards & Stoelting, 2004). For the pantograph hydraulic actuation system the three variables referenced above are either known or can be determined from the available information without additional sensors, as discussed in the following.
The servovalve current i is a known variable at any instant in time, since it is generated by the electronic controller; the fundamental issue is therefore to compute in real time the values of flow rate and pressure differential from the signals provided by the actuator position sensors.

The calculation of the flow rate Q is relatively simple, because the flow rate is the product of the actuators area A times their speed. The area A is a known design parameter, while the speed can be determined by taking the time derivative of the actuator position x provided by the position sensors. The pressure differential Δp across the actuators can thus be determined from the well known servovalve pressure/flow relationship:

Q = Kv i √(Ps − Pr − Δp sgn(i)) (1)

where Kv is a known parameter defined by the servovalve characteristics, and Ps and Pr are the supply and return pressures of the hydraulic system. These pressures are approximately constant, because the train hydraulic power generation is a constant pressure system (should the supply pressure decrease below normal, a hydraulic system failure is recognized by the relevant monitoring logic), while the return pressure is constant because the reservoir is open to the ambient.
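As an illustration, assuming Eq. (1) takes the common square-root form Q = Kv·i·√(Ps − Pr − Δp·sgn(i)), the two quantities can be estimated from the available signals as follows (all names, values and units are illustrative):

```python
import math

def flow_rate(x_prev_m, x_curr_m, dt_s, area_m2):
    """Q = A * dx/dt: actuator speed from a finite difference of the
    position-sensor signal, multiplied by the known actuator area."""
    return area_m2 * (x_curr_m - x_prev_m) / dt_s

def load_pressure_diff(q, i, k_sv, p_supply, p_return):
    """Invert the assumed Eq. (1), Q = Kv*i*sqrt(Ps - Pr - dp*sgn(i)),
    for the pressure differential dp across the actuators."""
    return math.copysign(1.0, i) * (p_supply - p_return - (q / (k_sv * i)) ** 2)
```

Note that the inversion is undefined at i = 0; in practice the learning process only runs above a minimum actuation rate, which keeps the current away from zero.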
It is important to notice that the control law of the pantograph position servoloop consists essentially of a proportional controller plus a low-gain, saturated integrator whose purpose is to cancel out the steady-state errors originated by the servovalve offsets. In this way, the effects of the servovalve offsets are eliminated and the servovalve sits at its hydraulic null when the servoloop error is zero. The current i in all equations of this paper is thus the current determined by the proportional gain, which actually determines the servovalve opening, while the contribution to the current given by the integral term exactly matches the servovalve offset.
Equation (1) describes the steady-state relationship between flow, pressures and servovalve current; it does not include the servovalve dynamics. For the pantograph hydraulic control system the servovalve dynamics is about two orders of magnitude faster than that of the overall pantograph position servoloop; therefore, neglecting the servovalve dynamics in the real-time model of the pantograph position control system does not introduce any appreciable error.
The pressure differential Δp across the actuators can also be determined from the balance of the forces acting on them: this pressure differential is in fact equal to the force globally developed by the two actuators divided by their area.
Figure 4. Flow chart for the health management of the pantograph position control system
The forces acting on the pantograph when it is moved away from its centered position are:

- Forces developed by the centering springs
- Friction forces
- Lateral component of the aerodynamic force acting on the pantograph
- Inertia force associated with the mass of the translating pantograph
For the pantograph position control system, the prevailing force acting on the actuators is by far the force developed by the recentering springs. The springs are preloaded, and the force they develop is a function of the actuator position, as shown in Figure 5.

The spring forces are in theory a known quantity, since the spring stiffness Ks is a design value. However, the construction tolerances of the mechanical structure accommodating the pantograph on the carbody roof, and some variations of the dimensions associated with temperature changes, lead to some uncertainty on the value of the spring preload F0. While the spring rate can reasonably be considered a well defined parameter, the actual installed length of the springs, and hence their preload, can exhibit some variation that must be properly assessed.
The friction forces Ff are lower than the spring forces, but still give a significant contribution to the overall force acting on the actuators. The friction forces can exhibit a large variation, depending on the environmental conditions, on the condition of the tracks along which the pantograph carriage moves, and on the progressive wear of the pantograph moving components over their life.

The aerodynamic forces in the lateral direction and the inertia forces are of little significance for this application and can be neglected by the health monitoring system. They act as potential disturbances, which were properly addressed in the assessment of the health management system robustness.
Figure 5. Diagram of actuators displacement versus spring
forces
An important fact to be considered is that the force developed by the springs is always directed towards centering the pantograph. The spring force thus acts as an opposing load when the pantograph carriage moves away from the centered position, and as an aiding load when the carriage moves towards it. The friction forces, on the contrary, always oppose the carriage movement.
Based on the above considerations, after having defined a positive direction for the actuator travel x, the following simple equations for the balance of the forces acting on the actuators can be written (note that F0 is a positive quantity, because it is the absolute value of the spring preload).

When x > 0:

A Δp = F0 + Ks x + Ff for positive actuator speed (opposing load) (2)

A Δp = F0 + Ks x − Ff for negative actuator speed (aiding load) (3)

When x < 0:

A Δp = −(F0 − Ks x) − Ff for negative actuator speed (opposing load) (4)

A Δp = −(F0 − Ks x) + Ff for positive actuator speed (aiding load) (5)
When the train negotiates a curve, the pantograph is commanded to move laterally in one direction to counteract the carbody tilting in the other direction; this is followed by a command back to zero when the train exits the curve. Over this period of time the learning process is activated. While the pantograph is moving away from center, the opposing-load condition of Eq. (2) or Eq. (4) prevails, while the aiding-load condition of Eq. (3) or Eq. (5) prevails when the pantograph travels back to center. The learning algorithm therefore works in the following way.
When the train enters a curve and the pantograph travels
away from center, the algorithm uses Eq. (1) to compute the
value of the actuator force F, which is then used by Eq. (2)
or Eq. (4) to compute the value of (F0 + Ff) based on the
value of the actuator position x and on the known design
parameters k and A. This calculation is performed for
predetermined values of the actuators position x. When the
train exits the curve and the pantograph moves back to the
centered position, the same calculations are performed for
the same values of actuators position x, but using Eq. (3) or
Eq. (5), thereby determining the values of (F0 - Ff). Since no
changes of springs preload and frictional losses occur in the
short time interval between entering and leaving a curve, by
knowing (F0 + Ff) and (F0 - Ff) for the same value of x it
is possible to find out the values of F0 and Ff.
The computed values of F0 and Ff are stored in memory for
each value of actuator travel x and a moving average is then
performed which adapts the values of F0 and Ff to the
variations that can occur in service. However, if a sudden
large reduction of spring preload F0 is detected by the
learning process, this would be the result of a broken spring;
an alert is then generated and sent to the decision maker.
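For a given position x, Eqs. (2)-(3) (or (4)-(5) for x < 0) form a pair of linear equations in F0 and Ff that the learning process inverts. The sketch below illustrates that inversion together with a simplified moving-average update and broken-spring alert; the function names, the exponential averaging and the alert threshold are illustrative assumptions, not the authors' implementation.

```python
def identify_spring_params(F_away, F_back, x, k):
    """Invert Eqs. (2)-(3) (x > 0) or (4)-(5) (x < 0): given the actuator
    force measured at the same position x while moving away from center
    (opposing load, F_away) and while moving back toward center (aiding
    load, F_back), return the spring preload F0 and friction force Ff."""
    s = 1.0 if x >= 0 else -1.0         # sign flip selects Eqs. (4)-(5)
    F0 = s * (0.5 * (F_away + F_back) - k * x)
    Ff = s * 0.5 * (F_away - F_back)
    return F0, Ff

class SpringParamLearner:
    """Moving-average update of F0 and Ff per predetermined position value,
    with a broken-spring alert on a sudden large preload drop (the averaging
    factor and drop threshold are illustrative, not from the paper)."""
    def __init__(self, k, alpha=0.1, preload_drop_alert=0.3):
        self.k, self.alpha, self.drop = k, alpha, preload_drop_alert
        self.F0, self.Ff = {}, {}       # keyed by actuator position x

    def update(self, x, F_away, F_back):
        F0_new, Ff_new = identify_spring_params(F_away, F_back, x, self.k)
        # sudden large preload reduction -> broken spring alert
        alert = x in self.F0 and F0_new < (1.0 - self.drop) * self.F0[x]
        a = self.alpha
        self.F0[x] = (1 - a) * self.F0.get(x, F0_new) + a * F0_new
        self.Ff[x] = (1 - a) * self.Ff.get(x, Ff_new) + a * Ff_new
        return alert                    # True -> alert sent to decision maker
```

With k = 40000 N/m, x = 0.1 m, a true preload of 500 N and friction of 200 N, the opposing-load force is 4700 N and the aiding-load force is 4300 N, from which the routine recovers F0 = 500 N and Ff = 200 N.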
The above described learning process occurs only when the
absolute value of the actuation speed u is above a minimum
threshold uT, since very small actuation rates could lead to
less accurate results. The learning process concept block
diagram is shown in Figure 6.
The learning process continues as long as the coherence
checks provide a positive output. If a sensor failure is
recognized, or if a difference between the signals of the two
position sensors above the established threshold δxT is
detected and that difference lasts more than a given time tT,
then the learning process is discontinued and the system
reverts to the monitoring process described in the next
section.
6. MONITORING PROCESS
The logic for the monitoring mode is described by the block
diagram of Figure 7. The monitoring process performs two
basic functions:
Detects uncommanded movements or lack of response
of the pantograph actuators (Figure 7 – a)
Detects sensors failures that were not identified by the
coherence check (Figure 7 – b)
Detection of uncommanded movements or lack of response
is a relatively straightforward operation: the actuators rate
computed from the time derivative of the position signals is
compared with the rate of change of the position command.
If a discrepancy exists and lasts more than a given amount
of time, a failure is recognized. This monitor is continuously
performed, but in case one of the two position sensors is
Figure 6. Concept block diagram of the learning process
failed, the uncommanded movement / lack of response
monitor is temporarily stopped and it is resumed after the
other monitors have identified which of the two position
sensors is the good one. This temporary pause of about 100
ms for the uncommanded movement / lack of response
monitor is instrumental in avoiding a false indication of
wrong system operation. Detection of sensors failure not
identified by the coherence check is a more challenging
task, which is described hereunder.
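The rate-comparison check described above can be sketched as follows; the names and thresholds (rate_tol, t_tol) are illustrative assumptions.

```python
def rate_discrepancy_monitor(position, command, dt, rate_tol, t_tol):
    """Compare the actuator rate (time derivative of the position signal)
    with the rate of change of the position command; declare a failure
    (uncommanded movement or lack of response) only if the discrepancy
    persists for longer than t_tol."""
    persist = 0.0
    for i in range(1, len(position)):
        u_actual = (position[i] - position[i - 1]) / dt
        u_cmd = (command[i] - command[i - 1]) / dt
        if abs(u_actual - u_cmd) > rate_tol:
            persist += dt
            if persist > t_tol:
                return True             # discrepancy lasted long enough
        else:
            persist = 0.0               # transient discrepancy: reset
    return False
```

A position signal that tracks the command never trips the monitor, while a stuck actuator (flat position against a ramping command) does once the persistence time is exceeded.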
The actuator speed is computed by performing the time
derivative of the signals xA and xB received from the two
position sensors; two values uA and uB are then obtained
for the actuator speed. These values are compared with the
actuator speeds uMA and uMB computed from the system
model described in the previous section by using the last
values of F0 and Ff determined in the course of the learning
process. The absolute value |δu| of the difference between
actual and computed actuator speed is processed by a
filtering element whose purpose is to eliminate undesired
noise in the monitoring process. The filtering element sets
its output e equal to |δu| only when |δu| is greater than a
minimum value uMIN. This prevents differences resulting
from the inaccuracies of the modeling process from being
counted as errors. The resulting errors eA and eB for the two
position sensors are divided by the actuator speeds uMA and
uMB in order to obtain two non-dimensional quantities, eA'
and eB'. These non-dimensional errors are then integrated
with time and the integrators outputs IA and IB are used for
recognizing a sensor failure. If the coherence check
signalled a difference between the two sensors but was
unable to decide which of the two was the failed one, a cross
monitoring logic of the monitoring process is able to sort
out the failed sensor. If a sensor is malfunctioning, its
relevant integrator output (IA or IB) grows faster than the
other, and by looking at which of the two outputs (IA or IB)
is greater, it is possible to sort out which is the failed sensor.
It must be emphasized that for this condition the monitor
does not compare the computed value of a certain quantity
against an acceptable limit and has to decide whether a
failure has occurred or not. The monitor already knows from
the coherence check that a failure exists and simply
compares two quantities (IA and IB) to realize which of the
two sensors is failed. In this condition, there is an extremely
low probability of error: the quantity relevant to the failed
sensor will definitely be greater than that for the healthy one
and the failed sensor can be positively identified with
practically zero error probability.
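A minimal sketch of one lane's error integration and of the cross monitor; the variable names follow the text (uMIN, IA, IB), while the class structure itself is an illustrative assumption.

```python
class LaneMonitor:
    """Integrate the non-dimensional speed error e' = e / uM for one
    position sensor lane (Figure 7 b); u_min is the dead band that
    filters out modeling inaccuracies."""
    def __init__(self, u_min):
        self.u_min = u_min
        self.I = 0.0                        # integrator output (IA or IB)

    def step(self, u_actual, u_model, dt):
        du = abs(u_actual - u_model)        # |δu|
        e = du if du > self.u_min else 0.0  # filtering element
        if u_model != 0.0:
            self.I += (e / abs(u_model)) * dt  # integrate e' = e / uM
        return self.I

    def reset(self):
        self.I = 0.0                        # reset when pantograph is centered

def cross_monitor(I_A, I_B):
    """Once the coherence check has flagged a failure, the lane with the
    greater error integral is the failed sensor."""
    return "A" if I_A > I_B else "B"
```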
Figure 7. Concept block diagrams of the monitoring process
When only one sensor is active because the other one was
recognized failed, the monitoring process continues for the
remaining healthy one using the last values of F0 and Ff
determined in the course of the learning process. Obviously,
in this case it is not possible to compare the signals of the
two sensors. Therefore, the monitoring logic relies on
comparing the time integral I of the absolute value of the
error resulting from the filtered difference between the
actual and computed actuator speed with a limit threshold
IMAX. When the integrator output becomes greater than
IMAX, a failure is recognized.
Since the monitoring process is meaningful only when the
pantograph is commanded to move, the integrators outputs
(IA and IB) are reset to zero when the pantograph is
centered. This prevents occasional disturbances, not related
to sensor malfunctions, from being progressively
accumulated by the integrator and possibly generating a
false alarm.
Since the monitoring process implemented when only a
single sensor is active is less accurate than the one for the
case of two active sensors, the limit IMAX beyond which a
sensor failure is recognized cannot be set too low, in order
to minimize the risk of false alarms. A comprehensive
simulation campaign was thus performed to establish an
optimum value of IMAX, such as to obtain the fastest
possible recognition of a failure while minimizing the risk
of false alarms.
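The single-sensor individual monitor, with its reset at the centered position and its IMAX limit, can be sketched as follows (an illustrative simplification; the sample format and threshold values are assumed).

```python
def monitor_single_lane(samples, u_min, I_max, dt):
    """Run the individual monitor over (x, u_actual, u_model) samples:
    the filtered, normalized speed error is integrated, the integral is
    reset whenever the pantograph is centered (x == 0), and a failure is
    declared as soon as the integral exceeds I_max."""
    I = 0.0
    for x, u_actual, u_model in samples:
        if x == 0.0:
            I = 0.0                     # reset at center: discard disturbances
        du = abs(u_actual - u_model)
        e = du if du > u_min else 0.0   # dead band against model inaccuracy
        if u_model != 0.0:
            I += (e / abs(u_model)) * dt
        if I > I_max:
            return True                 # sensor failure recognized
    return False
```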
7. DECISION MAKER
The decision maker consists of a logic routine accepting the
output signals from the coherence check and the monitoring
process to provide the train electronics with the information
of the health status of the pantograph control system.
The decision maker issues the warning of a position sensor
failure (lane A or lane B) if such a failure has been detected
either by the coherence check or by the monitoring process.
In case a failure of the remaining active sensor is detected
after the other sensor had already failed, an alarm is issued
signaling the loss of pantograph position information.
If the current wrap around performed by the coherence
check detects a failure of the servovalve electrical section, a
warning is issued such that the train electronics can activate
the other servovalve electrical channel.
If a subsequent failure of this other section of the servovalve
occurs, then an alarm is issued indicating loss of pantograph
control.
If an uncommanded movement, or a lack of response is
detected by the monitoring process, the decision maker
issues again an alarm indicating loss of pantograph control.
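The decision rules above can be condensed into a small routine; the flag names and message strings are illustrative assumptions.

```python
def decision_maker(sensor_a_failed, sensor_b_failed,
                   servovalve_ch1_failed, servovalve_ch2_failed,
                   uncommanded_or_no_response):
    """Combine coherence-check and monitoring-process outputs into the
    warnings and alarms sent to the train electronics."""
    msgs = []
    if sensor_a_failed or sensor_b_failed:
        msgs.append("warning: position sensor failure")
    if sensor_a_failed and sensor_b_failed:
        msgs.append("alarm: loss of pantograph position information")
    if servovalve_ch1_failed:
        # the train electronics can switch to the other electrical channel
        msgs.append("warning: activate other servovalve electrical channel")
    if servovalve_ch1_failed and servovalve_ch2_failed:
        msgs.append("alarm: loss of pantograph control")
    if uncommanded_or_no_response:
        msgs.append("alarm: loss of pantograph control")
    return msgs
```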
8. PERFORMANCE ASSESSMENT OF THE HEALTH
MANAGEMENT SYSTEM
The merits of the health management system presented in
this paper were assessed running several simulations of a
model representing the dynamic response of a train
pantograph. In particular, the mathematical model
specifically referred to the Alstom Ferroviaria "Nuovo
Pendolino" train.
In order to assess the merits of the diagnostic system, a
comprehensive complex mathematical model representing
both tilting and pantograph actuation systems was
developed. This model is physics-based, built on the
mathematical relationships among the state variables and
the physical parameters. The model proved to be very
accurate when later compared with the data measured
during revenue service operations. Several time histories of
tilt angle commands and actual responses were available,
and the same sequences of commands were injected into the
model and the relevant responses were computed. An
example of comparison is shown in Figure 8, and similar
accuracies were found for all types of tilt commands, and the
validity of the system model was thus positively verified.
This mathematical model, acting as virtual hardware, was
then used in place of the actual hardware to verify the
performance of the diagnostic system. Several simulations
were performed, both in normal and in failed conditions, in
order to assess the ability of the health management system
to properly identify a failure of one or both position
transducers and to avoid false alarms. Failures of the
servovalve and of the actuators leading to uncommanded
movements or lack of response were also simulated, but do
not represent a specific advance since the relevant
monitoring logics are normally implemented in hydraulic
servocontrols. The remainder of this paper thus focuses on
the failure cases of the position sensors. Several simulations
were also performed changing the system physical
parameters in order to check the ability of the learning
process to properly adapt the model parameters to the
varying conditions so as to avoid false failure indications.
Simulations were initially run with the nominal values of
the system parameters to test the health management system
under normal operating conditions. Several pantograph
movements were commanded so as to simulate different
rides, changing both the amplitude and the velocity of the
pantograph movements.
Figure 8. Example of comparison between virtual hardware
model results and test data
An example of the simulations for operation with nominal
values of the pantograph control system parameters is
shown in Figure 9. Since the commands to the pantograph
actuators are synchronized with the commands to the tilting
actuators, the position commands reported in the y axis of
the upper diagram of Figure 9 are indicated as tilt angle
commands that are comprised from 0° to 8° (maximum tilt
angle). The steady conditions, which are representative of a
travel either in the middle of a curve or in a straight track,
last 5 s. The simulation of Figure 9 was conducted assuming
that both position transducers are initially operating
correctly and that transducer 1 fails at time t = 40 s. In this
case the integral of the error for the remaining active sensor
was computed as defined in section 5 of this paper.
Looking at Figure 9 it can be seen how the error integral
always remains below the fault indication limit and no false
alarm indication is then generated by the monitoring
process.
Simulations were then run changing the values of the
parameters of the pantograph control system, which were
varied over a range that can reasonably be expected for
trains in regular revenue service. The purpose of these
simulations was to test the ability of the learning process to
progressively tune the model parameters so as to avoid false
failure indications. The physical parameters that were varied
in order to simulate the whole range of operating and
environmental conditions were:
External load
Spring rate
Spring preload
Friction force
Supply pressure
Figure 9. Health management system assessment: one
sensor active - nominal values of the system parameters
Examples of the health management system performance
are reported in Figure 10 and Figure 11. In particular, Figure
10 refers to the case in which the pantograph is subjected to
a cross wind load of 3000 N, while Figure 11 corresponds to
an operation with a supply pressure reduced from 31.5 to 25
MPa. As for the previous simulation case shown in Figure 9,
it was assumed that a sensor failure occurred at time 40 s in
order to check the ability of the monitoring process to
correctly detect the failure. For the case in Figure 10 it can
be seen that as the cross wind load is applied at time zero, as
long as the sensors are operating correctly the error integral
remains well below the warning threshold, and the value
that is built up at the end of each curve progressively
decreases because the learning process adapts the values of
the system parameters to the changed conditions. No false
alarm is generated, demonstrating the ability of the learning
process to properly adapt the values of the model
parameters.
For the case in Figure 11, the sudden drop of the supply
pressure from 31.5 to 25 MPa causes the integral of the
error to exceed the threshold for a very small amount of
time for both sensors. Since the two position sensors are
actually operating correctly and passed the coherence check,
the simultaneous overcoming of the threshold for the two
error integrals does not trigger a failure indication. When a
transducer 1 failure is actually injected at time t = 40 s, a loss
of coherence between the two sensors is recognized and the
error integral grows much above the threshold, thus
indicating the sensor failure. From that time on the sensor
signal is discarded and the operation continues relying
only on the signal of the other transducer.
After having verified that no undue false alarms were
generated by the health management system for any
combination of environmental and operating conditions of
the pantograph control system, simulations were then run
injecting different types of failures, and again this was done
over a wide range of service conditions for the train.
Figure 10. Health management system assessment: one
sensor active - presence of a cross wind load of 3000 N
Failures such as an internal short or a broken wire of one of
the position sensors are immediately picked up by the
coherence check; the simulations were thus focused on
those types of failures that are not easily detected by the
monitoring logic presently implemented in the trains in
service.
Sensors failures addressed by the simulations were:
Step change of the sensor offset
Step change of the sensor gain
Slow change with time of the sensor offset
Slow change with time of the sensor gain
A few typical examples of the simulations results are shown
in Figure 12 through Figure 15. Figure 12 shows the case in
which the sensor 1 offset is subjected to a step change of 6
%, which could be the result of an electrical degradation, or
of a permanent mechanical realignment determined by an
occasional large jerk during the train ride. After the offset
change the output signals of the two sensors are different
and the coherence check will thus alert of a failure. The
ensuing monitoring process then looks at the error integrals
and easily identifies the failed sensor because its error
integral is much greater than that of the healthy sensor. The
same happens for the case of a step change of the gain of
one of the two sensors (Figure 13). When the pantograph is
commanded to move away from center a difference between
the output signals of the two sensors greater than the
threshold is detected by the coherence check, which thus
issues a failure alert. The ensuing comparison between the
error integrals performed by the monitoring process
identifies the failed sensor because its error integral is much
larger than that of the good sensor. Results similar to those
shown in Figure 12 and Figure 13 are obtained for the cases
of a progressive variation of a sensor offset or of a sensor
gain. When the difference between the output signals of the
two sensors is large enough to activate the lack of coherence
alert, the difference between the error integrals computed by
the monitoring process is large and the identification of
which of the two sensors is the failed one can be performed
without error.
Figure 11. Health management system assessment: one
sensor active - system supply pressure reduced from 31.5
MPa to 25 MPa
Progressive variations of one sensor offset and gain were
simulated and are shown in the diagrams of Figure 14 and
Figure 15 to assess which was the maximum error attained
in the pantograph position measurement before the
monitoring process recognizes the sensor failure. The
simulations were performed using a heavy duty track as
pantograph position command sequence. It can be seen from
the simulations results that in both cases the error integrals
tend to increase until they reach a point for which a
pantograph position command greater than a minimum
value makes the error integral to overcome the threshold,
thereby triggering the failure alert. In particular, it can be
observed from Figure 14 how the error integral overcomes
the threshold a few times between approximately 350 s and
550 s before the failure indication is eventually activated.
This is due to the fact that, because of the progressive offset
variation, the transducer indicates an incorrect pantograph
position, but the pantograph position error is not large
enough to activate the lack of coherence check. The
transducer is eventually declared failed at time 550 s, when
the position error of the degrading transducer leads to a
difference from the healthy transducer signal such to signal
a lack of coherence. As this alert is generated, the ensuing
monitor is enabled, which recognizes as failed the transducer
with the higher error integral. The signal of the failed position
sensor is ignored from then on and is no longer taken
into account in the pantograph position servoloop. For the
cases of Figure 14 and Figure 15, the failure indication
occurs when the maximum error of the position transducer
is 6 % and 9 %, respectively.
Figure 12. Failure simulation scenario #a: The two position
sensors are initially good, then position sensor #2 is
subjected to a step change of its offset
Figure 13. Failure simulation scenario #b: The two position
sensors are initially good, then position sensor #2 is
subjected to a step change of its gain
9. CONCLUSION
The work herein presented was carried out in order to define
a technique able to recognize the failure of the sensors used
to measure the lateral position of the pantograph of high
speed tilting trains equipped with laterally translating
pantographs with minimum risk of missed failures and false
alarms. This would allow an unabated operation of the train
tilting system after a failure of one of the two lateral
position sensors of the pantograph, while the present
monitoring system disables the tilting operation and reduces
the train speed after a single sensor failure.
Figure 14. Failure simulation scenario #c: The two position
sensors are initially good; then sensor #1 undergoes a
progressive variation of its offset
Figure 15. Failure simulation scenario #d: The two position
sensors are initially good; then sensor #1 undergoes a
progressive variation of its gain
The health management system described in this paper was
first tested simulating train rides over different tracks and
for the entire range of operating and environmental
conditions, and appropriate limits for the failure detection
were established to prevent false alarms. Then, all types of
sensors failures and malfunctionings were injected and the
ability of the health management system to recognize them
was positively assessed.
The results of the entire simulation campaign proved the
robustness of the proposed health management system, and
confidence was hence gained in its ability to detect a sensor
failure or malfunctioning with minimum risk of false alarms
or missed failures. The implementation of such a health
management system on a tilting train will thus enable the
tilting operation to continue after a failure of a pantograph
lateral position sensor, hence allowing the train to maintain
its high speed travel for the remainder of the ride.
Furthermore, the positive recognition of a sensor failure
would greatly ease the maintenance operation, since the
failed sensor can be replaced without the need of removing
the entire pantograph assembly from the train roof.
ACKNOWLEDGEMENT
The authors wish to thank the tilting trains manufacturer
Alstom Ferroviaria for their support in the preparation of
this paper.
NOMENCLATURE
i	servovalve current
Q	servovalve flow rate
c	servovalve deflux coefficient
A	actuators area
x	actuators displacement
u	actuators speed
p1	pressure at servovalve control port 1
p2	pressure at servovalve control port 2
ps	hydraulic system supply pressure
pr	hydraulic system return pressure
k	springs stiffness
F0	springs preload
Ff	friction forces
tT	learning process time threshold
e	filtered actuator speed error
I	integral of the actuator speed error
BIOGRAPHIES
G. Jacazio is professor of applied mechanics and of
mechanical control systems. His main research activity is in
the area of aerospace control and actuation systems and of
prognostics and health management. He is a member of the
SAE A-6 Committee on Aerospace Actuation Control and
Fluid Power Systems, and a member of the international
society of prognostics and health management.
M. Sorli is professor of applied mechanics and of
mechatronics. His research interests are in the areas of
mechatronics, mechanical and fluid servosystems, spatial
moving simulators, smart systems for automotive and
aerospace applications. He is a member of the TC
Mechatronics of IFToMM, ASME and IEEE.
D. Bolognese is a research engineer. His research interests
are in the area of simulations of mechanical and fluid
systems
D. Ferrara is a PhD student in Mechanical Engineering.
His research interests are in the areas of aerospace actuation
and control systems and of prognostics and health
management.
Lifetime models for remaining useful life estimation with randomly
distributed failure thresholds
Bent Helge Nystad1,2, Giulio Gola1,2 and John Einar Hulsund1
1Institute for Energy Technology, Halden, Norway
2IO Center for Integrated Operations, Trondheim, Norway
[email protected], [email protected], [email protected]
ABSTRACT
In order to predict in advance and with the smallest possible
uncertainty when a component needs to be fixed or
replaced, lifetime models are developed based on the
information of the component deterioration trend and its
failure threshold to estimate the stochastic distribution of the
hitting time (the first time the deterioration exceeds the
failure threshold) and the remaining useful life. A primary
issue is how to effectively handle the uncertainties related to
the component deterioration trend and failure threshold.
This problem is here investigated considering a non-
stationary gamma process to model the component
deterioration and a gamma-distributed failure threshold.
Two lifetime models are proposed for comparison on an
application concerning deterioration of choke valves used in
offshore oil platforms.
1. INTRODUCTION
The capability of predicting when maintenance actions are
required is a primary issue for every industry and bears the
advantages of enhancing operational safety and maximizing
plant reliability. In this respect, to estimate in advance and
with an acceptable level of uncertainty the component
remaining useful life, one can either define a failure time
probability based on the failure times records of a large
number of similar components, or exploit the information
on the component deterioration trend during operation
(Nystad, 2008; Gola & Nystad, 2011a). The latter approach
is less conservative and allows tailoring maintenance
planning to the specific case and, as a consequence,
maximizing the usage of the component.
In practice, lifetime estimation models (van Noortwijk,
2009; Lu & Meeker, 1993) are devised to combine the
knowledge of the past deterioration trend and the current
degradation state with the failure threshold and to estimate
the hitting time (Abdel-Hameed, 1975; Frenk & Nicolai,
2007) and the remaining useful life (van Noortwijk, 2009;
Rausand & Høyland, 2004).
The uncertainty associated with the deterioration trend is here
modelled by a non-stationary gamma process (Gola &
Nystad, 2011a; van Noortwijk, 2009). A gamma process is a
stochastic process with independent, non-negative gamma-
distributed increments and represents a valuable option to
model monotonic processes, i.e. with gradual damage
monotonically accumulating over time in a sequence of
increments such as wear, fatigue, erosion/corrosion, crack
growth, erosion, creep and swell.
The specification of the failure threshold is a critical issue
(Nystad, 2008; van Noortwijk, 2009). In fact, using a
deterministic threshold is problematic since the same
component can fail at different degradation levels.
Typically, an unbiased estimate of the threshold mean value,
or a conservative lower-bound threshold estimate are
supplied. Nevertheless, if the threshold value is set too high,
the risk of actual component failure will increase. On the
contrary, a conservative low threshold value reduces the risk
of failure, but increases the failure probability to a point in
which the component can be prematurely put off operation.
For some applications, e.g. cable aging due to thermal and
mechanic damage (Fantoni & Nordlund, 2009), the designer
may not know with certainty what explicit level of
degradation causes a failure. If threshold failure data are
scarce, an alternative source of information is the judgment
of engineers with expertise in the relevant field. Such
experts can provide useful information about the threshold
probability distribution in the form of best estimates of
percentiles.
This problem is here tackled by considering the threshold as
a random variable with a gamma probability distribution. A
likelihood function can then be established based on the
expert judgment in terms of percentiles (Welte & Eggen,
2008).
_____________________
Bent H. Nystad et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
141
A practical application concerning erosion in choke valves
used in the oil and gas industry is considered (Gola &
Nystad, 2011a; Bringedal, Hovda, Ujang, With &
Kjørrefjord, 2010) and two lifetime models for estimating
the remaining useful life are proposed and compared.
2. THE HITTING TIME AND REMAINING USEFUL LIFE
Since the failure threshold variability does not depend on
the temporal uncertainty associated with the deterioration trend
but only on the historical failure records of the component,
it is reasonable to assume that the threshold distribution is
independent of the deterioration distribution (Abdel-Hameed, 1975).
In this view, the cumulative distribution function of the hitting
time is defined in Abdel-Hameed (1975) and can be written
for each time $t \ge 0$ as:
$$H(t) = \Pr\big(X(t) \ge Y\big) = \int_{x=0}^{\infty}\!\int_{y=0}^{x} f_{X(t)}(x)\, f_Y(y)\, dy\, dx = \int_{x=0}^{\infty} F_Y(x)\, f_{X(t)}(x)\, dx \qquad (1)$$
where $f_{X(t)}$ is the probability density function (pdf) of the
deterioration trend $X(t) \ge 0$, and $f_Y$ is the pdf and $F_Y$ the
cumulative distribution function (cdf) of the failure threshold
$Y \ge 0$ (satisfying $F_Y(0) = 0$).
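As a minimal numerical sketch, Eq. (1) can be evaluated by quadrature; the parameter values below (power-law shape function, gamma threshold) are illustrative, not fitted to any data in this paper.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative parameters:
# deterioration X(t) ~ Gamma(shape=v(t), rate=u) with power-law v(t) = c*t**b
c, b, u = 0.05, 1.2, 1.0
# failure threshold Y ~ Gamma(shape=alpha, rate=beta)
alpha, beta = 64.0, 4.0

def hitting_time_cdf(t):
    """H(t) = Pr(X(t) >= Y) = integral of F_Y(x) f_{X(t)}(x) dx, per Eq. (1)."""
    v_t = c * t**b
    f_x = stats.gamma(a=v_t, scale=1.0 / u)
    integrand = lambda x: stats.gamma.cdf(x, a=alpha, scale=1.0 / beta) * f_x.pdf(x)
    H, _ = integrate.quad(integrand, 0.0, np.inf)
    return H

# H(t) is non-decreasing in t, as expected of a hitting-time cdf
print([round(hitting_time_cdf(t), 4) for t in (50, 150, 300)])
```

Since $F_Y$ is non-decreasing and $X(t)$ stochastically grows with $t$, the computed values increase towards one as $t$ grows.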
The meaning of Eq. (1) is illustrated in Figure 1 using the
choke valve case study data (see Section 3). Based on the
erosion data for the operational time interval $t \in [0, 280]$
(diamonds in Figure 1), the expected value (solid line) and
5th and 95th percentiles (dashed lines) of the fitted gamma
process with assumed power-law shape are shown. Notice
that the functional shape of the erosion process at time
$t = 280$ is convex. The failure threshold is here defined as a
gamma distribution, i.e. the hazard zone (red contour plot in
the figure). The probability of failure in the operational
time is illustrated by the hitting time pdf (blue line).
Figure 1. The hitting time probability density function (blue
line); fitted gamma process with power-law shape (black
solid and dashed lines) and a gamma distributed hazard zone
(red).
The remaining useful life at time $t_0 = s$ is derived from
Eq. (1) (Rausand & Høyland, 2004) and is here calculated
by resorting to a state-based approach (Gola & Nystad,
2011b) which accounts for the knowledge of the
deterioration state $x_s$ at time $t = s$:
$$\mathrm{RUL}(t) = \int_{x_s}^{\infty} f_{X(t)\mid X(s)}(x \mid x_s)\, \frac{F_Y(x) - F_Y(x_s)}{1 - F_Y(x_s)}\, dx \qquad (2)$$
Recalling that in a time-based perspective the deterioration
$x$ is a function of $t$, the pdf $f_{X(t)\mid X(s)}(x \mid x_s)$ represents
the probability of having at time $t$ a deterioration increment
$x - x_s$, and the term $\big(F_Y(x) - F_Y(x_s)\big)/\big(1 - F_Y(x_s)\big)$ is the
left-truncated cdf of the failure threshold, providing the
probability that the failure threshold $y$ lies between the
current deterioration state $x_s$ and infinity.
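A numerical sketch of the state-based calculation follows, reading Eq. (2) as the conditional probability of threshold exceedance by time $t$ given the state $x_s$ at time $s$; all parameter values are illustrative stand-ins, not the case-study fit.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative parameters (gamma process with v(t) = c*t**b, rate u;
# threshold Y ~ Gamma(alpha, rate=beta))
c, b, u = 0.05, 1.2, 1.0
alpha, beta = 64.0, 4.0

def conditional_failure_prob(t, s, x_s):
    """Eq. (2) read as Pr(X(t) >= Y | X(s) = x_s, Y > x_s): the increment
    X(t)-X(s) is gamma with shape v(t)-v(s), and the threshold cdf is
    left-truncated at the current state x_s."""
    dv = c * (t**b - s**b)                        # shape of the increment
    F_Y = lambda x: stats.gamma.cdf(x, a=alpha, scale=1.0 / beta)
    trunc = lambda x: (F_Y(x) - F_Y(x_s)) / (1.0 - F_Y(x_s))
    f_inc = lambda x: stats.gamma.pdf(x - x_s, a=dv, scale=1.0 / u)
    val, _ = integrate.quad(lambda x: f_inc(x) * trunc(x), x_s, np.inf)
    return val
```

Evaluating this over a grid of $t > s$ yields the conditional hitting-time distribution, from which RUL percentiles such as those in Figures 6 and 7 can be extracted.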
The meaning of Eq. (2) is illustrated in Figure 2. The fitted
gamma process is the same with the exception that here
there is no uncertainty in the erosion in the operational time
interval 0,280t . The expected value (solid line)
remains unchanged; the 5th
and 95th
percentiles (dashed
lines) are instead calculated based on the erosion increment
sx x . The left-truncated failure threshold (red contour plot)
and the pdf of the RUL (blue line) are finally shown.
Figure 2. RUL probability density function (blue line); fitted
gamma process with power-law shape (black solid and
dashed lines); left-truncated gamma distributed hazard zone
(red).
Nevertheless, for a distribution without memory (e.g. the
exponential distribution) there is no advantage in left-truncating
the cdf of the threshold, and the expression of the
remaining useful life therefore becomes the same as
the left-truncated version of the hitting time.
Notice that since the hitting time model in Eq. (1) considers
uncertainty in the whole deterioration trend from $t = 0$ to
infinity, the associated uncertainty calculated at $t_0 = s$ is
higher than that of the pdf depending only on the prediction
from $t = s$ to infinity.
2.1. The deterioration model
The deterioration $X(t)$ is here modelled as a non-stationary
gamma process (van Noortwijk, 2009) with a time-dependent
pdf written as:

$$f_{X(t)}(x) = \frac{u^{v(t)}}{\Gamma(v(t))}\, x^{v(t)-1} e^{-ux} \qquad (3)$$
where $\Gamma(v(t)) = \int_0^{\infty} z^{v(t)-1} e^{-z}\, dz$ is the gamma function,
with shape parameter $v(t) > 0$ and scale parameter $u > 0$;
$X(0) = 0$ with probability one; the deterioration increment
$X(t) - X(s)$ is gamma-distributed with shape parameter
$v(t) - v(s)$ and scale parameter $u$ for any $t > s \ge 0$; and the
stochastic process $\{X(t),\, t \ge 0\}$ has independent
increments. The shape function $v(t)$ must be non-decreasing,
right-continuous and real-valued for $t \ge 0$, with
$v(0) \equiv 0$ and $v(\infty) = \infty$. When $v(t)$ is linear the gamma
process is stationary, and when $v(t)$ is non-linear it is
non-stationary.
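The independent-increments property lends itself to direct simulation; the following sketch samples paths of a non-stationary gamma process with an illustrative power-law shape function (not fitted to the case study).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gamma_process(times, v, u, n_paths=1000):
    """Sample paths of a (possibly non-stationary) gamma process:
    increments X(t_i) - X(t_{i-1}) ~ Gamma(shape=v(t_i)-v(t_{i-1}), rate=u),
    independent, starting from X(0) = 0."""
    times = np.asarray(times, dtype=float)
    dv = np.diff(np.concatenate(([0.0], v(times))))   # shape increments
    incs = rng.gamma(shape=dv, scale=1.0 / u, size=(n_paths, len(times)))
    return np.cumsum(incs, axis=1)

# power-law shape function with illustrative parameters
c, b, u = 0.05, 1.2, 1.0
v = lambda t: c * t**b
t_grid = np.arange(1, 301)
paths = simulate_gamma_process(t_grid, v, u)

# sample mean at the last time should track E[X(t)] = v(t)/u
print(paths[:, -1].mean(), v(t_grid[-1]) / u)
```

Every simulated path is monotonically non-decreasing, which is exactly the property that makes the gamma process attractive for wear-type degradation.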
2.2. The threshold model
The hitting time (Eq. 1) and remaining useful life
(Eq. 2) models are well suited to handle different types of
uncertainty in the failure threshold, related for example to
the estimate of the initial deterioration (due to imperfect
maintenance or production defects), to manufacturing
variability and to the historical measurements.
In this paper, a gamma-distributed failure threshold
$Y \sim \mathrm{Ga}(y \mid \alpha, \beta)$ with shape parameter $\alpha > 0$ and scale
parameter $\beta > 0$ is considered, with pdf and cdf given for
any $y \ge 0$ as:

$$f_Y(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\beta y}, \qquad F_Y(y) = \frac{\gamma(\alpha, \beta y)}{\Gamma(\alpha)}, \qquad y \ge 0 \qquad (4)$$
where $\gamma(\alpha, y) = \int_0^{y} z^{\alpha-1} e^{-z}\, dz$ is the lower incomplete
gamma function. Notice that the shape parameter $\alpha$ is in
this case a time-independent constant.
2.3. Expected deteriorations
In general, the expected deterioration $E\big(X(t)\big)$ can be
linear, concave, convex or any combination of these. As
discussed in van Noortwijk (2009), the power-law function
is a flexible candidate for linear, concave and convex
deterioration:

$$E\big(X(t)\big) = \frac{v(t)}{u} = \frac{c\, t^{b}}{u} \qquad (5)$$
(5)
In this case, the gamma process is linear and stationary if
$b = 1$, and non-stationary concave or convex if $b < 1$ or
$b > 1$, respectively.
However, the process in Eq. (5) cannot describe a
deterioration trend that is both concave and convex. Given the
restrictions on $v(t)$, a candidate process which describes
an expected degradation that is first concave and then
convex (i.e., z-shaped) is:
$$E\big(X(t)\big) = \frac{v(t)}{u} = \frac{c\left[\sinh\!\big(a(t - b)\big) + \sinh(ab)\right]}{u} \qquad (6)$$

where the shape parameter $b > 0$ is the timestamp of the
inflection and $a > 0$ is related to the size of the derivative at
the inflection point.
An example of an expected deterioration as in Eq. (6) is the
impact of external stress on materials/devices (McPherson,
2010). The net reaction rate for material/device degradation
becomes concave (near-linear) with low stress and convex with
high stress.
2.4. Inference of the model parameters
In practice, the application of the gamma process requires
statistical methods for estimating the parameters from
the available measurements. For the gamma process, a
typical data set consists of inspection times $t_i$, $i = 1, \ldots, n$,
where $0 = t_0 < t_1 < t_2 < \cdots < t_n$, and the corresponding
observations of the cumulative amounts of deterioration $x_i$,
$i = 1, \ldots, n$, where $0 = x_0 \le x_1 \le x_2 \le \cdots \le x_n$. The
estimators for the scale parameters $u$ and $c$ of the power-law
(Eq. 5) and z-shaped (Eq. 6) degradations can be
derived by the method of moments or the method of
maximum likelihood (van Noortwijk, 2009). The method of
moments leads to attractive and simple formulae for the
parameters, but it requires knowledge of the values of the shape
parameters of the power-law ($b$) and z-shaped
($a$, $b$) degradations, which are either given based on
expert opinion (Welte & Eggen, 2008) or can be inferred
numerically by least-squares optimization. On the other hand,
the method of maximum likelihood, explained in van
Noortwijk (2009), allows estimating the shape and
scale parameters directly, at the expense of larger computational
cost. In the application that follows, least-squares
optimization is first used to determine the shape parameters and
then the method of moments is applied to calculate the scale
parameters ($u$, $c$).
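The two-step procedure (least squares for the shape, moments for the scale) can be sketched as below for the power-law case. This is a simplified stand-in for the estimators in van Noortwijk (2009): the log-linear regression and the pooled-variance moment step are assumptions of this sketch, not the paper's exact formulae.

```python
import numpy as np

def fit_power_law_gamma(t, x):
    """Two-step sketch: (1) least squares in log space for the shape
    exponent b and the mean scale m = c/u of E[X(t)] = (c/u) t**b;
    (2) method of moments on the increments for the rate u, using
    E[dX_i] = m*(t_i**b - t_{i-1}**b) and Var[dX_i] = E[dX_i]/u."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    # step 1: log-linear regression  log x = log m + b log t
    b, log_m = np.polyfit(np.log(t), np.log(x), 1)
    m = np.exp(log_m)
    # step 2: pooled moment estimator of u from increment dispersion
    mu = m * np.diff(np.concatenate(([0.0], t**b)))   # expected increments
    dx = np.diff(np.concatenate(([0.0], x)))
    u = mu.sum() / ((dx - mu) ** 2).sum()
    return b, m * u, u    # shape exponent b, scale c, rate u
```

On data simulated from a known gamma process, the routine recovers the exponent $b$ closely and the rate $u$ to within sampling error.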
Concerning the failure threshold, historical values for a
number of similar components can be used to calculate the
mean and standard deviation of the failure threshold
distribution (Nystad, 2008). For highly reliable components
for which failures are rare, one can use a few deterioration
samples with their associated parameters and Monte Carlo
simulation to generate a large number of deterioration
paths. Different threshold values randomly selected from a
threshold distribution can then be used to estimate the
hitting time (Lu & Meeker, 1993). Finally, a further source of
information is field experts (Welte & Eggen,
2008). Since the meaning of many probability distribution
parameters is rather abstract, experts usually have problems
estimating them directly. Instead, experts can provide useful
information about the threshold distribution in terms of best
estimates (mean, median, mode) or percentiles (e.g. a 10th
percentile corresponding to early failures), which can be
used to estimate the parameters of two-parameter
probability distributions like the gamma distribution.
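As a sketch of such elicitation, the gamma parameters can be solved from an expert's best estimate (taken here as the mean) and one stated percentile; the specific numbers below are hypothetical, not taken from this paper.

```python
from scipy import stats, optimize

def gamma_from_mean_and_percentile(mean, p, q):
    """Find (alpha, beta) of a Gamma(alpha, rate=beta) threshold such that
    E[Y] = alpha/beta = mean and the p-quantile equals q.
    'mean', 'p' and 'q' come from expert judgment (illustrative here)."""
    def gap(alpha):
        beta = alpha / mean                       # enforce the stated mean
        return stats.gamma.ppf(p, a=alpha, scale=1.0 / beta) - q
    alpha = optimize.brentq(gap, 1e-3, 1e5)       # root in alpha
    return alpha, alpha / mean

# e.g. expert: mean threshold 16, 10th percentile at 13.5 (hypothetical)
alpha, beta = gamma_from_mean_and_percentile(16.0, 0.10, 13.5)
```

The bracketing works because, with the mean fixed, the p-quantile of the gamma distribution increases monotonically with the shape parameter (less spread, less skew).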
3. DETERIORATION OF CHOKE VALVES
The application proposed in this paper concerns
deterioration of choke valves undergoing erosion (Bringedal
et al., 2010; Andrews, Kjørholt & Jøranson, 2005). In
offshore oil platforms, choke valves are used on the surface
to control the flow of hydrocarbons and protect the
equipment from unusual pressure fluctuations. Production
experience has shown that choke valves are prone to sand
erosion in the disks and in the outlet sleeve (Andrews et al.,
2005). The main parameters determining erosion are the
impact velocity and the impact angle of the sand grains
through the choke discs.
Figure 3. Damage caused by sand erosion. In the picture, the
originally circular holes in the disks show major wear on the
upper side of the left hole and the lower side of the right hole.
From the mathematical point of view, the flow characteristic
$C_V$ is defined so that, at a constant pressure differential $\Delta p$
across the choke valve, the total mass flow rate $w$ through
the valve is proportional to the valve flow coefficient $C_V$,
which is related to the effective flow cross-section of the
valve and therefore depends on the valve opening:

$$w = C_V \sqrt{\Delta p\, \rho} \qquad (7)$$
where $\rho$ is the average mixture density. The $C_V$ curve is
specific to the valve type and size, and for a given valve
opening $C_V$ is expected to be constant (Kirmanen, Niemelä,
Pyötsiä, Simula, Hauhia & Riihilahti, 2005). The $C_V$
characteristic curve is the baseline for a good-as-new valve
and is often provided by the valve manufacturer. When
erosion occurs, a gradual increase of the effective flow
cross-section is observed even at constant pressure drop.
This phenomenon is therefore related to an abnormal
increase of the valve flow coefficient with respect to its
expected baseline value, hereby denoted as $C_V^b$. For this
reason, for a given valve opening the difference $\Delta C_V$
between the actual flow coefficient and its baseline is
retained as an indicator of the valve erosion. The difference
$\Delta C_V = C_V - C_V^b$ is expected to be monotonically increasing
throughout the life of the valve, thus reflecting the physical
behavior of the erosion process. When $\Delta C_V$ eventually
reaches an established erosion threshold, the valve must be
replaced (Gola & Nystad, 2011a).
The valve flow coefficient $C_V$ in a multiphase environment
cannot be directly measured, but it can be calculated from
the following analytical expression, which accounts for the
physical parameters involved in the process:

$$C_V = \frac{(w_o + w_w + w_g)\sqrt{\dfrac{f_o}{\rho_o} + \dfrac{f_w}{\rho_w} + \dfrac{f_g}{\rho_g}}}{J\, N_6\, F_p\, \sqrt{\Delta p}} \qquad (8)$$
where $w_o$, $w_w$ and $w_g$ are the flow rates of oil, water and
gas, $f_o$, $f_w$ and $f_g$ the corresponding fractions with respect
to the total flow rate, and $\rho_o$, $\rho_w$ and $\rho_g$ the corresponding
densities. $J$ is the gas expansion factor, $F_p$ is the piping
geometry factor and $N_6$ is a constant equal to 27.3
(Kirmanen, Niemelä, Pyötsiä, Simula, Hauhia & Riihilahti,
2005).
2005). The quality of the available data of the physical
parameters in Eq. (8) differs because p is directly
measured, whereas oil, water and gas flow rates are
calculated based on daily production rates of other wells of
the same field. Improvement of the valve erosion indicator
VC based on additional information from well tests carried
out throughout the valve life is discussed in Gola and
Nystad (2011a). Therefore, in this paper, a single choke
valve undergoing erosion is considered and hitting time
models and new RUL models based on Eq. (2) are applied
to the VC trend obtained in Gola and Nystad (2011a) as a
function of the operational days. The valve was opened and
checked to be found in a failed state at operational time
307nt days.
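The indicator computation can be sketched as follows, assuming the mixture form of Eq. (8) as reconstructed here; all numerical inputs are illustrative, not field data from this study.

```python
import math

def flow_coefficient(w_o, w_w, w_g, rho_o, rho_w, rho_g,
                     dp, J=1.0, F_p=1.0, N6=27.3):
    """Valve flow coefficient per Eq. (8): mass flow rates w_*, phase
    densities rho_*, pressure drop dp; J and F_p are correction factors
    and N6 = 27.3 the sizing constant. Illustrative sketch only."""
    w = w_o + w_w + w_g
    f_o, f_w, f_g = w_o / w, w_w / w, w_g / w     # mass fractions
    inv_rho_mix = f_o / rho_o + f_w / rho_w + f_g / rho_g
    return w * math.sqrt(inv_rho_mix / dp) / (J * N6 * F_p)

def delta_cv(cv_actual, cv_baseline):
    """Erosion indicator: deviation from the baseline coefficient C_V^b
    at the same valve opening."""
    return cv_actual - cv_baseline
```

As a sanity check, with a single phase the expression collapses to $w\sqrt{1/(\rho\,\Delta p)}/(J N_6 F_p)$, i.e. the inverse of Eq. (7) up to the correction factors.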
Expert judgment is here used to define the failure threshold
probability distribution (Welte & Eggen, 2008). For a
gamma-distributed threshold, the experts provide the best
central estimate which, in this case study, is the
mean value of the threshold set by the experts ($y = 16$), and
they are also asked to assess the boundaries of the interval in
which the true value of the threshold falls. A measure of the
uncertainty of the expert opinion is the standard deviation of
$Y$. With the expert claiming that, e.g., the true threshold
lies between the values 14 and 18 and is most likely equal to
16, one can calculate the shape parameter $\alpha$ and the scale
parameter $\beta$ of Eq. (4) from $E(Y) = 16$ and
$\sigma_Y = 2$, which yields $\alpha = 64$ and $\beta = 4$.
This hazard zone distribution is shown as a red contour plot
in Figure 1. The skewness of the gamma distribution is
$2/\sqrt{\alpha} = 0.25$, a value which implies a good fit to the
expert's claim.
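The moment matching above can be reproduced in a few lines, using the standard gamma relations $\alpha = \mu^2/\sigma^2$, $\beta = \mu/\sigma^2$ for the shape/rate parameterization of Eq. (4):

```python
import math

mean, sigma = 16.0, 2.0            # expert judgment: E[Y] = 16, sigma_Y = 2
alpha = (mean / sigma) ** 2        # shape: mean^2 / variance
beta = mean / sigma**2             # rate:  mean / variance
skew = 2.0 / math.sqrt(alpha)

print(alpha, beta, skew)           # 64.0 4.0 0.25
```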
Figures 4 and 5 show the $\Delta C_V$ trend and its estimation
provided by the power-law (Eq. 5) and z-shaped (Eq. 6)
models obtained at different operational days, namely
$t_n = 100$, 200, 250 and 307. Because cumulative amounts of
deterioration are measured, the last inspection contains the
most information. For the gamma process, the expected
deterioration at the last inspection time $t_n$ equals
$x_n$; that is, $E\big(X(t_n)\big) = x_n$ (van Noortwijk, 2009). Figures
6 and 7 illustrate the remaining useful life and the associated
uncertainty (5th and 95th percentiles) obtained for each
$t_n \ge 239$ when using the power-law and z-shaped models,
respectively.
Notice that the power-law shape becomes convex only after
250 operational days (Fig. 4), thus leading to an
overestimation of the component remaining useful life (Fig.
6) with respect to its theoretical value (red dashed line). On
the other hand, using the z-shaped model at 240 operational
days one can already identify the final z-shape of the
degradation (Fig. 5), with the consequence of obtaining
better estimations of the remaining useful life, i.e. closer to
its theoretical value and with a reduced uncertainty (Fig. 7).
Figure 4. $\Delta C_V$ trend (thick line) and corresponding
estimations provided by the power-law model.
Figure 5. $\Delta C_V$ trend (thick line) and corresponding
estimations provided by the z-shaped model.
Figure 6. Remaining useful life estimation with the power-law
model (95% confidence interval) and theoretical value
(red dashed line).
Figure 7. Remaining useful life estimation with the z-shaped
model (95% confidence interval) and theoretical value (red
dashed line).
4. CONCLUSION
This paper has investigated the problem of estimating the
remaining useful life of components using stochastic
lifetime models and considering randomly-distributed
failure thresholds. In particular, gamma processes with
power-law and z-shaped shape functions (i.e. first concave,
then convex) have been proposed to predict the
deterioration; a gamma distribution has been considered to
model the failure threshold for it is frequently used as a
probability model in life testing and it is a flexible
distribution for modeling the uncertainty in experts’
opinions. The failure threshold distribution is also known to
contain only positive real values, i.e. 0,x .
A case study of erosion of choke valves used in offshore oil
platforms has been considered and the results of the
expected deterioration calculation and the remaining useful
life estimation given by the power-law and z-shaped models
have been compared. A priori knowledge of the overall
shape of the deterioration is valuable. In this respect, with
some effort the shape of the expected erosion can be
assumed beforehand based on engineering expertise.
However, the model is general and can be applied also to
other cases where the distribution of the parameters for a
maintenance model must be estimated.
REFERENCES
Abdel-Hameed, M. (1975). A gamma wear process. IEEE
Transactions on Reliability, 24(2), 152-153.
Andrews, J., Kjørholt, H., & Jøranson, H. (2005).
Production enhancement from sand management
philosophy: a case study from Statfjord and Gullfaks
(SPE 94511). SPE European Formation Damage
Conference, May 25-27, Sheveningen, The
Netherlands.
Bringedal, B., Hovda, K., Ujang, P., With, H.M., &
Kjørrefjord, G. (2010). Using online dynamic virtual
flow metering and sand erosion monitoring for integrity
management and production optimization. Deep
Offshore Technology Conference, May 3-6, Houston,
Texas.
Fantoni, P.F., & Nordlund, A. (2009). Wire system aging
assessment and condition monitoring (WASCO). NKS-
130. ISBN 87-7893-192-4
Frenk, J.B.G., & Nicolai, R.P. (2007). Approximating the
randomized hitting time distribution of a non-stationary
gamma process. Rotterdam: Econometric Institute and
Erasmus Research Institute of Management.
Gola, G., & Nystad, B.H. (2011a). From measurement
collection to remaining useful life estimation: defining
a diagnostic-prognostic frame for optimal maintenance
scheduling of choke valves undergoing erosion. Annual
Conference of the Prognostics and Health Management
Society, September 26-29, Montreal, Canada.
Gola, G., & Nystad, B.H. (2011b). Comparison of time- and
state-space non-stationary gamma processes for
estimating the remaining useful life of choke valves
undergoing erosion. 24th International COMADEM
Conference, May 30 - June 1, Stavanger, Norway.
Kirmanen, J., Niemelä, I., Pyötsiä, J., Simula, M., Hauhia,
M., & Riihilahti, J. (2005). Flow control manual.
Helsinki: Metso Automation.
Lu, J.C., & Meeker, W.Q. (1993). Using degradation
measures to estimate a time-to-failure distribution.
Technometrics, 35(2), 161-173.
McPherson, J.W. (2010). Reliability physics and
engineering. London: Springer.
Nystad, B.H. (2008). Technical condition indexes and
remaining useful life of aggregated systems. Doctoral
dissertation. Norwegian University of Science and
Technology (NTNU), Trondheim, Norway. ISBN: 978-
82-471-1256-4
Rausand, M., & Høyland, A. (2004). System reliability
theory. Models, statistical methods, and applications.
New Jersey: Wiley & Sons.
van Noortwijk, J.M. (2009). A survey of the application of
gamma processes in maintenance. Reliability
Engineering and System Safety, 94(1), 2-21.
Welte, T.M., & Eggen, A.O. (2008). Estimation of sojourn
time distribution parameters based on expert opinion
and condition monitoring data. International
Conference on Probability Methods Applied to Power
Systems, May 25-29, Rincòn, Puerto Rico.
BIOGRAPHIES
Bent H. Nystad MSc in Cybernetics, RWTH Aachen,
Germany, and PhD in Marine Technology, NTNU
Trondheim, Norway. He has work experience as a condition
monitoring expert from Raufoss ASA (Norwegian missile
and ammunition producer), and has been a Principal Research
Scientist at the Institute for Energy Technology (IFE)
OECD Halden Reactor Project (HRP) since 1998. His
research interests range from data-driven algorithms
and first-principles models for prognostics to algorithm
performance evaluation, requirement specification, technical
health assessment and control applications.
Giulio Gola MSc in Nuclear Engineering, PhD in Nuclear
Engineering, Polytechnic of Milan, Italy. He is currently
working as a Research Scientist at the Institute for Energy
Technology (IFE) and OECD Halden Reactor Project (HRP)
within the Computerized Operations and Support Systems
department. His research topics deal with the development
of qualitative models and artificial intelligence-based
methods for on-line, large-scale signal validation, condition
monitoring and instrument calibration, system diagnostics
and prognostics.
John E. Hulsund is a graduate in Experimental Particle
Physics from the University of Bergen, Norway. He has
been working as a Research Scientist at the Institute for
Energy Technology (IFE) and OECD Halden Reactor
Project (HRP) since 1997 within the Computerized
Operations and Support Systems department. His main area
of work has been the development of a computerized
procedure tool for control room operations.
Major Challenges in Prognostics: Study on Benchmarking
Prognostics Datasets
O. F. Eker¹, F. Camci¹, and I. K. Jennions¹
¹ IVHM Centre, Cranfield University, UK
ABSTRACT
Even though prognostics has been defined to be one of the
most difficult tasks in Condition Based Maintenance
(CBM), many studies have reported promising results in
recent years. The nature of the prognostics problem is
different from diagnostics with its own challenges. There
exist two major approaches to prognostics: data-driven and
physics-based models. This paper aims to present the major
challenges in both of these approaches by examining a
number of published datasets for their suitability for
analysis. Data-driven methods require sufficient samples
run until failure, whereas physics-based methods need
knowledge of the physics of failure progression.
1. INTRODUCTION
Condition based maintenance (CBM) is a preventive
maintenance strategy in which maintenance tasks are
performed when the need arises. The need is determined by
tracking the health status of the system or component
(Camci and Chinnam, 2010; Eker et al., 2011). CBM is a
proactive process involving two major tasks: diagnostics
and prognostics. Diagnostics is the process of identification
of faults, whereas prognostics is the process of forecasting
the time to failure. Time left before observing a failure is
described as remaining useful life (RUL) also called
remaining service or residual life (Jardine et al., 2006).
An example of degradation in the health level of an asset is
shown in Figure 1. The P-F interval is the time interval
between a potential failure, which is identified by health
indicators, and an eventual functional failure. For CBM
it is necessary that the P-F interval is long enough to enable
corrective maintenance action to be taken (Jennions, 2011).
Figure 1. P-F curve of an asset
Diagnostics is a more mature field than prognostics. Once
degradation is detected, unscheduled maintenance should be
performed to prevent the failure consequences. It is not
uncommon to spend more time in maintenance preparation
than in performing the actual maintenance, due to lack of
resources. With prognostics, on the other hand, maintenance
preparation can be performed while the system is up and
running, since the time to failure is known early enough.
Thus, the actual maintenance duration becomes the major
contributor to the downtime. Figure 2 illustrates the
comparison of diagnostics and prognostics.
Performing maintenance preparation while the system is up
and running has a great effect on reducing operation and
support costs. In addition to the reduced downtime, the
inventory cost is reduced since more time is
available for obtaining required parts. Efficiency in
logistics and the supply chain is increased due to better
preparation for maintenance. The life cycle cost of the
equipment is reduced, since components are used until the end of
their lives.
_____________________
Omer Faruk Eker et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 2. Comparison of failure diagnostics and prognostics
maintenance scenarios
Despite the potential value in prognostics, it is considered to
be one of the most challenging tasks in CBM (Zhang et al.,
2006; Peng et al., 2010). Prognostics involves two phases, as
shown in Figure 3. The first phase of prognostics aims to
assess the current health status. Severity detection, health
assessment, or degradation detection are the terms used for
describing this phase in the literature. This phase could also
be considered under diagnostics. Pattern recognition
techniques such as classification or clustering can be
utilized in this phase. The second phase aims to predict the
failure time by forecasting the degradation trend and by
identifying remaining useful life (RUL). Time series
analysis, trending, projection or tracking techniques are used
for this phase.
Figure 3. Phases of prognostics and diagnostics
Many academic papers with prognostics titles only consider
the first phase (Qiu et al., 2003; Ocak et al., 2007).
However, prognostics without the second phase will not be
complete and will not lead to RUL estimation. This paper
focuses on the second phase of prognostics.
Prognostics methods can be analyzed in two major
categories: Data-driven and physics-based models. Data-
driven models utilize past condition monitoring data, current
health status of the system, and degradation of similar
systems. Physics-based models employ system specific
mechanistic knowledge, defect growth formulas, and
condition monitoring data to predict the RUL of systems
(Heng et al., 2009).
This paper aims to discuss the challenges for data-driven
and physics-based prognostics and presents several case
studies. Section 2 reports the requirement analysis and
challenges of data-driven and physics-based prognostics
models. Section 3 discusses several prognostic case studies.
Finally section 4 concludes the paper with an emphasis on
future research tasks.
2. CHALLENGES IN PROGNOSTICS MODELING
Both data-driven and physics-based models have different
requirements to model the degradation and predict the RUL
of a system. Challenges and requirements of both
approaches are given in distinct sub-sections below.
2.1. Data-Driven Models
Data-driven models aim to model system behavior using
regularly collected condition monitoring data instead of
comprehensive system physics or human expertise
(Heng et al., 2009). Data-driven approaches are generally
classified into two categories: statistical and
machine learning approaches. Statistical approaches
construct models by fitting a probabilistic model to the
available data. Machine learning approaches attempt to
recognize complex patterns and make intelligent decisions
based on empirical data.
Both statistical and machine learning methods use the
degradation patterns of a sufficient number of samples
representing equipment failure progression. This requirement is the
major challenge in data-driven prognostics, since it is often
not possible to obtain samples of failure progressions:
industrial systems are not allowed to run until failure because of
the consequences, especially for critical systems and failure
modes. Moreover, the quality and quantity (sample size) of
system monitoring data have a high influence on data-driven
methods. Sample sizes of prognostic datasets in the
literature range from 10 to 40 (Camci and Chinnam, 2010;
Baruah and Chinnam, 2005; Huang et al., 2007; Gebraeel et
al., 2005; Eker et al., 2011). In this paper, datasets will be
compared against the sample sizes provided in the references
above as a quantitative analysis.
Most electro-mechanical failures occur slowly and
follow a degradation path (Gebraeel et al., 2009). The failure
degradation of such a system might take months or even
years. This challenge has been addressed in the literature in
the following ways:
1. Accelerated aging: equipment is run in a lab environment
with extreme loads and/or increased speed to bring about faster
failure. Structural health monitoring applications are a good
example of this type of failure progression: test specimens
are subjected to cyclic loading experiments so that cracks
propagate faster than in the normal degradation process
(Camci et al., 2012; Diamanti & Soutis, 2010; Papazian et
al., 2009). Camci and Chinnam (2010) used imitations of
real components made from vulnerable materials so
that failure progresses faster than normal.
2. Unnatural failure progression: a predefined degradation
formula is used to define the discrete failure states and the
duration to be spent in each state. Failure progression in a
railway turnout has been modeled using exponential
degradation (Eker et al., 2011).
Each solution has its own strengths and weaknesses in how
faithfully it represents the failure degradation.
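The second workaround can be sketched as mapping an assumed degradation curve onto discrete failure states; the exponential form and all parameter values below are illustrative, not taken from the cited study.

```python
import math

def state_boundaries(n_states, horizon, rate):
    """Map an assumed exponential degradation d(t) = exp(rate*t) - 1,
    growing from 0 at t=0 to its maximum at t=horizon, onto n_states
    equally spaced degradation levels; returns the time at which each
    discrete state is entered. Purely illustrative parameters."""
    d_max = math.exp(rate * horizon) - 1.0
    times = []
    for k in range(1, n_states + 1):
        level = k / n_states * d_max             # target degradation level
        times.append(math.log(level + 1.0) / rate)   # invert d(t)
    return times

times = state_boundaries(n_states=5, horizon=100.0, rate=0.05)
```

Because the curve accelerates, equal degradation increments are crossed in progressively shorter times, so later states have shorter durations, mimicking accelerating failure progression.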
2.2. Physics-Based Models
Physics-based models employ a physical understanding of
the system in order to estimate the remaining useful life of
an asset. Even though samples of failure degradation are not
essential in physics-based prognostics, the physical rules
governing the system must be known in detail. The first phase
in physics-based prognostics is to compute residuals that
represent the dispersion of sensed measurements from their
expected values for healthy systems (Namburu et al., 2003).
The second phase in physics-based prognostics requires
mathematical modeling of the failure degradation.
There exist two major challenges in physics-based
prognostics: 1) the lack of sufficient knowledge of the physics
of failure degradation, and 2) the inability to obtain the
values of the parameters in the formulations. Thus,
sufficient component/system information and a good
understanding of failure mechanisms are essential, and
skilled personnel are also required for physics-based models
(Zhang et al., 2009). Environmental and operating
conditions might be used as inputs and constitute added
dimensions to be considered.
3. BENCHMARKING DATASETS
Several publicly available datasets are analyzed in this
section for their suitability for testing prognostic approaches.
As mentioned in Section 2, a prognostic dataset is expected
to have a minimum sample size of around 10 in order to
support data-driven modeling effectively. On the
physics-based side, datasets will be examined
with regard to: 1) whether a mathematical degradation model
exists for the specific application, and 2) whether the parameters
of the model are provided with the datasets. The
applicability of data-driven and physics-based prognostics
methods has been studied, and the results are presented in the
following subsections.
3.1. NASA Data Repository (5 datasets)
NASA Ames prognostics data repository (2012) is a
growing source covering several sets of prognostic data
contributed by universities, companies, or agencies.
Datasets in the repository consist of run-to-failure time
series data representing the case study under examination.
Seven prognostic datasets are available. In this section, an
analysis of five of them for data-driven or physics-based
modeling is presented.
3.1.1. Milling Dataset
Sixteen milling inserts were degraded by running them at
different operating conditions (Agogino and Goebel, 2007).
Once the flank wear on a milling insert exceeded a standard
threshold level, the tool was considered to have failed. Flank
wear, which is caused by the abrasion of hard constituents of
the workpiece material and is commonly observed during the
machining of steels or cast irons, was measured with a
microscope on the flank face of the cutting tool.
Measurements of acoustic emission, vibration, and current
were collected as indirect health indicators. There are eight
different operating conditions, leading to only two samples
per operating condition.
Effective data-driven modeling is very difficult, if not
impossible, using only two samples of failure degradation.
Several tool life and tool-wear rate models, mostly based on
Taylor's formula (Yen et al., 2004), have been selected for
physics-based prognostics and are displayed in Table 1.
Table 1. Tool life models (Eqs. 1-3) and tool wear rate models (Eqs. 4-5)
First European Conference of the Prognostics and Health Management Society, 2012
150
European Conference of Prognostics and Health Management Society 2012
4
On the physics-based prognostics side, the Taylor tool life
equation (Eq. 1, of the form V·T^n = C) and its extended
versions in Eqs. 2-3 are well-known life models employed in
machining applications. Each of them can be applied to tool
degradation scenarios separately. Tool life is the duration for
which a tool can be operated properly before it starts to fail;
in machining applications a predetermined upper level of
flank wear is used as the failure criterion. Tool life and rate
of wear are sensitive to changes in cutting conditions. These
equations describe the relationship between tool life and
machining parameters (e.g. cutting speed, feed, and depth of
cut). Cutting speed is the difference in speed between the
cutting tool and the workpiece. Feed rate is the velocity at
which the tool moves laterally across the workpiece,
perpendicular to the cutting speed. The depth of cut is how
deeply the workpiece is penetrated. Takeyama and Murata's
tool wear rate model (Eq. 5) describes the relationship
between the rate of volume loss on the tool insert, the cutting
distance, and the diffusive wear per cycle. Even though
parameters specific to the tool material or workpiece (e.g.
cutting tool hardness) can be found in machining handbooks,
operating or environmental condition parameters such as
cutting temperature and sliding speed are not provided with
the dataset.
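As a sketch of how the Taylor relationship is used, tool life can be computed from its standard form V·T^n = C. The constants below are hypothetical illustration values, not parameters taken from the milling dataset:

```python
def taylor_tool_life(V, C, n):
    """Taylor tool life equation, V * T**n = C, solved for life T.

    V: cutting speed, C: Taylor constant, n: Taylor exponent
    (all in consistent, user-chosen units)."""
    return (C / V) ** (1.0 / n)

# hypothetical constants for illustration only
T = taylor_tool_life(V=200.0, C=400.0, n=0.25)  # life in minutes
```

Halving the cutting speed with n = 0.25 increases life sixteen-fold, which is why tool life models are so sensitive to cutting conditions.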
For the above reasons this dataset was found to be unsuitable
for data-driven and physics-based prognostic models.
3.1.2. Bearing Dataset
Three sets (each consisting of four bearings) of tapered
rolling element bearings were run to failure under the same
operating conditions (Lee et al., 2007). The accumulated
mass of debris was collected for each experiment, the amount
of debris being considered a direct indicator of bearing
health (Dempsey et al., 2006). In contrast to the milling
dataset, this direct health indicator (amount of debris
collected) was not provided with the dataset. Vibration data
was collected regularly as an indirect health indicator. After
exceeding 100 million revolutions the bearings failed due to
a crack or outer race failure (Qiu et al., 2006).
The Yu-Harris (Y-H) and Kotzalas-Harris (K-H) models were
selected for use in a physics-based prognostic approach.
Both the bearing spall initiation and spall progression models,
found in (Orsagh et al., 2003; Yu and Harris, 2001), are
shown in Table 2. Yu and Harris' stress-based spall
initiation formula expresses fatigue life as a function of the
dynamic capacity and the applied load (Eq. 6). Dynamic
capacity is in turn a function of bearing geometry and stress.
Once initiated, a spall grows very quickly, and a bearing then
has only 3% to 20% of its useful life remaining (Kotzalas and
Harris, 2001). The Kotzalas-Harris spall progression rate
model is a function of the spall progression region width,
and is described in terms of the maximum stress, the average
shearing stress, and the spall length.
Similar to the previous dataset, some parameters needed for
physics-based modeling are not provided with the dataset.
The challenges emerging from this dataset are:
- Three run-to-failure sets of samples are considered
insufficient for data-driven modeling when compared to the
dataset sample sizes found in the literature.
- Parameters needed for physics-based modeling are lacking.
Table 2. Bearing fatigue life models: spall initiation (Eqs. 6-8) and spall progression (Eqs. 9-10)
3.1.3. Li-ion Battery Dataset
Electric unmanned aerial vehicle (eUAV) Li-ion batteries
were used in this prognostic approach (Saha and Goebel,
2007). The batteries were charged and discharged at
different ambient temperatures and different load currents.
There are 4 samples under the same operating conditions
and in total 36 samples are provided. Battery capacity fade
was chosen as the failure indicator for these experiments: a
30% fade in battery capacity (for example, a reduction from
2000 mAh to 1400 mAh) was considered as failure.
Voltage, current, and battery temperature measurements are
provided with the dataset as indirect health indicators.
Impedance and capacity measurements, which are direct
health indicators, were also given with the dataset as damage
criteria. Only four sets of batteries under the same operating
and environmental conditions are not enough to apply data-
driven prognostics effectively.
Typically, battery capacity or end-of-life (EOL) modeling is
done for physics-based prognostics purposes. A remaining
battery capacity model can be found in the literature (Rong
and Pedram, 2006). All parameters in their model, other than
constant coefficients determined from experimental testing
by curve fitting, are available. This dataset was therefore
found to be eligible for physics-based modeling.
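The 30% capacity-fade failure criterion described above can be applied directly to a capacity-versus-cycle series; the capacity values below are hypothetical:

```python
def end_of_life_cycle(capacities_mAh, rated_mAh, fade_fraction=0.30):
    """First cycle at which measured capacity drops below
    (1 - fade_fraction) * rated capacity; None if no failure observed."""
    threshold = (1.0 - fade_fraction) * rated_mAh
    for cycle, capacity in enumerate(capacities_mAh):
        if capacity < threshold:
            return cycle
    return None

# hypothetical fade history: failure threshold is 1400 mAh for a 2000 mAh cell
eol = end_of_life_cycle([2000, 1900, 1750, 1500, 1350], rated_mAh=2000)
```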
3.1.4. Turbofan Engine Degradation Simulation Dataset
This dataset contains 4 sets of data each of which is a
combination of 2 failure modes and 2 operating conditions.
Each set has at least 200 engine degradation simulations
carried out using C-MAPSS which are divided into training
and test subsets (Saxena and Goebel, 2008). Twenty one
different sensor measurements as well as RUL values for
test subsets are given (Saxena et al., 2008). However, health
indicators were not provided with the dataset.
Degradation in the high-pressure compressor (HPC) and fan
of the turbofan engine is simulated, and the dataset consists
of multiple multivariate time series. The simulations employ
several operating conditions. The model the dataset owners
applied is the exponential degradation model shown in
Eq. 11, whose parameters are an initial degradation level, a
scaling factor, a time-varying exponent, and an upper wear
threshold. The model is a generalized form of common
damage propagation models (e.g. the Arrhenius, Coffin-
Manson, and Eyring models).
The dataset is eligible for a data-driven approach, since
sufficient data and RUL values are available with the dataset.
Either statistical or machine learning data-driven models can
be employed to predict the RUL of turbofan engines. On the
other hand, it is not appropriate for physics-based modeling,
since the health index parameters are not given and no
physics-based model is available for whole-engine
degradation.
3.1.5. IGBT Accelerated Aging Dataset
The dataset involves thermal overstress aging experiments
of Insulated Gate Bipolar Transistors (IGBTs). IGBTs are
power semiconductor devices used in switching applications
such as traction motor control, and switched-mode power
supplies (SMPS). Five IGBTs were aged with a square wave
signal at the gate and one was aged with a DC waveform
(Celaya et al., 2009). The experiments were stopped after
thermal runaway or latch-up failures were detected.
Collector current, gate voltage, collector-emitter voltage,
and package temperature measurements are given as indirect
health indicators.
There are five run-to-failure samples under the same
conditions. The dataset owners also reported several
problems with the aging systems (Sonnenfeld et al., 2008).
Thus, it is difficult to claim that the dataset could be
employed effectively for data-driven prognostics.
The Coffin-Manson model (Eq. 12) is used as a physics-
based model for thermal cycling applications (Cui, 2005). It
is a function of temperature parameters and an Arrhenius
term, which is evaluated at the maximum temperature
reached in each cycle. The temperature parameters to be
used in the model are given with the dataset. The dataset
was therefore found to be eligible for a physics-based
approach.
Table 3. Physics-based models for temperature cycling: the Coffin-Manson model (Eqs. 12-13)
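The standard Coffin-Manson form with an Arrhenius acceleration term evaluated at the maximum cycle temperature can be sketched as follows; the constants C0, q, and Ea below are illustrative assumptions, not values from the dataset or from Cui (2005):

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def cycles_to_failure(delta_T, T_max, C0, q, Ea):
    """Coffin-Manson cycles-to-failure with an Arrhenius term at the
    maximum cycle temperature T_max (K); delta_T is the cycle
    temperature swing (K). Constants C0, q, Ea are illustrative."""
    return C0 * delta_T ** (-q) * math.exp(Ea / (K_B * T_max))

N = cycles_to_failure(delta_T=80.0, T_max=398.0, C0=1e5, q=2.0, Ea=0.5)
```

The model captures the key qualitative behavior: larger temperature swings yield fewer cycles to failure.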
3.2. Virkler Fatigue Crack Growth Dataset
Structural health monitoring (SHM) is the process of
implementing a damage identification strategy for civil,
aerospace, or mechanical engineering infrastructure (Farrar
and Worden, 2007). In the SHM field, fatigue cracks are
considered one of the primary structural damage
mechanisms caused by cyclic loading. Cracks at the
structure surface grow gradually. Once a crack has reached
the critical length (determined by standards), the structure
will suddenly fracture, which may cause the system to fail
catastrophically. Prediction of fatigue life or fatigue crack
growth in structures is therefore necessary.
The Virkler fatigue crack growth dataset (Virkler et al.,
1979) contains 68 run-to-failure specimens. The specimens
are center-cracked sheets of 2024-T3 aluminum. Each
specimen had a notch giving a 9 mm initial crack length, and
the experiments were stopped once the crack length reached
about 50 mm. Crack length is provided in the dataset as a
direct health indicator of the specimens. Each specimen has
164 crack length observation points, as shown in Figure 4.
However, indirect sensory measurements such as vibration,
acoustic emission, etc. are not provided.
Figure 4. Crack length propagation samples under the same
loading conditions
The Virkler dataset is eligible for both data-driven and
physics-based prognostics: compared to the prognostic
dataset sample sizes mentioned in section 2, its 68 run-to-
failure samples are sufficient to develop data-driven
methods, and crack growth equations are available. The
crack growth formulation shown in Eqs. 14 and 15 can
readily be used in physics-based prognostics (Paris and
Erdogan, 1963; Cross et al., 2006). The Paris-Erdogan model
expresses the crack growth rate (da/dN) in terms of material-
specific constants (C and m) and the range of the stress
intensity factor (ΔK), where ΔK is a function of the range of
the cyclic stress amplitude (Δσ), a geometric constant, and
the crack length (a).
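A simple numerical integration of the Paris-Erdogan law from a 9 mm initial to a 50 mm critical crack length can be sketched as follows; the material constants C and m, the stress range, and the geometric factor beta are illustrative assumptions, not values fitted to the Virkler data:

```python
import math

def cycles_to_critical(a0, ac, C, m, dsigma, beta=1.0, da=1e-5):
    """Integrate Paris-Erdogan da/dN = C * dK**m, with
    dK = beta * dsigma * sqrt(pi * a), from initial crack length a0
    to critical length ac (simple forward sum over crack increments da).
    Units must be chosen consistently with C (illustrative here)."""
    N, a = 0.0, a0
    while a < ac:
        dK = beta * dsigma * math.sqrt(math.pi * a)
        dadN = C * dK ** m
        N += da / dadN  # cycles consumed growing the crack by da
        a += da
    return N

# Virkler-like scenario: 9 mm initial, 50 mm critical crack length
N = cycles_to_critical(a0=0.009, ac=0.050, C=1e-11, m=3.0, dsigma=50.0)
```

Because da/dN increases with ΔK, most of the life is consumed while the crack is still short, and doubling the stress range sharply reduces the predicted life.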
A challenge and requirement analysis of six different datasets
has been performed considering both data-driven and
physics-based modeling demands. Four of the six datasets
can readily be modeled with a physics-based approach, while
only two are applicable to a data-driven prognostics
approach.
A summary table of all datasets is shown in Table 4.
Compared to other datasets, the Virkler dataset was found to
be the most applicable considering the requirements of both
data-driven and physics-based approaches.
Dataset           Data-Driven Modeling   Physics-Based Modeling
Milling Dataset   Hard                   Applicable
Bearing Dataset   Hard                   Hard
Battery Dataset   Hard                   Applicable
Engine Dataset    Applicable             Hard
IGBT Dataset      Hard                   Applicable
Virkler Dataset   Applicable             Applicable
Table 4. Prognostic approach applicability table
4. CONCLUSION
Physics-based and data-driven models are the two major
prognostic approaches employed in the case studies found in
the literature. This paper conducts a requirement analysis for
prognostic methods and reports the challenges of applying
the two major approaches to different datasets. In general,
physics-based models require a mathematical representation
of the physics of failure degradation and the parameters used
in degradation modeling, whereas data-driven models require
statistically sufficient run-to-failure samples. Several
datasets were examined considering both physics-based and
data-driven approaches, and their eligibility is summarized.
The Virkler dataset was found to be the most suitable for
both data-driven and physics-based models and has therefore
been selected for use in a future hybrid prognostic approach.
ACKNOWLEDGEMENT
This research was supported by the IVHM Centre, Cranfield
University, UK and its industrial partners.
REFERENCES
Agogino, A., & Goebel, K. (2007). “Mill Data Set”, BEST
lab, UC Berkeley. NASA Ames Prognostics Data
Repository, [http://ti.arc.nasa.gov/project/prognostic-
data-repository], NASA Ames, Moffett Field, CA.
Baruah, P., & Chinnam, R. B. (2005). HMMs for
diagnostics and prognostics in machining processes.
International Journal of Production Research, 43(6),
1275-1293. doi:10.1080/00207540412331327727
Camci, F., & Chinnam, R. (2010). Health-state estimation
and prognostics in machining processes. IEEE
Transactions on Automation Science and Engineering,
7(3), 581-597. Retrieved from
http://dx.doi.org/10.1109/TASE.2009.2038170
Cross, R. J., Makeev, A., & Armanios, E. (2006). A
Comparison of Predictions From Probabilistic Crack
Growth Models Inferred From Virkler’s Data. Journal
of ASTM International, 3(10), 1-11. Retrieved
February 21, 2012, from
http://soliton.ae.gatech.edu/people/andrew.makeev/ASTM
Cui, H. H. (2005). Accelerated Temperature Cycle Test and
Coffin-Manson Model for Electronic Packaging.
Proceedings of the Annual Reliability and
Maintainability Symposium, 556-560.
Dempsey, P. J., Kreider, G., & Fichter, T. (2006).
Investigation of tapered roller bearing damage detection
using oil debris analysis. 2006 IEEE Aerospace
Conference, 11 pp. doi:10.1109/AERO.2006.1656082
Diamanti, K., & Soutis, C. (2010). Structural health
monitoring techniques for aircraft composite structures.
Progress in Aerospace Sciences, 46(8), 342-352.
Elsevier. doi:10.1016/j.paerosci.2010.05.001
Eker O. F., (2011). Development of a New State Based
Prognostics Method. MSc Thesis. Graduate Institute of
Sciences and Engineering, Fatih University, Istanbul,
Turkey.
Eker, O. F., Camci, F., Guclu, A., Yilboga, H., Sevkli, M.,
& Baskan, S. (2011). A Simple State-Based Prognostic
Model for Railway Turnout Systems. Industrial
Electronics, IEEE Transactions on, 58(5), 1718–1726.
Camci, F., Medjaher, K., Zerhouni, N., & Nectoux, P.
Feature Evaluation for Effective Bearing Prognostics.
Quality and Reliability Engineering International.
doi:10.1002/qre.1396
Farrar, C. R., & Worden, K. (2007). An introduction to
structural health monitoring. Philosophical
transactions. Series A, Mathematical, physical, and
engineering sciences, 365(1851), 303-15.
doi:10.1098/rsta.2006.1928
Gebraeel, N., Elwany, A., & Jing, P. (2009). Residual Life
Predictions in the Absence of Prior Degradation
Knowledge. IEEE Transactions on Reliability, 58(1),
106-117. doi:10.1109/TR.2008.2011659
Gebraeel, N. Z., Lawley, M. a., Li, R., & Ryan, J. K. (2005).
Residual-life distributions from component degradation
signals: A Bayesian approach. IIE Transactions, 37(6),
543-557. doi:10.1080/07408170590929018
Qiu, H., Lee, J., & Lin, J. (2006). Wavelet Filter-based
Weak Signature Detection Method and its Application
on Roller Bearing Prognostics. Journal of Sound and
Vibration, 289, 1066-1090.
doi:10.1016/j.jsv.2005.03.007
Heng, A., Zhang, S., Tan, A., & Mathew, J. (2009). Rotating
machinery prognostics: State of the art, challenges and
opportunities. Mechanical Systems and Signal
Processing, 23(3), 724-739.
Huang, R., Xi, L., Li, X., Richard Liu, C., Qiu, H., & Lee, J.
(2007). Residual life predictions for ball bearings based
on self-organizing map and back propagation neural
network methods. Mechanical Systems and Signal
Processing, 21(1), 193-207.
doi:10.1016/j.ymssp.2005.11.008
Celaya, J., Wysocki, P., & Goebel, K. (2009). “IGBT
accelerated aging data set”, NASA Ames Prognostics
Data Repository, [/tech/dash/pcoe/prognostic-data-
repository/], NASA Ames, Moffett Field, CA.
Lee, J., Qiu, H., Yu, G., Lin, J., & Rexnord Technical
Services (2007). “Bearing Data Set”, IMS, University of
Cincinnati. NASA Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA.
Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review
on machinery diagnostics and prognostics implementing
condition-based maintenance. Mechanical Systems and
Signal Processing, 20(7), 1483-1510.
doi:10.1016/j.ymssp.2005.09.012
Jennions I. K. (2011). Integrated vehicle health
management: Perspectives on an emerging field.
Pennsylvania, USA: SAE International.
Kotzalas, M., & Harris, T. (2001). Fatigue failure
progression in ball bearings. Journal of Tribology,
Transactions of the ASME, 123(2), 238-242.
Namburu, M., Pattipati, K., Kawamoto, M., & Chigusa, S.
(2003). Model-based prognostic techniques
[maintenance applications]. Proceedings
AUTOTESTCON 2003. IEEE Systems Readiness
Technology Conference., 330-340.
NASA Ames Prognostics data repository, retrieved Oct.
2011, from:
http://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-
repository/
Ocak, H., Loparo, K., & Discenzo, F. (2007). Online
tracking of bearing wear using wavelet packet
decomposition and probabilistic modeling: A method
for bearing prognostics. Journal of Sound and
Vibration, 302(4-5), 951-961.
doi:10.1016/j.jsv.2007.01.001
Orsagh, R. F., Sheldon, J. & Klenke, C. J. (2003),
"Prognostics/diagnostics for gas turbine engine
bearings", ASME Turbo Expo, Vol. 1, 16-19 June 2003,
Atlanta, GA, pp. 159.
Papazian, J. M., Anagnostou, E. L., Engel, S. J., Hoitsma,
D., Madsen, J., Silberstein, R. P., Welsh, G. (2009). A
structural integrity prognosis system. Engineering
Fracture Mechanics, 76(5), 620-632.
doi:10.1016/j.engfracmech.2008.09.007
Paris, P.C. & Erdogan, F. (1963). A critical analysis of
crack propagation laws. Journal of Basic Engineering,
Trans. ASME, Ser. D, 85, 528–534.
Peng, Y., Dong, M., & Zuo, M. J. (2010). Current status of
machine prognostics in condition-based maintenance: a
review. The International Journal of Advanced
Manufacturing Technology, 50(1-4), 297-313.
Qiu, H., Lee, J., Lin, J., & Yu, G. (2003). Robust
performance degradation assessment methods for
enhanced rolling element bearing prognostics.
Advanced Engineering Informatics, 17(3-4), 127-140.
doi:10.1016/j.aei.2004.08.001
Rong, P., & Pedram, M. (2006). An Analytical Model for
Predicting the Remaining Battery Capacity of Lithium-
Ion Batteries. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 14(5), 441-451.
Saha, B., & Goebel K. (2007). “Battery Data Set”, NASA
Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA
Saxena, A., & Goebel K. (2008). “C-MAPSS Data Set”,
NASA Ames Prognostics Data Repository,
[http://ti.arc.nasa.gov/project/prognostic-data-
repository], NASA Ames, Moffett Field, CA
Saxena, A., Goebel, K., Simon, D., & Eklund, N. (2008).
Damage propagation modeling for aircraft engine run-
to-failure simulation. Prognostics and Health
Management, 2008. PHM 2008. International
Conference on (pp. 1–9). IEEE.
Sonnenfeld, G., Goebel, K., & Celaya, J. (2008). An agile
accelerated aging, characterization and scenario
simulation system for gate controlled power transistors.
AUTOTESTCON, 2008.
Virkler, D. A., Hillberry, B. M., & Goel, P. K. (1979). The
Statistical Nature of Fatigue Crack Propagation.
Journal of Engineering Materials and Technology,
101(2), pp. 148–153.
Yen, Y., Sohner, J., Lilly, B., Altan, T., (2004). Estimation
of tool wear in orthogonal cutting using the finite
element analysis. Journal of Materials Processing
Technology, 146(1), 82-91.
Yu, K. W., & Harris, T. A. (2001). A New Stress-Based
Fatigue Life Model for Ball Bearings. Tribology
Transactions, 44(1), 11-18.
Zhang, H., Kang, R., & Pecht, M. (2009). A hybrid
prognostics and health management approach for
condition-based maintenance. 2009 IEEE International
Conference on Industrial Engineering and Engineering
Management, Dec 8-11, Hong Kong, pp. 1165-1169.
doi:10.1109/IEEM.2009.5372976
Zhang, L. L., Li, X. X., & Yu, J. J. (2006). A review of fault
prognostics in condition based maintenance.
Proceedings of SPIE, The International Society for
Optical Engineering, 6357, 635752.
BIOGRAPHIES
Omer Faruk Eker is a PhD student in the
School of Applied Sciences and works
as a researcher at the IVHM Centre, Cranfield
University, UK. He received his B.Sc.
degree in Mathematics from Marmara University and his
M.Sc. in Computer Engineering from Fatih University,
Istanbul, Turkey. He has been involved in a project funded
by TUBITAK and Turkish State Railways. His research
interests include failure diagnostics and prognostics,
condition based maintenance, pattern recognition, and data
mining.
Dr. Fatih Camci has been a faculty
member at the IVHM Centre, Cranfield
University, since 2010. He has worked on
many research projects related to
Prognostics and Health Management
(PHM) in the USA, Turkey, and the UK.
His PhD work, supported by the National Science
Foundation in the USA and Ford Motor Company,
concerned the development of novelty detection,
diagnostics, and prognostics methods. He worked for two
years as a senior researcher at Impact Technologies, a world-
leading SME in PHM, and has been involved in many
projects funded by the US Navy and the US Air Force
Research Lab on the development of maintenance planning
and logistics with PHM. He then worked as an Assistant
Professor in Turkey. He has led a research project, funded by
TUBITAK (The Scientific and Technological Research
Council of Turkey) and Turkish State Railways, on the
development of prognostics and maintenance planning
systems for railway switches. In addition to PHM, his
research interests include decision support systems and
energy.
Ian Jennions Ian’s career spans over 30
years, working mostly for a variety of gas
turbine companies. He has a Mechanical
Engineering degree and a PhD in CFD both
from Imperial College, London. He has
worked for Rolls-Royce (twice), General
Electric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heat
transfer, fluid systems, mechanical design, combustion,
services and IVHM. He moved to Cranfield in July 2008 as
Professor and Director of the newly formed IVHM Centre.
The Centre is funded by a number of industrial companies,
including Boeing, BAE Systems, Rolls-Royce, Thales,
Meggitt, MOD and Alstom Transport. He has led the
development and growth of the Centre, in research and
education, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
begun in 2011.
Ian is on the Editorial Board for the International Journal of
Condition Monitoring, a Director of the PHM Society,
contributing member of the SAE IVHM Steering Group and
HM-1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM –
Perspectives on an Emerging Field.
Physics Based Electrolytic Capacitor Degradation Models for Prognostic Studies under Thermal Overstress
Chetan S. Kulkarni1, Jose R. Celaya2, Kai Goebel3, and Gautam Biswas4
1,4 Vanderbilt University, Nashville, TN, 37235, [email protected]@vanderbilt.edu
2 SGT Inc. NASA Ames Research Center, Moffett Field, CA, 94035, [email protected]
3 NASA Ames Research Center, Moffett Field, CA, 94035, [email protected]
ABSTRACT
Electrolytic capacitors are used in several applications ranging from power supplies on safety critical avionics equipment to power drivers for electro-mechanical actuators. This makes them good candidates for prognostics and health management research. Prognostics provides a way to assess the remaining useful life of components or systems based on their current state of health and their anticipated future use and operational conditions. Past experience shows that capacitors tend to degrade and fail faster under the high electrical and thermal stress conditions that they are often subjected to during operations. In this work, we study the effects of accelerated aging due to thermal stress on different sets of capacitors under different conditions. Our focus is on deriving first-principles degradation models for thermal stress conditions. Data collected from simultaneous experiments are used to validate the desired models. Our overall goal is to derive accurate models of capacitor degradation, and use them to predict performance changes in DC-DC converters.
1. INTRODUCTION
Most devices and systems today contain embedded electronic modules for monitoring, control and enhanced functionality. In spite of the electronic modules being used to enhance system performance and capabilities, these modules are often the first elements in the system to fail (Saha et al., 2009; Goebel et al., 2008; Saxena et al., 2008). These failures can be attributed to adverse operating conditions, such as high temperatures, voltage surges and current spikes. Studying and
Chetan S. Kulkarni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
analyzing the degradation of these systems (i.e., degradation in performance) provides data that can be used to meet critical goals like advance failure warnings (Goebel et al., 2008; Saxena et al., 2008) and reduced unscheduled maintenance (Saha et al., 2009), which play an important role in aviation safety.
The term “diagnostics” relates to the ability to detect and isolate faults or failures in a system. “Prognostics”, on the other hand, is the process of predicting health condition and remaining useful life based on current state and previous conditions. Prognostics and health management (PHM) is a method that permits the assessment of the reliability of a system under its actual application conditions. PHM methods combine sensing, data collection, and interpretation of environmental, operational, and performance related parameters to indicate system health. PHM methodologies can be implemented through the use of various techniques that study parameter variations, which indicate changes in parameter degradation and operation performance based on variations in a life-cycle profile.
Prognostics and Health Management (PHM) methodologies have emerged as one of the key enablers for achieving efficient system-level maintenance and safety in military systems (Saha et al., 2009). Prognostics and health management for electronic systems aims to detect, isolate, and predict the onset and source of system degradation as well as the time to system failure. The goal is to make intelligent decisions about the system health and to arrive at strategic and business case decisions. As electronics become increasingly complex, performing PHM efficiently and cost-effectively is becoming more demanding (Saha et al., 2009; J. R. Celaya et al., 2010).
In the aerospace domain, flight and ground staff need to acquire information regarding the current health state for all
subsystems of the aircraft, such as structures, propulsion, control, guidance and navigation systems, on a regular basis to maintain safe operation. This has given rise to research projects that focus on accurate diagnosis of faults, developing precursors to failure, and predicting remaining component life (Balaban et al., 2010; J. R. Celaya et al., 2010). Most of the avionics systems and subsystems in today's modern aircraft contain significant electronic components which perform a critical role in on-board, autonomous functions for vehicle controls, communications, navigation and radar systems. Future aircraft systems will rely on more electric and electronic components. Therefore, this may also increase the rate of electronics-related faults that occur in these systems, with perhaps unanticipated fault modes that will be hard to detect and isolate. It is very important to provide system health awareness for digital electronics systems on-board, to improve aircraft reliability, assure in-flight performance, and reduce maintenance cost. An understanding of how components degrade is needed, as well as the capability to anticipate failures and predict the remaining useful life of electronic components (Balaban et al., 2010; Saha et al., 2009).
1.1. Related Work
The output filter capacitor has been identified as one of the elements of a switched mode power supply that fails more frequently, and therefore has a critical impact on performance (Vohnout et al., 2008; Goodman et al., 2007; Orsagh et al., 2006). A prognostics and health management (PHM) approach for power supplies of avionics systems is presented in (Orsagh et al., 2006).
A health management approach for multilayer ceramic capacitors is presented in the work by (Nie et al., 2007). This approach focuses on the temperature-humidity bias accelerated test to replicate failures. The approach to fault detection uses data trending algorithms in conjunction with multivariate decision-making. The Mahalanobis distance (MD) is used to detect abnormalities within the data and classify the data into “normal” and “abnormal” groups. The abnormal data are then further classified into severity levels of abnormality, based on which predictions of RUL are made.
In the study done by (Gu et al., 2008), 96 multilayer ceramic capacitors (MLCC) were selected for in-situ monitoring and life testing under elevated temperature (85 °C) and humidity (85% RH) conditions with one of 3 DC voltage bias levels: rated voltage (50 V), low voltage (1.5 V), and no voltage (0 V). This method uses data from accelerated aging tests to detect potential failures and to make an estimation of the time of failure. A data driven fault detection algorithm for multilayer ceramic capacitor failures is presented in (Gu & Pecht, 2008). The approach used in this study combines regression analysis, residual, detection and prediction analysis (RRDP). A method based on the Mahalanobis distance is used to detect abnormalities in the test data; there is no prediction of RUL.
In the work done by (Wereszczak et al., 1998) the failure probability of the Barium Titanate used for the manufacturing of MLCCs was studied. Dielectric ceramics in multilayer capacitors are subjected to thermo-mechanical stresses, which may cause mechanical failure and lead to loss of electrical function. Probabilistic life design, or failure probability analysis, of a ceramic component combines the strength distribution of the monolithic ceramic material comprising the component, finite element analysis of the component under the mechanical loading conditions of interest, and a multiaxial fracture criterion.
The work by Buiatti et al. (2010) looked at degradation in metalized polypropylene film (MPPF) capacitors, studying a noninvasive technique for capacitor diagnostics in boost converters. The technique is based on double estimation of the ESR and the capacitance, improving diagnostic reliability and allowing predictive maintenance using a low-cost digital signal processor (DSP).
We adopt a physics-based modeling (PBM) approach to predict the dynamic behavior of the system under nominal and degraded conditions. Faults and degradations appear as parameter-value changes in the model, and this provides the mechanism for tracking system behavior under degraded conditions (Kulkarni et al., 2009).
In DC-DC power supplies used as subsystems in avionics systems (Bharadwaj et al., 2010; Kulkarni et al., 2009), electrolytic capacitors and MOSFET switches are known to have the highest degradation and failure rates among all of the components (Goodman et al., 2007; Kulkarni et al., 2009). Degraded electrolytic capacitors affect the performance and efficiency of DC-DC converters in a significant way. We implement the PHM methodology to predict degradation in electrolytic capacitors, combining physics-of-failure models with data collected from experiments on the capacitors under different simulated operating conditions. In (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011) we discuss degradation related to thermal overstress conditions and the qualitative degradation mechanisms. In this paper we discuss the derived physics-based model and degradation related to thermal overstress (TOS) conditions, along with the experiments conducted.
2. ELECTROLYTIC CAPACITORS
Electrolytic capacitor performance is strongly affected by operating conditions, such as voltage, current, frequency, and ambient temperature. When capacitors are used in power supplies and signal filters, degradation in the capacitors increases the impedance in the path of the AC current, and the decrease in capacitance introduces ripple voltage on top of the desired DC voltage. Continued degradation of the capacitor leads the
converter output voltage to drop below specifications, affecting downstream components. In some cases, the combined effects of the voltage drop and the ripple may damage the converter and downstream components, leading to cascading failures in systems and subsystems.
A primary cause of wear-out in aluminum electrolytic capacitors is vaporization of the electrolyte (Goodman et al., 2007) and degradation of the electrolyte due to ion exchange during charging/discharging (Gomez-Aleixandre et al., 1986; Ikonopisov, 1977), which in turn leads to a drift in the two main electrical parameters of the capacitor: (1) the equivalent series resistance (ESR), and (2) the capacitance (C). The ESR of a capacitor is the sum of the resistances due to the aluminum oxide, electrolyte, spacer, and electrodes (foil, tabbing, leads, and ohmic contacts) (Hayatee, 1975; Gasperi, 1996). The health of a capacitor is often indicated by the values of these two parameters. There are industry-standard thresholds for these parameter values; once these thresholds are crossed, the component is considered unhealthy for use in a system, i.e., the component has reached its end of life and should be replaced before further operation (Lahyani et al., 1998; Eliasson, 2007; Imam et al., 2005).
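A threshold check of this kind is straightforward to code. The sketch below uses illustrative end-of-life limits (a 10% capacitance loss and an ESR reaching roughly 2.8× its pristine value, in line with the storage-condition thresholds cited later in this paper); actual limits vary by standard and manufacturer:

```python
def capacitor_health(esr, cap, esr_0, cap_0,
                     esr_limit=2.8, cap_loss_limit=0.10):
    """Flag end-of-life based on drift of ESR and capacitance.

    esr_0, cap_0   -- pristine (initial) parameter values
    esr_limit      -- ESR multiple of pristine value treated as EOL
    cap_loss_limit -- fractional capacitance loss treated as EOL
    (Threshold values are illustrative; standards and datasheets differ.)
    """
    esr_ratio = esr / esr_0
    cap_loss = (cap_0 - cap) / cap_0
    healthy = esr_ratio < esr_limit and cap_loss < cap_loss_limit
    return {"esr_ratio": esr_ratio, "cap_loss": cap_loss, "healthy": healthy}

# A capacitor whose ESR doubled and whose capacitance dropped 5%
# has not yet crossed either illustrative threshold
status = capacitor_health(esr=0.2, cap=2090e-6, esr_0=0.1, cap_0=2200e-6)
```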
As illustrated in Fig. 1, an aluminum electrolytic capacitor consists of a cathode aluminum foil, electrolytic paper, electrolyte, and an aluminum oxide layer on the anode foil surface, which acts as the dielectric. When in contact with the electrolyte, the oxide layer possesses an excellent forward-direction insulation property (Gasperi, 1996). Together with the magnified effective surface area attained by etching the foil, a high capacitance value is obtained in a small volume (Fife, 2006). Since the oxide layer has rectifying properties, a capacitor has polarity. If both the anode and cathode foils have an oxide layer, the capacitor is bipolar. In this work, we analyze "non-solid" aluminum electrolytic capacitors, in which the electrolytic paper is impregnated with liquid electrolyte. The other type of aluminum electrolytic capacitor, which uses a solid electrolyte (Bengt, 1995), is not discussed in this work.

[Figure: cross-section showing the anode foil, cathode foil, separator paper, aluminum tab, and connecting lead.]

Figure 1. Physical Model of Electrolytic Capacitor
2.1. Overview of Degradation Mechanisms
The flow of current during the charge/discharge cycle of the capacitor causes the internal temperature to rise. The heat generated is transmitted from the core to the surface of the capacitor body, but not all of it can escape. The excess heat results in a rise in the internal temperature of the capacitor, which causes the electrolyte to evaporate and gradually deplete (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011). Similarly, in situations where the capacitor operates under high ambient temperature, the capacitor body is at a higher temperature than its core, and heat travels in the opposite direction, from the body surface to the core of the capacitor, again increasing the internal temperature and causing the electrolyte to evaporate. This is explained using a first-principles thermal model of heat conduction (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011).
Degradation in the oxide layer can be attributed to crystal defects that occur because of the periodic heating and cooling during the capacitor's duty cycle, as well as to stress, cracks, and installation-related damage. High electrical stress is known to accentuate the degradation of the oxide layer through localized dielectric breakdowns on the oxide layer (Ikonopisov, 1977; Wit & Crevecoeur, 1974). These breakdowns, which accelerate the degradation, have been attributed to the duty cycle, i.e., the charge/discharge cycle during operation (Ikonopisov, 1977). Another simultaneous phenomenon is the increase in internal pressure (Gomez-Aleixandre et al., 1986) due to an increased rate of chemical reactions, which can again be attributed to the internal temperature increase in the capacitor. This pressure increase can ultimately lead to the capacitor popping.
All of the failure/degradation phenomena mentioned may act simultaneously, depending on the operating conditions of the capacitors. We first study the phenomena qualitatively, and then discuss the steps to derive first-principles analytic degradation models for the different thermal stress conditions. Electrolyte evaporation is caused either by an increase in the internal core temperature or by the external surrounding temperature. Both lead to the same degradation mode, caused by high electrical stress or thermal stress, respectively.
3. THERMAL OVERSTRESS EXPERIMENT
In this setup we emulated conditions similar to high-temperature storage (Kulkarni, Biswas, et al., 2011b; Kulkarni, Celaya, et al., 2011), in which capacitors are placed in a controlled chamber and the temperature is raised above their rated specification (60068-1, 1988). Pristine capacitors were taken from the same lot, rated for 10 V and a maximum storage temperature of 85°C.
The chamber temperature was gradually increased in steps of 25°C until the predetermined temperature limit was reached. The capacitors were allowed to settle at each set temperature for 15 min before the next step increase was applied. This stepwise procedure was followed to decrease the possibility of thermal shock from sudden temperature changes.
Experiments were done with 2200 µF capacitors at a TOS temperature of 105°C and a humidity factor of 3.4%. At the end of each specified time interval, the temperature was lowered in steps of 25°C until room temperature was reached. Before being characterized, the capacitors were kept at room temperature for 15 min. All the capacitors were characterized by measuring their impedance using an SP-150 Biologic impedance measurement instrument (Biologic, 2010) with Electrochemical Impedance Spectroscopy (EIS). The ESR value is the real part of the impedance measured through the terminal software of the instrument; similarly, the capacitance value is computed from the imaginary part of the impedance.
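Extracting ESR and C from one complex impedance point can be sketched as below, assuming a simple series R-C model, Z = ESR − j/(2πfC). The measurement frequency and the synthetic values are assumptions for illustration, not values from the paper:

```python
import math

def esr_and_capacitance(z, freq_hz):
    """ESR and C from a single complex impedance measurement,
    assuming a series R-C model: Z = ESR - j/(2*pi*f*C)."""
    esr = z.real
    cap = -1.0 / (2.0 * math.pi * freq_hz * z.imag)
    return esr, cap

# Synthetic measurement at 100 Hz for a 2200 uF capacitor with 50 mOhm ESR
f = 100.0
z = complex(0.050, -1.0 / (2.0 * math.pi * f * 2200e-6))
esr, cap = esr_and_capacitance(z, f)
```

In practice EIS sweeps a range of frequencies and fits an equivalent circuit (as the Biologic Zfit note describes); the single-point extraction above is only the simplest case.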
[Figure: "Capacitance vs. Time for 2200 µF @ 105°C" — capacitance (µF) vs. time (hrs) over 0-3000 h, one trace per device (Cap1-Cap15), in the 1600-1950 µF range and decreasing over time.]

Figure 2. Capacitance Plot for all the devices under TOS
4. PHYSICS BASED MODELING OF CAPACITOR DEGRADATION
Based on the above discussion of degradation and the experiments conducted, in this section we derive the first-principles models for thermal overstress conditions. Under thermal overstress, since the device was subjected only to high temperature with no charge applied, we observe degradation due only to electrolyte evaporation. The models are derived based on these observations and on the measurements from the experimental data.
For deriving the physics-based models it is also necessary to know the structural details of the component under study. The models defined use this information for making effective degradation/failure predictions. A detailed structural study of the electrolytic capacitor under test is discussed in this section.
During modeling it is not possible to know the exact amount of electrolyte present in a capacitor, but using the structural details we can approximately calculate it. The electrolyte volume varies with the type and configuration of the capacitor, and can be updated in the model parameters. The equation for the approximate electrolyte volume is derived from the volume of the total capacitor capsule, given by:
Vc = π rc² hc (1)
The amount of electrolyte present depends on the type of paper used as a separator between the anode and cathode foils. A highly porous paper is used in the construction of the capacitor so that the maximum amount of electrolyte can be soaked into it. Since the electrolyte is completely soaked into the paper spacer, the electrolyte volume can be approximated as:
Ve ≈ Vpaper (2)
The approximate volume of electrolyte Ve, based on the geometry of the capacitor, is then:
Ve = π rc² hc − Asurface (dA + dC) (3)
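Eqs. (1)-(3) can be computed directly from the capsule geometry. The dimensions below are placeholders chosen only for illustration, not measurements of the capacitors under test:

```python
import math

def electrolyte_volume(r_c, h_c, a_surface, d_a, d_c):
    """Approximate electrolyte volume from capacitor geometry (Eqs. 1-3):
    Ve = pi*rc^2*hc - Asurface*(dA + dC).
    All dimensions in consistent units (here mm, giving mm^3)."""
    v_capsule = math.pi * r_c**2 * h_c          # Eq. (1): capsule volume
    return v_capsule - a_surface * (d_a + d_c)  # Eq. (3)

# Placeholder geometry: 5 mm radius, 11 mm high capsule, 700 mm^2 foil
# area, 0.10 mm anode and 0.05 mm cathode strip thickness
v_e = electrolyte_volume(r_c=5.0, h_c=11.0, a_surface=700.0,
                         d_a=0.10, d_c=0.05)
```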
A simplified electrical lumped-parameter model of impedance, M1, defined for an electrolytic capacitor, is shown in Fig. 3. The ESR dissipates some of the energy stored in the capacitor. In spite of the dielectric insulation layer between a capacitor's plates, a small amount of 'leakage' current flows between the plates. For a good capacitor operating nominally this current is not significant, but it becomes larger as the oxide layer degrades during operation. An ideal capacitor would offer no resistance to the flow of current at its leads. However, the electrolyte, aluminum oxide, space between the plates, and the electrodes combine to produce a small equivalent internal series resistance.
[Figure: equivalent circuit with anode foil electrode resistance R1 (2 mΩ), cathode foil electrode resistance R2 (1 mΩ), electrolyte resistance RE, leakage resistances R3, R4 ≥ 10 kΩ, and oxide-layer capacitances C1, C2, which combine into the ESR and C.]

Figure 3. Lumped Parameter Model (M1)
From the literature (Rusdi et al., 2005; Bengt, 1995; Roederstein, 2007) and the experiments conducted under thermal overstress, it has been observed that the capacitance and ESR values depend on the electrolyte resistance RE. A more detailed lumped-parameter model for an electrolytic capacitor under thermal overstress, M2, can be derived from M1, as shown in Fig. 4. R1 is the combined series
and parallel resistance in the model. RE is the electrolyte resistance. The combined resistance of R1 and RE is the equivalent series resistance of the capacitor. C is the total capacitance of the capacitor, as discussed earlier.
[Figure: R1 and RE in series, together forming the ESR, followed by the capacitance C.]

Figure 4. Updated Lumped Parameter Model (M2)
4.1. First Principles Models
The input impedance of the capacitor network is defined interms of the total lumped series and parallel impedance ofthe simplified network. The total lumped capacitance of thestructure is given by
C = 2 εR ε0 Asurface / dC (4)
From the literature (Rusdi et al., 2005; Bengt, 1995), for modeling ESR degradation it was observed that the electrolyte resistance RE, as discussed above, forms a major part of the combined ESR shown in Fig. 3. Being the dominant parameter, any change in RE leads to a change in the ESR value. We studied the relationship between RE and Asurface, which gives us a degradation model for ESR. The equation for RE is:
RE = ρE dC PE / (2 L H) (5)
Since RE is the dominant parameter in the ESR and any change in RE affects the ESR value, from Eq. (5) we express ESR in terms of the oxide surface area Asurface as:
ESR = ρE dC PE / (2 Asurface) (6)
Exposure of the capacitors to temperatures Tapplied > Trated results in accelerated aging of the devices (Kulkarni, Celaya, et al., 2011; Kulkarni, Biswas, et al., 2011a; 60068-1, 1988). Higher ambient storage temperature accelerates the rate of electrolyte evaporation, leading to degradation of the capacitance (Kulkarni, Celaya, et al., 2011; Bengt, 1995). The depletion in volume, and thus in effective surface area, is given by Eq. (7).
V = V0 − (Asurface jeo we) t (7)
Details of the derivation of this equation can be found in (Kulkarni, Biswas, et al., 2011b; Rusdi et al., 2005). Evaporation also leads to an increase in the internal pressure of the capacitor, which decreases the electrolyte evaporation rate. Eq. (8) and Eq. (9) give the decrease in active surface area due to evaporation of the electrolyte, which results in a decrease in C and an increase in ESR, respectively (Bengt, 1995; Roederstein, 2007).
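Eq. (7) codes directly as a linear depletion in time. The parameter values below are placeholders chosen only to show the behavior (they deplete about 50 mm³ over 3400 h), not measured quantities:

```python
def electrolyte_volume_at(t_min, v0, a_surface, j_eo, w_e):
    """Eq. (7): V(t) = V0 - (Asurface * jeo * we) * t.
    v0 in mm^3, a_surface in mm^2, j_eo the evaporation rate, w_e the
    volume of an ethyl glycol molecule (consistent units assumed)."""
    return v0 - a_surface * j_eo * w_e * t_min

# Placeholder parameters: deplete ~50 mm^3 over 3400 h (204000 min);
# jeo*we is folded into one number here for simplicity (w_e = 1.0)
V0, A = 523.6, 700.0
JEO = 50.0 / (700.0 * 204000.0)
v_end = electrolyte_volume_at(204000.0, V0, A, JEO, 1.0)
```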
4.1.1. Capacitance Degradation Model
From Eq. (4) and Eq. (7) we can derive the first-principles capacitance degradation model, D1, given by:
D1 : C(t) = [2 εR ε0 / dC] × [(V0 − V(t)) / (jeo we t)] (8)
The degradation in capacitance is directly proportional to the damage parameter V. As discussed earlier, an increase in the core temperature evaporates the electrolyte, decreasing the electrolyte volume and leading to degradation in capacitance. The resultant decrease in capacitance can be computed using Eq. (8).
4.1.2. ESR Degradation Model
From Eq. (6) and Eq. (7), the ESR degradation model, D2, is given as:

D2 : ESR(t) = [ρE dC PE / 2] × [jeo we t / (V0 − V(t))] (9)
In this model there are two parameters that change with time: the rate of evaporation jeo, and the correlation factor PE, related to the electrolyte spacer porosity and average liquid pathway. As the electrolyte evaporates due to high temperature, the correlation factor PE increases because the average pathway of the liquid decreases. Electrolyte evaporation under thermal-stress storage conditions results from the increase in the surrounding atmospheric temperature. Under this operating condition, when the surrounding temperature is high, the temperature of the capacitor capsule also increases, and heat travels from the surface of the body to the core of the capacitor; this phenomenon is described through the thermal model (Kulkarni et al., 2009; Kulkarni, Celaya, et al., 2011).
The decrease in the capacitance parameter value results from the decrease in electrolyte volume due to evaporation under thermal overstress; this relationship is expressed by Eq. (8). Similarly, the increase in ESR is given by the increase in the electrolyte resistance RE, as expressed by Eq. (9). With the decrease in electrolyte due to thermal overstress, the average liquid path length is reduced, which increases PE. Under normal circumstances, when the capacitors are stored at room temperature or below rated temperature, no damage or decrease in life expectancy is observed. But where the capacitors are stored under thermal stress conditions, permanent damage is observed.
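Eqs. (8)-(9) translate directly into code. This is a sketch of D1 and D2 with symbol names following the Nomenclature; all inputs must be in consistent units, and any values passed in a call are placeholders, not measured parameters:

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m

def capacitance_d1(t, v0, v_t, eps_r, d_c, j_eo, w_e):
    """Capacitance degradation model D1, Eq. (8):
    C(t) = (2*epsR*eps0/dC) * (V0 - V(t)) / (jeo * we * t)."""
    return (2.0 * eps_r * EPS0 / d_c) * (v0 - v_t) / (j_eo * w_e * t)

def esr_d2(t, v0, v_t, rho_e, d_c, p_e, j_eo, w_e):
    """ESR degradation model D2, Eq. (9):
    ESR(t) = (rhoE*dC*PE/2) * (jeo * we * t) / (V0 - V(t))."""
    return (rho_e * d_c * p_e / 2.0) * (j_eo * w_e * t) / (v0 - v_t)
```

Note the built-in consistency check: since the volume-loss factors cancel, the product C(t) × ESR(t) always equals εR ε0 ρE PE, as multiplying Eq. (8) by Eq. (9) shows.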
In the thermal overstress experiments, the capacitors were characterized periodically; after 3400 hours of operation it was observed that the average capacitance (C) value had decreased by 8-9%, while the ESR value had increased by around 20-22%. From the literature (60068-1, 1988),
under thermal overstress conditions higher capacitance degradation and only minor ESR degradation are expected, which correlates with the data collected. The failure threshold under storage conditions for capacitance (C) is a 10% decrease, while that for ESR is around 280-300% of the pristine-condition value (60384-4-1, 2007; Kulkarni, Biswas, et al., 2011b). Based on the degradation observed in the experiments, capacitance degradation was considered a precursor to failure for estimating the current health condition of the device.
5. DEGRADATION MODELING
In our earlier work (J. Celaya et al., 2011b, 2011a; Kulkarni et al., 2012) we presented an implementation of a model-based prognostics algorithm based on a Kalman filter and a physics-inspired empirical degradation model. The physics-inspired degradation model was derived from the capacitance degradation data from electrical overstress experiments. This model relates aging time to the percentage loss in capacitance and has the following form,
C(t) = e^(α t) + β, (10)
where the model constants α and β were estimated from the experimental data. Here the exponential model was linked to the degradation data and its parameters were derived from these data. The exponential empirical model of Eq. (10) was further updated: as discussed in Section 4.1.1, we developed a first-principles-based generalized model that can be implemented for different capacitor types and operating conditions. In this work we look at the degradation data under thermal overstress, as discussed earlier, and use Eq. (8) to build the physics-based model.
In this section we discuss the parameter estimation work and study how well the developed degradation model, D1, behaves based on the estimated static parameters. As an initial step we implement a nonlinear least-squares regression algorithm to estimate the model parameters.
[Figure: electrolyte volume (mm³) vs. aging time (hours) over 0-3500 h; measured values (Cap. #1-15) and estimated fit, decreasing from about 525 to 475 mm³.]

Figure 5. Estimation results for Volume decrease
The decrease in the capacitance parameter is used as a precursor of failure. Based on the experiments, capacitance parameter values are computed by characterizing the capacitors, as shown in the plots of Fig. 2. In the degradation model D1, given a certain type of capacitor, all the values in Eq. (8) can be computed except the dispersion volume V. Therefore the dispersion volume V is computed from the available data and used to build the physics-based model, D1, of the degradation phenomenon. The initial electrolyte volume V0 at pristine condition is approximately computed from the physics and geometry of the capacitor. From the experimental data, the estimated volume decreases almost linearly through the initial phase of degradation. Hence in this work we propose a simple dynamic model that relates aging time to loss of electrolyte volume. The loss in electrolyte is linked to the decrease in capacitance through Eq. (8) and has the following form,
Vk = θ1 + θ2 tk + θ3 tk² (11)
where θ1, θ2, and θ3 are model constants for the decrease in volume V, estimated from the experimental data of the accelerated thermal aging experiments. To estimate the model parameters, 14 of the 15 capacitors (labeled #1 through #15) were used, with the remaining capacitor used to validate the model against experimental data. A nonlinear least-squares regression algorithm is used to estimate the model parameters.
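The estimation step can be sketched as below on synthetic data (the coefficients are chosen near the magnitudes reported in Table 1, but the data points themselves are simulated, not the paper's measurements). Because Eq. (11) is linear in its parameters, a polynomial least-squares fit suffices here:

```python
import numpy as np

# Synthetic volume-vs-time data standing in for the training capacitors
t = np.linspace(0.0, 3400.0, 35)                 # aging time, hours
true_theta = (523.61, -0.0161, 3.8e-7)           # near Table 1 magnitudes
v = true_theta[0] + true_theta[1] * t + true_theta[2] * t**2
rng = np.random.default_rng(0)
v_noisy = v + rng.normal(0.0, 0.5, size=t.shape)

# Least-squares fit of V_k = theta1 + theta2*t + theta3*t^2 (Eq. 11)
theta3, theta2, theta1 = np.polyfit(t, v_noisy, deg=2)
residuals = v_noisy - (theta1 + theta2 * t + theta3 * t**2)
mse = float(np.mean(residuals**2))
```

The leave-one-out scheme in the text would repeat this fit 15 times, each time holding one capacitor's data out for validation, which is how the per-case rows of Table 1 arise.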
[Figure: residuals vs. time (hours) over 0-3500 h, ranging from about −3 to 2.]

Figure 6. Residuals
The experimental data are presented together with the results from the fit of Eq. (11) and Eq. (8), as shown in Fig. 5. It can be observed from the residuals in Fig. 6 that the estimation error increases with time. This is to be expected, since the data take a concave path after approximately 2500 hours of operation; a dip is observed relative to the linear degradation trend, and hence we observe higher residual values. This indicates that the phenomenon of volume decrease is not linear, and we are working towards updating the model in Eq. (11). The updated model will include additional degradation phenomena beyond the current model, which
will take into consideration the dip in the volume parameter observed during the later stages of aging.
[Figure: two panels over 0-3500 hours for capacitor #4; top, measured vs. estimated capacitance (µF, about 1700-1850); bottom, computed vs. estimated electrolyte volume (mm³, about 450-550).]

Figure 7. Volume and Capacitance Estimation (Cap # 4)
The updated degradation model is used to estimate the capacitance based on the estimated decrease in volume. In Fig. 7, the volume parameters were estimated from the data of all capacitors other than capacitor #4. The result was validated against the change in volume of capacitor #4, and the model D1 was validated for the decrease in capacitance.
Case    θ1        θ2         θ3           MSE
1       523.6123  -0.01613   3.7100×10⁻⁷  685.6593
2       523.6122  -0.01613   3.7099×10⁻⁷  685.0815
3       523.6159  -0.01614   3.9403×10⁻⁷  684.3579
4       523.6109  -0.01609   3.8072×10⁻⁷  687.3755
5       523.6128  -0.01614   3.8428×10⁻⁷  688.3824
6       523.6100  -0.01613   3.7867×10⁻⁷  690.6146
7       523.6081  -0.01614   3.7269×10⁻⁷  688.1003
8       523.6089  -0.01613   3.7988×10⁻⁷  691.7173
9       523.6111  -0.01616   3.7447×10⁻⁷  686.0799
10      523.6122  -0.01613   3.8470×10⁻⁷  687.8928
11      523.6076  -0.01611   3.7350×10⁻⁷  690.5650
12      523.6065  -0.01614   3.7313×10⁻⁷  683.0697
13      523.6147  -0.01609   3.8906×10⁻⁷  686.4739
14      523.6120  -0.01612   3.8276×10⁻⁷  689.6318
15      523.6113  -0.01616   3.8317×10⁻⁷  689.8948
Mean    523.6112  -0.0161    3.8077×10⁻⁷  687.6598
Median  523.6113  -0.0161    3.8072×10⁻⁷  687.8928
S.D.    0.0026    1.8748×10⁻⁵ 6.9373×10⁻⁹ 2.5339
C.I.    523.6098  -0.01614   0.3769×10⁻⁶  686.2565
        523.6127  -0.01611   0.3846×10⁻⁶  689.0630

Table 1. Parameter Estimation Results
Table 1 shows the estimated values of the parameters for each capacitor, along with the mean square error observed for the estimated values.
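The summary rows of Table 1 can be reproduced from the per-case estimates; a sketch using the θ1 column follows. The interval method is an assumption (a normal-approximation 95% interval for the mean); the table's C.I. row may be computed differently:

```python
import statistics

# theta1 estimates for the 15 leave-one-out cases (values from Table 1)
theta1 = [523.6123, 523.6122, 523.6159, 523.6109, 523.6128,
          523.6100, 523.6081, 523.6089, 523.6111, 523.6122,
          523.6076, 523.6065, 523.6147, 523.6120, 523.6113]

mean = statistics.mean(theta1)
sd = statistics.stdev(theta1)     # sample standard deviation
# 95% confidence interval for the mean, normal approximation
half_width = 1.96 * sd / len(theta1) ** 0.5
ci = (mean - half_width, mean + half_width)
```

These recover the tabulated mean (523.6112) and standard deviation (0.0026) for θ1, which is a useful cross-check on the reconstructed table.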
6. CONCLUSION AND DISCUSSION
This paper presents a first-principles-based electrolytic capacitor degradation model and a parameter estimation algorithm to validate the derived model against experimental data. The major contributions of the work presented in this paper are:
1. Identification of the lumped-parameter models M1 and M2 (Fig. 3 and Fig. 4), based on the equivalent electrical circuit of a real capacitor, as viable reduced-order models for prognostics-algorithm development;
2. Identification of capacitance (C) as a failure precursor in the lumped-parameter model M1, as shown in Fig. 3;
3. Estimation of the electrolyte volume from the structural model of the capacitor, to be used in the first-principles degradation model D1;
4. Development of the first-principles degradation model based on accelerated life test aging data, which includes the decrease in capacitance as a function of time and an evaporation rate linked to temperature conditions;
5. Implementation of a parameter estimation algorithm to cross-validate the derived first-principles degradation model D1.
The degradation model D1, based on first principles, gives an indication of how a specific device degrades based on its geometric structure, operating conditions, etc. The derived model can be updated and developed at a finer level of granularity for detailed prognostic implementation. The results presented here are based on accelerated aging experimental data and on the accelerated-life timescale. In our earlier work, physics-inspired degradation models based on the observed data were discussed (J. Celaya et al., 2011b, 2011a). The work discussed in this paper is the next step towards generalizing the degradation model, and it has been tested on the current data from capacitors under a constant temperature of 105°C. As discussed in Section 5, as a first step a simple model has been implemented for the decrease in volume V; it needs to be updated to include the operating-condition variables.
The performance of the proposed first-principles degradation model D1 is satisfactory for this study, based on the quality of the model fit to the experimental data and on the cross-validation performance of the parameter estimates. In this work, our main emphasis was on deriving the first-principles model for degradation and validating it with a basic nonlinear regression model. Our future work will focus on a detailed implementation of the physics-based model within a Bayesian approach, which can then be used for making more accurate degradation and failure predictions. Another focus will be on using the physics-based model to validate capacitor data under different thermal conditions and capacitor geometries. This will greatly enhance the quality and effectiveness of the degradation models in prognostics, where the operating and environmental conditions, along with the structural conditions, are also accounted for in the degradation dynamics.
NOMENCLATURE
εR        relative dielectric constant
ε0        permittivity of free space
to        oxide thickness
V         dispersion volume at time t
V0        initial electrolyte volume
jeo       evaporation rate (mg min⁻¹ area⁻¹)
t         time in minutes
ρE        electrolyte resistivity
PE        correlation factor related to electrolyte spacer porosity and average liquid pathway
rc        radius of capacitor capsule
hc        height of capacitor capsule
Vpaper    volume of paper
L         length of the anode oxide surface
H         height of the anode oxide surface
Asurface  effective oxide surface area (L × H)
we        volume of ethyl glycol molecule
Vc        total capacitor capsule volume
dA        thickness of anode strip
dC        thickness of cathode strip
C         capacitance
M1        electrical lumped-parameter model
M2        updated lumped-parameter model
D1        capacitance degradation model
D2        ESR degradation model
REFERENCES
60068-1, I. (1988). Environmental testing, Part 1: General and guidance. IEC Standards.
60384-4-1, I. (2007). Fixed capacitors for use in electronic equipment. IEC Standards.
Balaban, E., Saxena, A., Narasimhan, S., Roychoudhury, I., Goebel, K., & Koopmans, M. (2010). Airborne Electro-Mechanical Actuator Test Stand for Development of Prognostic Health Management Systems. Proceedings of Annual Conference of the PHM Society 2010, October 10-16, Portland, OR.
Bengt, A. (1995). Electrolytic Capacitors Theory and Applications. RIFA Electrolytic Capacitors.
Bharadwaj, R., Kulkarni, C., Biswas, G., & Kim, K. (2010, April). Model-Based Avionics Systems Fault Simulation and Detection. American Institute of Aeronautics and Astronautics, AIAA Infotech@Aerospace 2010, AIAA-2010-3328.
Biologic. (2010). Application note 14-Zfit and equivalentelectrical circuits [Computer software manual].
Buiatti, G., Martin-Ramos, J., Garcia, C., Amaral, A., & Cardoso, A. (2010). An Online and Noninvasive Technique for the Condition Monitoring of Capacitors in Boost Converters. IEEE Transactions on Instrumentation and Measurement, 59, 2134-2143.
Celaya, J., Kulkarni, C., Biswas, G., & Goebel, K. (2011a). A Model-based Prognostics Methodology for Electrolytic Capacitors Based on Electrical Overstress Accelerated Aging. Proceedings of Annual Conference of the PHM Society, September 25-29, Montreal, Canada.
Celaya, J., Kulkarni, C., Biswas, G., & Goebel, K. (2011b). Towards Prognostics of Electrolytic Capacitors. American Institute of Aeronautics and Astronautics, AIAA Infotech@Aerospace 2011, March 2011, St. Louis, Missouri.
Celaya, J. R., Wysocki, P., Vashchenko, V., Saha, S., & Goebel, K. (2010). Accelerated aging system for prognostics of power semiconductor devices. In IEEE AUTOTESTCON, 2010 (pp. 1-6). Orlando, FL.
Eliasson, L. (2007, October-November). Aluminium Electrolytic Capacitor's performance in Very High Ripple Current and Temperature Applications. CARTS Europe 2007 Symposium, Spain.
Fife, J. (2006, Aug). Wet Electrolytic Capacitors (Patent No: 7,099 No. 1). Myrtle Beach, SC: AVX Corporation.
Gasperi, M. L. (1996, October). Life Prediction Model for Aluminum Electrolytic Capacitors. 31st Annual Meeting of the IEEE-IAS, 4(1), 1347-1351.
Goebel, K., Saha, B., & Saxena, A. (2008). A Comparison of Three Data-Driven Techniques for Prognostics. 62nd Meeting of the Society For Machinery Failure Prevention Technology (MFPT), Virginia Beach, VA, 119-131.
Gomez-Aleixandre, C., Albella, J. M., & Martinez-Duart, J. M. (1986). Pressure build-up in aluminum electrolytic capacitors under stressed voltage conditions. Journal of Applied Electrochemistry, 16(1), 109-115.
Goodman, D., Hofmeister, J., & Judkins, J. (2007). Electronic prognostics for switched mode power supplies. Microelectronics Reliability, 47(12), 1902-1906.
Gu, J., Azarian, M. H., & Pecht, M. G. (2008). Failure Prognostics of Multilayer Ceramic Capacitors in Temperature-Humidity-Bias Conditions. International Conference on Prognostics and Health Management.
Gu, J., & Pecht, M. (2008). Prognostics and Health Management Using Physics-of-Failure. 54th Annual Reliability and Maintainability Symposium (RAMS).
Hayatee, F. G. (1975). Heat Dissipation and Ripple Current rating in Electrolytic Capacitors. Electrocomponent Science and Technology, 2, 109-114.
Ikonopisov, S. (1977). Theory of electrical breakdown during formation of barrier anodic films. Electrochimica Acta, 22(10), 1077-1082.
Imam, A., Habetler, T., Harley, R., & Divan, D. (2005, June). Condition Monitoring of Electrolytic Capacitor in Power Electronic Circuits using Adaptive Filter Modeling. IEEE 36th Power Electronics Specialists Conference, 2005, PESC '05, 601-607.
Kulkarni, C., Biswas, G., Celaya, J., & Goebel, K. (2011a). A Case Study for Capacitor Prognostics under Accelerated Degradation. IEEE 2011 Workshop on Accelerated Stress Testing & Reliability (ASTR), September 28-30, San Francisco, CA.
Kulkarni, C., Biswas, G., Celaya, J., & Goebel, K. (2011b). Prognostic Techniques for Capacitor Degradation and Health Monitoring. The Maintenance & Reliability Conference, MARCON 2011, Knoxville, TN.
Kulkarni, C., Biswas, G., & Koutsoukos, X. (2009). A prognosis case study for electrolytic capacitor degradation in DC-DC converters. Proceedings of Annual Conference of the PHM Society, September 27 - October 1, San Diego, CA.
Kulkarni, C., Celaya, J., Biswas, G., & Goebel, K. (2011). Prognostic Modeling and Experimental Techniques for Electrolytic Capacitor Health Monitoring. The 8th International Workshop on Structural Health Monitoring 2011 (IWSHM), September 13-15, Stanford University, Stanford, CA.
Kulkarni, C., Celaya, J., Biswas, G., & Goebel, K. (2012). Prognostic and Experimental Techniques for Electrolytic Capacitor Health Monitoring. The Annual Reliability and Maintainability Symposium (RAMS), January 23-26, Reno, Nevada.
Lahyani, A., Venet, P., Grellet, G., & Viverge, P. (1998, Nov). Failure prediction of electrolytic capacitors during operation of a switch mode power supply. IEEE Transactions on Power Electronics, 13, 1199-1207.
Nie, L., Azarian, M., Keimasi, M., & Pecht, M. (2007). Prog-nostics of ceramic capacitor temperature humidity biasreliability using mahalanobis distance. Circuit World,33(3), 21 - 28.
Orsagh, R., & et’al. (2006, March). Prognostic HealthManagement for Avionics System Power Supplies.Aerospace Conference, 2006 IEEE, 1-7.
Roederstein, V. (2007). Aluminum Capacitors - General In-formation. Document - 25001 January 2007.
Rusdi, M., Moroi, Y., Nakahara, H., & Shibata, O. (2005).Evaporation from Water Ethylene Glycol Liquid Mix-ture. Langmuir - American Chemical Society, 21 (16),7308 - 7310.
Saha, B., Celaya, J. R., Wysocki, P. F., & Goebel, K. F.(2009). Towards prognostics for electronics compo-nents. In IEEE Aerospace conference 2009 (p. 1-7).Big Sky, MT.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,Saha, S., et al. (2008). Metrics for evaluating perfor-mance of prognostic techniques. In International Con-
ference on Prognostics and Health Management 2008.Vohnout, S., Kozak, M., Goodman, D., Harris, K., & Jud-
kins, J. (2008). Electronic Prognostics System Imple-mentation on Power Actuator Components. AerospaceConference, 2008 IEEE, 1 - 11.
Wereszczak, A., Breder, K., & Ferber, M. K. (1998). FailureProbability Prediction of Dielectric Ceramics in Mul-tilayer Capacitors. Annual Meeting of the AmericanCeramic Society, Cincinnati, OH (United States).
Wit, H. D., & Crevecoeur, C. (1974). The dielectric break-down of anodic aluminum oxide. Physics Letters A,Volume 50, Issue 5, 365 - 366.
Chetan S. Kulkarni is a Research Assistant at ISIS, Vanderbilt University. He received the M.S. degree in EECS from Vanderbilt University, Nashville, TN, in 2009, where he is currently a Ph.D. candidate, and received a B.E. degree in Electronics and Electrical Engineering from the University of Pune, India, in 2002.
Jose R. Celaya is a research scientist with SGT Inc. at the Prognostics Center of Excellence, NASA Ames Research Center. He received a Ph.D. degree in Decision Sciences and Engineering Systems in 2008, an M.E. degree in Operations Research and Statistics in 2008, and an M.S. degree in Electrical Engineering in 2003, all from Rensselaer Polytechnic Institute, Troy, New York; and a B.S. in Cybernetics Engineering in 2001 from CETYS University, Mexico.
Kai Goebel received the degree of Diplom-Ingenieur from the Technische Universität München, Germany, in 1990. He received the M.S. and Ph.D. from the University of California at Berkeley in 1993 and 1996, respectively. Dr. Goebel is a senior scientist at NASA Ames Research Center, where he leads the Diagnostics and Prognostics groups in the Intelligent Systems division. In addition, he directs the Prognostics Center of Excellence and is the technical lead for Prognostics and Decision Making of NASA's System-wide Safety and Assurance Technologies Program. He worked at General Electric's Corporate Research Center in Niskayuna, NY, from 1997 to 2006 as a senior research scientist. He has carried out applied research in the areas of artificial intelligence, soft computing, and information fusion. His research interest lies in advancing these techniques for real-time monitoring, diagnostics, and prognostics. He holds 15 patents and has published more than 200 papers in the area of systems health management.
Gautam Biswas received the Ph.D. degree in computer science from Michigan State University, East Lansing. He is a Professor of Computer Science and Computer Engineering in the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN.
Prediction of Fatigue Crack Growth in Airframe Structures
Jindřich Finda1, Andrew Vechart2, and Radek Hédl3
1,3 Honeywell International s.r.o., Brno, Tuřanka 100, 627 00, Czech Republic
2 Honeywell International Inc., 1985 Douglas Drive North (M/S MN10-112B), Golden Valley, MN 55422, USA
ABSTRACT
The paper describes the general design, function and
performance of a prognostic system for fatigue crack
growth in airframe structures. Prognostic capability is an
important part of an advanced Structural Health Monitoring
(SHM) system. The aim of the prognosis is to estimate the
remaining life of a system, subsystem or structure, i.e. the time
at which damage (crack, corrosion, wear, delamination,
disbonding, etc.) will result in failure of the considered
element.
1. INTRODUCTION
Fatigue damage and its consequences are the most serious
structural design and maintenance issues that have to be
addressed. Several philosophies of how to decrease
consequences of a fatigue hazard have been developed and
applied. Serious aircraft accidents due to fatigue have
contributed to this development and have started research
efforts in this area. Two main philosophies for aircraft
structure design are used nowadays:
Safe Life - it is established, through a combination of testing
and analysis, that there is an extremely low risk that the part
will ever form a detectable crack due to fatigue during the
service life of the structure.
Damage Tolerance - structure has the ability to sustain
defects safely until the defect is detected and repaired.
Structural Health Monitoring (SHM) - represents the next
advanced step in structural damage monitoring and
maintenance planning. An occurrence of structural damage
is monitored by a sophisticated automated system, whose use
does not demand additional inspection time or qualified
personnel. A significant part of an SHM system and
its application in aerospace is prognostics. It shifts a
structural maintenance program to an advanced level and
brings significant benefits for an aircraft operator like
efficient maintenance planning, effective aircraft usage,
service cost decrease, safety increase, etc. Its application
opens a new approach to aircraft design and brings a new
advanced philosophy of structural lifetime estimation.
2. ENTIS PROJECT DESCRIPTION
This paper describes our approach to fatigue damage
prognostics development. It is based on results from the
ENTIS project supported by the Ministry of Industry and
Trade of the Czech Republic. The main goal of this project
is SHM system development, particularly its real application
form, capabilities, conditions and issues definition.
Experimental testing is a basic part of this project. Fatigue
tests of aircraft structure parts have been done and structural
damage has been monitored by an ultrasonic method. All
structural specimens are parts of an L-410 UVP-E airplane
(Figure 1), which is an all-metal high-wing monoplane
powered by two turboprop engines. The airplane is certified
in the commuter category in accordance with FAR Part 23
requirements. All structural specimens are considered
critical parts of the aircraft structure in the sense of fatigue
damage. The fatigue test arrangement is shown in Figure 2.
Figure 1. Aircraft Industries a. s. L-410 UVP-E airplane
_____________________
Jindřich Finda et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 2. Wing Spar Fatigue Test
For the generation and registration of ultrasonic waves,
shear-plate PZT actuators were used (Figure 3).
Characteristics of the particular Noliac PZT actuators used
are shown in Table 1. The actuators are characterized by
their small dimensions, low weight, and low cost. This
allows installation of large, inexpensive sensor arrays
on an airframe with very small impact on structural
performance and aerodynamic properties. Moreover, the
small dimensions of the PZT elements make them suitable for
integration into a sensor array on a flexible strip, which
significantly accelerates the installation of the sensor array
on the monitored structure.
Figure 3. Shear Plate PZT Actuators
Type     Length [mm]   Width [mm]   Height [mm]   Maximum voltage [V]   Free stroke [µm]   Capacitance [pF]
CSAP02   5             5            0.5           +/-320                1.5                830

Table 1. PZT Actuator Parameters
Figure 4. Block scheme of the crack growth monitoring
system
The block scheme of the advanced signal processing for the
automated monitoring of fatigue damage of a particular
aircraft structural part is shown in Figure 4. A sparse sensor
array enclosing the monitored area (hot spot) is used to
collect the data for evaluating the actual state of the
structure. The automated defect detection and sizing is based
on evaluation of changes in direct signal paths, i.e. signals
between individual pairs of PZT actuators, where one PZT
actuator acts as the source of the ultrasonic wave and the
second one as the sensor. First, Signal Difference
Coefficients (SDCs) for individual paths are calculated by
evaluating differences between baseline and actual signals
measured on the monitored structure. These SDCs form
a Damage Index (DI), which gives information on the extent
of the structural damage. The defect occurrence is indicated
and the defect size is estimated using an Artificial Neural
Network, which transforms a feature vector capturing
significant features (DIs) of an identified defect into the
required parameters (defect occurrence, defect size), which
are then used as inputs to the prognostic algorithm.
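As a sketch, the SDC/DI computation for one actuator-sensor path might look as follows. The paper does not give the exact SDC formula, so a common correlation-based damage index is assumed here; all function names and signal values are illustrative.

```python
import numpy as np

def signal_difference_coefficient(baseline, actual):
    """Damage index for one actuator-sensor path: 1 - |normalized
    cross-correlation| between baseline and current signal.
    (Assumed formulation; the paper does not specify the exact SDC.)"""
    b = (baseline - baseline.mean()) / baseline.std()
    a = (actual - actual.mean()) / actual.std()
    rho = np.dot(b, a) / len(b)          # Pearson correlation coefficient
    return 1.0 - abs(rho)                # 0 = no change, -> 1 = strong change

def feature_vector(baselines, actuals):
    """Stack the SDCs of all direct paths into the DI feature vector
    that would feed the neural network."""
    return np.array([signal_difference_coefficient(b, a)
                     for b, a in zip(baselines, actuals)])

# An unchanged path gives DI ~ 0; a distorted path gives DI > 0.
t = np.linspace(0, 1, 500)
s = np.sin(2 * np.pi * 50 * t)
print(feature_vector([s, s], [s, 0.5 * s + 0.3 * np.sin(2 * np.pi * 80 * t)]))
```

In a real system the baseline signals would be recorded on the pristine structure and the DI vector passed to the trained network.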
3. FATIGUE CRACK GROWTH PROGNOSTICS
The fatigue crack growth prognosis is used for RUL
prediction. The RUL is defined by the crack length reaching
its critical size limit. The concept of the crack growth
prognostic algorithm is shown in (Figure 4). Input of the
prognostic algorithm consists of the crack length observed
during a particular inspection, i.e. measured by the SHM
system, and the typical loading sequence. Several
algorithms for the crack growth calculation can be found in
literature (Beden & Abdullah & Ariffin, 2009). Those
algorithms are based on various approaches: fracture
mechanics & empirical models (Beden & Abdullah &
Ariffin, 2009), or data based models (Forman, R. G. &
Shivakumar, V & Cardinal J.W. & Wiliams, L.C. &
McKeigham, P.C. 2005). Suitability of these approaches for
a particular application depends on the actual type of
loading, type of structure and other boundary conditions and
available inputs for tuning of the crack growth model.
Our prognostic system uses the NASGRO
equation as the crack growth model (NASGRO Reference
Manual, version 4.2; Augustin, 2009). This approach was
selected for the following reasons: (1) This equation is
widely used in aerospace, (2) All inputs required for the
crack growth model related to duralumin aircraft structure
are available in literature, (3) Influence of variable
amplitude loading is accounted for in the NASGRO model,
(4) This algorithm solves the crack growth in all three
phases of the crack propagation (crack initiation, stable
crack growth, unstable crack growth).
The SHM system focuses on so-called Principal
Structural Elements (PSEs). PSEs are those elements of a
primary structure which contribute significantly to carrying
flight, ground, and pressurization loads, and whose failure
could result in catastrophic failure of the airplane. Sensors
are installed on hot spots in order to provide information on
the actual status of the structural integrity (i.e. the actual
length of the fatigue crack). Each hot spot is treated separately
(i.e. the prognosis of crack growth for a particular hot spot is
done without accounting for the effect of the presence of other
cracks outside the hot spot). However, each hot spot may
contain multiple cracks.
3.1. Input Crack Length
The crack length observed during a particular inspection is
used as an initial crack length for the prognostic algorithm.
Evaluation of the crack length is done using a feature vector,
which consists of DIs. The vector is applied to the input of
an Artificial Neural Network, as described above. The
prognostic algorithm propagates the initial crack length into
the future using a typical loading sequence. Thus, the
prognosis is done for each inspection, i.e. for each crack
length observed.
3.2. Typical Loading Sequence and Flight Loading
Spectrum
The wing flange specimen loading is of the same type as the
real loading on an aircraft wing flange: a
combination of a bending moment and an axial force.
The loading sequence represents a series of loading cycles
(Figure 5) affecting the structure. The duration of a loading
cycle is constant for the whole sequence, i.e. 1/3 s. Each
cycle is described by its maximal and minimal stress levels
[σmin, σmax]. The maximal and minimal stress levels are
expressed as multiples of a nominal stress σ0. Thus, we have
a pair of numbers nmin, nmax, the so-called load factors, for a
single loading cycle.
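The load-factor description above amounts to a simple scaling of the nominal stress; a minimal sketch, with an assumed σ0 value for illustration:

```python
# Each loading cycle is stored as a (n_min, n_max) load-factor pair and
# converted to stresses by scaling the nominal stress sigma_0.
# The sigma_0 value below is illustrative, not the L-410 flange value.

def cycle_stresses(n_min, n_max, sigma_0):
    """Return (sigma_min, sigma_max) for one loading cycle."""
    return n_min * sigma_0, n_max * sigma_0

def stress_ratio(n_min, n_max):
    """Stress ratio R = sigma_min / sigma_max used by crack growth models."""
    return n_min / n_max

sigma_0 = 80.0                               # assumed nominal stress [MPa]
print(cycle_stresses(-0.5, 1.5, sigma_0))    # (-40.0, 120.0)
print(stress_ratio(-0.5, 1.5))
```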
A typical flight spectrum for a particular aircraft (Figure 6)
is used in order to define a typical flight loading sequence.
The typical flight spectrum was defined according to FAA
AC 23-13A. The total loading spectrum during a flight is
composed of two loading spectra: wind gusts and maneuvers.
The loading sequence can be derived from the typical flight
spectrum in various ways. Most often the block loading
sequence or random loading sequence is used:
Block loading sequence – loading cycles of the same
amplitude are organized in blocks, which consist of a
number of loading cycles.
Random loading sequence – loading cycles, which have
various amplitudes, are randomly organized in the loading
sequence (Figure 7).
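The two derivation options above can be sketched as follows. The spectrum entries are illustrative placeholders (chosen so that one flight contains 23 cycles, the order of magnitude mentioned later in the paper), not the actual L-410 spectrum.

```python
import random

# A flight spectrum as a list of (n_min, n_max, count) entries: "count"
# cycles per flight at the given load-factor pair (illustrative numbers).
SPECTRUM = [(0.2, 1.2, 15), (-0.1, 1.6, 6), (-0.5, 2.1, 2)]

def block_sequence(spectrum):
    """Block loading sequence: cycles of the same amplitude grouped
    into consecutive blocks."""
    return [(lo, hi) for lo, hi, count in spectrum for _ in range(count)]

def random_sequence(spectrum, rng=random):
    """Random loading sequence: the same cycles, randomly ordered
    (each prognosis run can use a different random ordering)."""
    seq = block_sequence(spectrum)
    rng.shuffle(seq)
    return seq

flight = random_sequence(SPECTRUM)
print(len(flight))   # 23 cycles for this illustrative flight
```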
Figure 5. Typical Loading Cycle
Figure 6. Typical Flight Spectrum for L-410 UVP-E
Figure 7. Random Loading Sequence for One Flight
3.3. Stress Intensity Factor
The calculation of the stress intensity factor (SIF) at the
crack tip for a nominal load is based on a finite element
analysis (FEA), which requires knowledge of the structure
and crack geometry. The FEA provides strain energy release
rates for the crack tip, from which the stress intensity factor
Kσ0(N) can be calculated. The FEA is time consuming and
too complex to include in an online system. For our purposes,
online FEA calculation of the stress intensity factor Kσ0(N)
is replaced by a lookup table.
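The lookup-table replacement can be sketched as follows; the tabulated crack lengths and SIF values here are illustrative placeholders, not the project's FEA output, and linear scaling of K with applied stress is assumed (as described in Section 4).

```python
import numpy as np

# Offline FEA results tabulated at a few crack lengths for the nominal
# stress sigma_0; online, K is interpolated and scaled by the load factor.
# Table values are illustrative, not the project's actual FEA output.
CRACK_MM = np.array([1.27, 10.0, 38.0])   # minimum / medium / maximum length
K_SIGMA0 = np.array([4.0, 12.0, 30.0])    # K at nominal stress [MPa*sqrt(m)]

def sif(crack_length_mm, load_factor):
    """Stress intensity factor for a given crack length and load factor,
    assuming K scales linearly with the applied stress."""
    k_nominal = np.interp(crack_length_mm, CRACK_MM, K_SIGMA0)
    return load_factor * k_nominal

print(sif(10.0, 1.0))    # exact table point: 12.0
print(sif(24.0, 1.5))    # interpolated and scaled: 31.5
```

The interpolation step is what introduces the SIF uncertainty discussed in Section 4.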
3.4. Crack Growth Equation
Calculation of the crack increment for a particular load
cycle is done using the NASGRO equation of fracture
mechanics, Eq. (1):

da/dN = C [((1 − f)/(1 − R)) ΔK]^n (1 − ΔKth/ΔK)^p / (1 − Kmax/Kc)^q    (1)

where ΔK is the stress intensity factor range, R the stress
ratio, Kmax the maximum stress intensity factor in the cycle,
and C, n, f, ΔKth, Kc, p and q are model parameters
given by the structure material and geometry.
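A cycle-by-cycle integration of the crack increment can be sketched as below. The NASGRO parameter values and the SIF model used here are placeholders for illustration, not the duralumin data used in the project.

```python
def nasgro_da(dK, K_max, R, C=1e-11, n=3.0, f=0.3, dK_th=2.0,
              Kc=60.0, p=0.5, q=0.5):
    """Crack increment per cycle [m] from the NASGRO equation.
    All parameter values are placeholders, not duralumin data."""
    if dK <= dK_th:
        return 0.0                      # below threshold: no growth
    if K_max >= Kc:
        return float('inf')             # unstable crack growth
    dK_eff = ((1.0 - f) / (1.0 - R)) * dK
    return C * dK_eff**n * (1.0 - dK_th / dK)**p / (1.0 - K_max / Kc)**q

def propagate(a0_mm, cycles, sif):
    """Grow a crack of initial length a0_mm [mm] through a sequence of
    (n_min, n_max) load-factor cycles; 'sif' maps (length [mm],
    load factor) to a stress intensity factor [MPa*sqrt(m)]."""
    a = a0_mm
    for n_min, n_max in cycles:
        K_max = sif(a, n_max)
        K_min = sif(a, n_min)
        a += nasgro_da(K_max - K_min, K_max, K_min / K_max) * 1000.0  # m -> mm
    return a

# Hypothetical linear SIF model and 1000 identical cycles:
a_end = propagate(5.0, [(0.2, 1.2)] * 1000, lambda a, nf: nf * (2.0 + 0.5 * a))
print(a_end)
```

In the actual system the SIF callable would be the lookup-table interpolation and the cycle list would come from the typical loading sequence.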
4. PROGNOSTIC ALGORITHM PERFORMANCE
Two sets of experimental data from laboratory fatigue tests
of wing flanges (Figure 2) were used as inputs for the
prognosis algorithm performance assessment. In both tests,
a two-tip crack 2 x 1.27 mm long was initiated at the rivet
hole. One tip pointed toward the edge of the flange (external
crack), and the other toward the flange's axis of symmetry
(internal crack). The intention of the experiment was to 1) evaluate
performance of the crack measurement system and 2) obtain
fatigue crack growth data. Ideally the only two major cracks
near the measurement site would have been those initiated
intentionally. This was the case for the first set of
experimental data. However, during the second experiment,
several other cracks formed within the test article,
contaminating the experimental data. Nonetheless, the
performance of the fatigue crack prognosis model is
compared with both sets of experimental data.
Figure 8. Fatigue Crack Prognosis with First Set of
Experimental Data
Figure 9. Predicted Times to Internal Crack Length of 38
mm (first experimental data)
Figure 8 and Figure 9 show the outcome of the fatigue crack
growth prognostic algorithm applied to the first set of
experimental data. In Figure 8, the dashed data lines
represent the crack lengths measured by fractography at
specific times during the test for the internal and external
cracks. The solid lines indicate the predicted crack growths
where the crack growth prognosis was initiated at each
fractography measurement of the crack length. Each
prognosis proceeds no farther than 20,000 flights into the
future (where each flight consists of approximately 23
loading cycles).
Four uncertainties are considered in our work: loading
uncertainty, SIF uncertainty, prognostic model uncertainty
and measured crack length uncertainty.
Figure 9 shows the predicted times at which the crack will
reach the critical length. The solid curve represents the time
predicted by the algorithm. The dashed curve includes an
offset of this prediction accounting for the loading and SIF
uncertainties:
Loading uncertainty – One source of uncertainty in
crack growth prognosis is the assumed loading sequence
(a random loading sequence drawn from the typical loading
spectrum for this type of aircraft). For several initial crack
lengths, several different crack prognoses were run, each
using a random ordering of the selected loading sequence.
The upper end of the 95% confidence interval on the
standard deviation of the times from these prognoses was
then calculated.
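The last step above can be sketched as follows, assuming the standard chi-squared confidence interval for a standard deviation (the paper does not state which interval construction was used); the prognosis times are illustrative values.

```python
import numpy as np
from scipy import stats

def upper_std_ci(times, confidence=0.95):
    """Upper end of the two-sided confidence interval on the standard
    deviation of predicted times, via the chi-squared interval."""
    n = len(times)
    s = np.std(times, ddof=1)                      # sample standard deviation
    alpha = 1.0 - confidence
    chi2_lo = stats.chi2.ppf(alpha / 2.0, n - 1)   # lower chi-squared quantile
    return s * np.sqrt((n - 1) / chi2_lo)

# Several prognoses of "time to critical length" (in flights), each run
# with a different random ordering of the loading sequence (illustrative).
times = np.array([18500.0, 19200.0, 17800.0, 18900.0, 19600.0, 18200.0])
print(upper_std_ci(times))
```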
SIF uncertainty – A key parameter in fatigue crack growth
models is the crack Stress Intensity Factor (SIF). For
purposes of this project, SIFs are evaluated using a
Boundary Element Analysis software package. Crack
geometries are entered into the software, and post-
processing yields an estimate of the stress intensity factor.
The time to set up and execute this stress analysis is too long
to perform at each step of the fatigue crack prognosis.
Therefore, a lookup table has been generated for selected
crack lengths (minimum, medium, and maximum). The
lookup table has been generated under the assumption that
a single stress may be used to determine a baseline SIF, and
this baseline can then be scaled by a load factor to calculate
an actual SIF as a function of crack length and loading
condition. The SIF for a particular crack length is then
calculated using the values tabulated in the lookup table
and an interpolation technique, which is the source of the
uncertainty in the SIF estimation.
Prognostics model and measured crack length uncertainties
are not described in this paper.
The solid horizontal line indicates the time to reach this
crack length as indicated by the experimental data. In the
figure it can be seen that, as the time until the internal crack
reaches the critical length decreases, the identified
uncertainties do not account for the error between the
prediction and the experimental results.
Figure 10 shows the prognostic algorithm applied to the
second set of experimental data. As mentioned before, the
second set of experimental data was contaminated by
additional unintended cracks propagating during the test.
Predicted times to reach the critical crack length are
presented in Figure 11.
Figure 10. Fatigue Crack Prognosis with Second Set of
Experimental Data
Figure 11. Predicted Times to Internal Crack Length of 38
mm (Second Experimental Data)
5. CONCLUSIONS
The prognostic system enables prediction of fatigue
damage growth and mitigates the problem of
corrective maintenance planning. Our prognostic system
design is based on the traditional method (the NASGRO
equation) used for crack propagation modeling in damage
tolerance analyses. Prognosis of simultaneous
propagation of multiple cracks is possible; in this case,
multi-dimensional lookup tables for SIF estimation have to
be used in order to account for interaction between
individual cracks. Connecting the SHM and
prognostic systems brings a novel capability for
interactive fleet management and prognostics of
fatigue damage growth. It opens a new dimension of
maintenance planning, in which maintenance
tasks can be scheduled by aircraft
operators according to their requirements and at minimal cost.
The accuracy of the crack growth prediction is influenced
by several parameter uncertainties, as demonstrated
by applying the prognostics to the results of two fatigue tests.
These parameter uncertainties (loading, SIFs, crack size
estimation, etc.) have to be considered. The prognostic
output can also be influenced by additional boundary conditions
(additional cracks, as in flange fatigue test 2); in that case,
the prognostic results do not follow the real crack propagation
curve exactly. A solution is to monitor changes in
boundary conditions and adjust the prognostic input
parameters with regard to these changes.
ACKNOWLEDGEMENT
The presented work has been supported by the Ministry of
Industry and Trade of the Czech Republic through grant project
no. FR-TI1//274 under the framework program TIP.
NOMENCLATURE
Δa Crack size increment
DI Damage Index
f Opening function
FEA Finite Element Analysis
Kc Fracture toughness
Kσ0 Stress intensity factor
σmin Minimal stress level
σmax Maximal stress level
nmin Cycle minimal load factor
nmax Cycle maximal load factor
PSE Principal Structural Element
PZT Lead Zirconate Titanate
RUL Remaining Useful Life
SDC Signal Difference Coefficient
SHM Structural Health Monitoring
SIF Stress Intensity Factor
REFERENCES
Augustin, P. (2009). Simulation of Fatigue Crack Growth in
the High Speed Machined Panel under the Constant
Amplitude and Spectrum Loading. 25th ICAF
Symposium, Rotterdam.
Beden, S. M., Abdullah, S., & Ariffin, A. K. (2009).
Review of Fatigue Crack Propagation Models for
Metallic Components. European Journal of Scientific
Research, ISSN 1450-216X, Vol. 28, No. 3.
Federal Aviation Regulations (FAR). Part 23 -
Airworthiness Standards: Normal, Utility, Acrobatic, and
Commuter Category Airplanes.
Federal Aviation Administration Advisory Circular (AC). AC 23-13A -
Fatigue, Fail-Safe, and Damage Tolerance Evaluation
of Metallic Structure for Normal, Utility, Acrobatic,
and Commuter Category Airplanes.
Forman, R. G., Shivakumar, V., Cardinal, J. W.,
Williams, L. C., & McKeighan, P. C. (2005). Fatigue
Crack Growth Database for Damage Tolerance
Analysis. DOT/FAA/AR-05/15.
NASGRO Reference Manual, version 4.2. NASA Johnson
Space Center and Southwest Research Institute, San
Antonio.
BIOGRAPHIES
Jindrich Finda (March 28th, 1980)
earned his Master of Science in Aircraft
Design from Brno University of
Technology, Faculty of Mechanical
Engineering, Institute of Aerospace
Engineering, in 2003, and his Ph.D. in
Methods for Determination of
Maintenance Cycles and Procedures for Airplanes/Airplane
Assemblies from the Institute of Aerospace Engineering in
2009. Jindrich Finda works as a Scientist II R&D. His work
is aimed at SHM system development: developing
algorithms for advanced ultrasonic signal/image
processing, and algorithms for automated defect detection,
localization, size evaluation and prognosis of defect
growth, as well as SHM integration into the aircraft maintenance plan.
Andrew Vechart earned his Master of
Science in Computation for Design and
Optimization from Massachusetts Institute
of Technology in 2011 and his Bachelor of
Science in Mechanical Engineering and
Physics from the University of Wisconsin
– Milwaukee in 2009. He has been an R&D
scientist with Honeywell focusing on
vehicle health management projects since 2011. At
Honeywell, he performed prognostic algorithm development
for a structural health monitoring (SHM) application. He is
also program manager and project engineer for a Honeywell
program to update existing and design new firmware and
software for an FPGA application. His interests include
structural health management, embedded systems, and
scientific computing.
Radek Hedl (November 19th, 1973) earned
his Master of Science in Cybernetics,
Automation and Measurement from
Department of Biomedical Engineering,
Faculty of Electrical Engineering and
Computer Science, Brno University of
Technology and his PhD. in Cybernetics
and Computer Science from Department of
Biomedical Engineering, Faculty of Electrical Engineering
and Computer Science, Brno University of Technology.
Radek Hedl works as a Sr. R&D Scientist in the Mechanical
& Simulation Technologies group. In particular, he is the
leader of the CBM/SHM sub-group. His responsibilities include
development of the resource and technology capabilities of
the CBM/SHM group in Brno, involvement in definition of
long term strategy technology roadmaps in the CBM/SHM
area, and leading R&D projects. His R&D activities include
developing algorithms for advanced ultrasonic signal /
image processing, and algorithms for automated defect
detection, localization, size evaluation and prognosis of the
defect growth.
Simulation Framework and Certification Guidance for Condition Monitoring and Prognostic Health Management
Dipl.-Ing. Matthias Buderath1 and Partha Pratim Adhikari2
1 Cassidian, Manching, 85104, Germany, [email protected]
2 Cassidian, CAIE, Bangalore, 560 016, India, [email protected]
ABSTRACT
The most prominent challenges to the successful qualification of Integrated System Health Monitoring (ISHM) systems are appropriate technology development processes and Verification & Validation (V&V) methods leading to certification. This paper outlines a survey of recent ISHM programs in diverse industrial sectors across the globe, offers guidelines for ISHM development at each Technology Readiness Level (TRL), and sets forth a V&V process and certification roadmap. The paper provides insight into Cassidian's ISHM simulation framework and emphasizes the relevance of this framework to an effective V&V solution for ISHM.
1. INTRODUCTION
With growing financial uncertainty, air vehicle operators (both commercial and military) are under tremendous pressure to reduce operational and support costs. It is accepted across the aerospace industry that ISHM is a potentially valuable strategy for the manufacture and management of vehicle platforms. At the same time, ISHM has not yet fully matured as a technology in several key functional areas. Research and development to address this shortfall is occurring across both the automobile and aerospace industries. Although technologies related to Built-In-Test (BIT) and diagnostics have advanced greatly and research into enhanced diagnostics is progressing very fast, prognostics technology for all types of aircraft sub-systems is at a very nascent stage.
M. Buderath et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Validation & Verification (V&V) methods leading to the qualification and certification of ISHM are a key area of development. Although there has been considerable effort in this direction, an ISHM system at the aircraft level has yet to be certified. Certification agencies (EASA, FAA, SAE, etc.) have yet to establish comprehensive certification regulations for Integrated System Health Monitoring systems. Kevin R. Wheeler et al. (2010) contributed an extensive survey of recent ISHM programs and mention that vast differences in user objectives with regard to engineering development are the major barrier to successful V&V. Their paper identifies in detail the objectives and associated metrics across operational, regulatory and engineering domains for diagnosis and prognosis algorithms and systems. James E. Dzakowic et al. (2004) introduce a methodology for verifying and validating the capabilities of detection, diagnostic and prognostic algorithms through an on-line, metrics-based evaluation. Martin S. Feather (2005) mentions in his publication that state-of-the-practice V&V and certification techniques will not suffice for emerging forms of ISHM systems; however, a number of maturing software engineering assurance technologies show particular promise for addressing these ISHM V&V challenges. Dimitry Gorinevsky et al. (2010) describe the importance of a NASA-led effort in open-system IVHM architecture. Detailed functional decompositions of IVHM systems with respect to criticality, on/off-board operation and development cost are presented, and certification standards
are mapped accordingly. This paper also addresses the current NASA IVHM test bed, along with development and deployment steps corresponding to increasing TRL. The FAA's advisory circular (AC), AC 29-2C MG-15, provides guidance on achieving airworthiness approval for rotorcraft Health and Usage Monitoring System (HUMS) installations. It also outlines the process of credit validation and the Instructions for Continued Airworthiness (ICA) for the full range of HUMS applications. Brian D. Larder et al. (2007) converted the text of AC 29-2C MG-15 into a flow chart. His intention was to define the generic end-to-end certification process for HUMS CBM credit. Further, he sought to identify the relationships and interactions between different elements of the certification process that are contained in the three separate sections of the AC (installation, credit validation, and Instructions for Continued Airworthiness). This paper also mentions that HUMS have achieved very few credits, and that the material in the AC is largely untested; however, HUMS in-service experience shows that the potential for future credits does exist. ADS-79B-HDBK (2011) describes the US Army's Condition Based Maintenance (CBM) system and defines the overall guidance necessary to achieve CBM goals for Army aircraft systems and Unmanned Aircraft Systems (UAS). Praneet Menon et al. (2011) published a paper that summarizes the work of a Vertical Lift Consortium industry team to provide detailed guidance for the Verification and Validation (V&V) of CBM maintenance credits. SAE ARP 5783 summarizes the key metrics for evaluating diagnostic algorithms, along with expressions for these metrics. As per the SAE newsletter (2010), an Integrated Vehicle Health Management (IVHM) Steering Group has been formed to explore the need for standardization in order to drive IVHM technology towards the following objectives:
• the development of a single definition and taxonomy of IVHM to be used by the aerospace and IVHM communities
• the identification of how and where IVHM could be implemented
• the development of a roadmap for IVHM standards
• the identification of future IVHM technological and regulatory needs
Deployment of ISHM in an aircraft and the resulting qualification process demand a huge investment. Verification and validation of these ISHM technologies is
an important step in building confidence, both qualitatively and quantitatively. Practically, the cost of correcting an error after fielding an ISHM system is dramatically greater than during the testing phase, thus highlighting the need for appropriate verification and validation techniques. Certification considerations must be addressed during the very early stages of technology development in order to successfully meet any significant qualification goals. Appropriate guidelines and strategies should be followed in ISHM technology development to ensure successful certification within the desired time frame. Additionally, trade studies in the selection of V&V platforms reduce the eventual cost of the V&V process. This paper focuses on the development of such guidelines for the V&V process while emphasizing the relevance of ISHM simulation frameworks and a well-devised certification roadmap.
2. CERTIFICATION ASPECTS OF ISHM
2.1. Evolution of ISHM
Maintenance credits are acquired when an ISHM system can replace the existing industry-standard maintenance for a given component or complete aircraft system, thereby enhancing the availability, maintainability and mission capabilities of the aircraft. To reach this level, ISHM development has to pass through an effective process of technology maturation, development, verification, validation, qualification and, finally, certification. Figure 1 illustrates the evolution phases of an ISHM system, which span maturation (concept refinement and technology development), development, production, installation, controlled introduction to service, benefit/credit validation, certification and continued airworthiness. The certification phases involve both the system developer and the regulator; they are initiated through an application made by the system developer to the appropriate regulatory authority, and they are often performed in parallel with the various evolution phases.
Figure 1. Evolution of Aircraft Product including ISHM
First European Conference of the Prognostics and Health Management Society, 2012
Figure 2. Guidance for Technology Maturation & Development
2.2. Technology Maturation
After the determination of the potential functionality and benefits of ISHM, maturation efforts are initiated. Usually, the maturation phase starts before the development and certification phases, and can overlap with them. The maturation efforts are often performed through Research and Development (R&D) programmes guided by technology and product roadmaps: efforts are allocated to develop sensing technologies, algorithms and software for ISHM, and to enhance the performance of ISHM in terms of increased accuracy, reduced weight, improved reliability, advanced communication and efficient data transfer. Technology gaps and risks are identified, and efforts are allocated to fill the gaps and mitigate the risks. During the maturation phase, the potential benefits and credits of ISHM are re-assessed and validation evidence is gathered. Efforts can also be allocated to develop and test ISHM prototypes, and to develop efficient production processes and reliable installation techniques. Figure 2 defines the activities involved in the technology maturation and development of an ISHM system.
2.3. Development
The main development phase of a system can involve iterations through the following activities: determination of detailed system requirements, determination of the criticality levels and associated integrity requirements, system design, system test and evaluation, system integration, and identification of methodologies for credit validation.
2.4. Guideline for V&V and Certification
Brian D. Larder et al. (2007) depicted, in the form of a flow diagram, the three important steps of HUMS certification per the FAA's advisory circular, viz. installation, credit validation and Instructions for Continued Airworthiness (ICA). Along similar lines, Praneet Menon et al. (2011) provided, as a flow diagram, detailed guidance for the verification and validation of CBM maintenance credits. This paper attempts to combine both concepts and to show prominently how the development process, V&V, certification and qualification are linked to each other in terms of interdependency and the phases of verification and validation maturity towards a successful maintenance credit.
2.4.1. Certification for Installation
This consists of the following steps:
• Check criticality versus integrity
• Mitigating actions
• Airborne equipment installation
• Ground-based equipment installation
• Credit plan approval
If any credit is to be gained, the general guideline is that the determined criticality levels (Minor, Major, or Hazardous/Severe-Major) must be in agreement with the resulting effect of the end-to-end criticality assessment. A mitigating action is an autonomous and continuing compensating factor which may modify the level of qualification associated with certification of an ISHM application. These actions are often performed as part of
continued airworthiness considerations and are also an integral part of the certification. The overall installation considerations for airborne equipment should include, as a minimum, supply of electrical power, environmental conditions, system non-interference, and human factors if operations are affected, along with environmental qualification (RTCA/DO-160/ED-14) and the software development standard (RTCA/DO-178/ED-12). Since the ground-based equipment may be an important part of the process for determining intervention actions, its integrity and accuracy requirements must be the same as for any other part of the ISHM process. An independent means of verification is required due to the use of COTS software. If the integrity assessment (IA) spells out mitigation for all possible functional failures of the algorithm, then one can proceed with the next V&V steps, i.e. establishing the V&V criteria and getting the V&V plan approved by the aviation authority. The V&V criteria are driven by the certification basis. The certification basis, summarised for ISHM in Table 1, is the listing of all requirements from regulatory authorities or related advisory circulars which ensure qualification of the system for airworthiness and achievement of maintenance credit in the context of ISHM. Generally, the certification basis is derived from the Certification Specification (CS) and Technical Standard Orders (TSO), along with recent compliance recommendations (AMC, etc.), amendments and interpretations, which are to be negotiated between the certification coordinator (CC) and the authority.
Table 1: Certification Basis for ISHM
2.4.2. V&V for Maintenance Credit
This can be done after the installation certification has been completed; however, it is highly recommended to start well before the installation certification is complete. Since the description of the application and intended credit of the CBM process has already been defined, it is now necessary to prove that the underlying physics of the monitored equipment and its failures has been understood. The verification of the credit methodology is then taken up. Upon completion of the verification steps, it is necessary to determine whether the verification criteria outlined in the plan have been met. If not, the system element, i.e. the algorithm and corresponding configuration, needs to be redesigned and re-verified. If so, the next step in the maintenance credit process is generation of the production unit. Note that at this point the Airworthiness Report (AWR) has not yet been written for the credit methodology. The next step in the process is validation of the credit methodology. It must be determined whether the validation criteria outlined in the V&V plan have been met. If not, the system element needs to be redesigned, re-verified and re-validated. If the validation was successful, an AWR for the methodology can be written and the unit can be officially introduced into production. Once the system has been validated, a controlled introduction to service should be conducted, since there may still be elements that cannot be fully validated in the development phase. In this phase, data is collected from use in the actual aircraft; this data is then used to calibrate sensors and to tune and train the detection and prognosis algorithms. This essentially means treating the maintenance credit as a maintenance benefit, providing only advisory activities for the time being.
As soon as this phase has been completed, a full introduction to service can be performed (FAA’s advisory circular AC 29-2C MG-15).
2.4.3. Instruction for Continued Airworthiness
The final part of the certification process mainly focuses on training, documentation and operations of the CBM system. A plan is needed to ensure continued airworthiness of those parts that could change with time or usage and includes the methods used to ensure continued airworthiness. The applicant for ISHM is required to provide ICA developed in accordance with FAR/JAR Part 29 and Appendix A. This section provides supplemental guidance
for addressing aspects unique to HUMS (FAA Advisory Circular AC 29-2C MG-15). The regulatory requirements for the Instructions for Continued Airworthiness, which must be written in English as a manual, include: system description, installation, operating information, servicing information, and system maintenance instructions including troubleshooting, methods of removal/replacement, access diagrams, etc.
2.5. V&V Roadmap
Figure 3 depicts the V&V roadmap of ISHM with increasing Technology Readiness Level. On the basis of the earlier discussion, the V&V process towards airworthiness certification of ISHM is spread over the following phases:
• Concept Refinement & Technology Development
• Development
• Controlled Introduction to Service
• Instruction for Continued Airworthiness
The V&V platforms or methods mentioned in the second row of the figure, for each phase, are summarized here.
• Concept Refinement & Technology Development
  o RCM Tools
  o Component Simulation
  o Component Rig
  o Formal Methods for Analysis
  o Integrated Simulation Framework
  o Integrated Simulation Framework driven by offline Flight Data
  o Integration Rig extended from the Simulation Framework
  o Hardware-in-Loop Simulation
• Development
  o Ground System Deployment
  o Non-critical Flight System Deployment
• Controlled Introduction to Service
  o Maturation of ISHM
  o Critical Flight System Deployment
• Instruction for Continued Airworthiness
  o In-Service Validation – continued airworthiness
Note: the applicability of formal methods and HILS is decided on the basis of cost and impact analysis.
Figure 3. V&V Roadmap with increasing TRL
3. ISHM SIMULATION FRAMEWORK IN V&V PROCESS
Cassidian is developing a comprehensive, integrated, PC-based simulation framework for research and development in integrated system health monitoring and management. This ISHM framework is used primarily for demonstrating Proof of Enablers (PoE) and for System Integration Laboratory (SIL) testing, which is the goal of concept refinement and technology development. User objectives and metrics related to ISHM can be refined through exhaustive Monte Carlo simulation of off-nominal scenarios. Ground-based ISHM systems can be deployed in this environment. With high-fidelity modelling of sub-systems and sensor data, the framework provides enough confidence for the installation of on-board ISHM non-critical systems before controlled introduction to service for further tuning and refinement of the algorithms. The Integrated Simulation Framework is extensible enough to include offline stored flight data. Where similar types of sub-systems are already flying in other aircraft, recorded sensor data can be used for more realistic validation of the algorithms. Aircraft system models within the Simulation Framework are able to load and store offline flight data and generate sensor data specific to sub-systems; in this mode, computation of the physics-based models is disabled. Integrated HILS will include simulation of aircraft dynamics, aircraft subsystem hardware and adverse environmental effects, along with the capability to inject system faults. This facility can expedite the validation of ISHM and reduce the validation period during controlled introduction to service. However, this capability demands a huge investment of time and capital; these investments can be greatly reduced for V&V of an aircraft's ISHM by utilizing the Simulation Framework. The Integrated Simulation Framework can also be integrated with individual test beds, such as an SHM test rig. The conclusive evidence would be the structural fault detection capabilities observed during the operation of the aircraft.
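The Monte Carlo refinement of user objectives and metrics mentioned above can be illustrated with a minimal sketch: repeatedly simulate nominal and off-nominal scenarios and estimate the false-alarm and detection rates of a health monitor. The threshold detector and the fault distributions below are illustrative assumptions, not Cassidian's actual models:

```python
import random

def detect(signal, limit=3.0):
    # Hypothetical threshold detector on a scalar health feature.
    return signal > limit

def monte_carlo_metrics(runs=10000, fault_fraction=0.3):
    """Estimate false-alarm and detection rates of a simple detector over
    randomly generated nominal and off-nominal scenarios (all
    distributions are illustrative assumptions)."""
    false_alarms = detections = nominal = faulty = 0
    for _ in range(runs):
        if random.random() < fault_fraction:   # off-nominal scenario
            faulty += 1
            signal = random.gauss(5.0, 1.0)    # fault shifts the feature
            detections += detect(signal)
        else:                                  # nominal scenario
            nominal += 1
            signal = random.gauss(0.0, 1.0)
            false_alarms += detect(signal)
    return false_alarms / nominal, detections / faulty

random.seed(0)
fa_rate, det_rate = monte_carlo_metrics()
```

Such a loop makes it cheap to see how a candidate metric responds before any rig or flight testing is committed.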
Occurrences of structural faults such as cracks are infrequent; hence, years of flight tests might be required to collect validation evidence, and a small number of flights would only be sufficient to prove the system's "fitness for flight", not its "fitness for purpose". Therefore, a validation approach is required to extrapolate from laboratory tests to the actual aircraft. HAHN Spring Limited (2011) suggested that a generalisation and calibration approach would be required to extrapolate from laboratory specimens to actual aircraft; such an approach is expected to vary between the different tasks and technologies of SHM systems. From the V&V roadmap, it is evident that different facilities are needed for the V&V, certification and qualification of ISHM technologies. Cassidian's ISHM Simulation Framework fulfils multiple roles as a single platform.
3.1. ISHM Simulation Framework
The goals of an ISHM system are the preparation of an intelligent maintenance plan, an intelligent mission plan and automatic logistic functions for enhancing the availability, maintainability and mission capabilities of the aircraft. These functions are achieved through Condition Based Maintenance (CBM). The Simulation Framework, which is built around the OSA-CBM and OSA-EAI architectures, simulates all ISHM functional layers through different sub-system models.
Prognostic Health Management (PHM) is the core of ISHM technology. Like in any other domain, challenges in the introduction of PHM systems in the aerospace domain are twofold. On the one hand, there are individual challenges in developing sensor technology, state detection and health assessment methodologies and models for determining the future life span of a (possibly deteriorated) component. On the other hand, there are integration challenges when turning heterogeneous data from disparate and distributed sources into consolidated information and dependable decision support on aircraft and fleet level. It has therefore been recognized in the community that standardized and open data management solutions are crucial to the success of PHM. Such a standard should introduce a commonly accepted framework for data representation, data communication and data storage. Key findings through the development of Cassidian’s ISHM Simulation Framework are:
• The ISHM Simulation Framework plays a vital role in the V&V process for ISHM.
• The state of the practice in using open architecture standards such as OSA-CBM and OSA-EAI is not sufficient; customisation or improvement of the standards may be required. Examples include standardizing non-XML-based transport formats for OSA-CBM data packets under real-time operating conditions and optimizing the OSA-EAI database model for analytical tasks.
• It provides a comprehensive RCM-based CBM ground-based framework to realise and validate the full benefit of ISHM.
Figure 4. ISHM Simulation Framework
The ISHM Simulation Framework simulates the following modules:
• Aircraft System Model
• On-board ISHM System
• On-ground ISHM System
• Supply Chain (Enterprise Level)
• Simulation Management
Simulation of the aircraft system model and the supply chain (enterprise level) creates the simulation environment for the ISHM system models, and simulation management controls the operation of the complete ISHM Simulation Framework.
3.1.1. Aircraft System Model
The Aircraft System Model simulates those systems, and their sensors, for which ISHM capabilities are to be developed. It includes high-fidelity models of aircraft aerodynamics, the hydraulics/actuator system, landing gear, fuel, ECS, aircraft structure, etc. Each sub-system implements physics-based modelling of dynamic behaviour and the physics of faults, and computes the states or parameters from which sensor data are derived. Sensor data for each sub-system are generated from the computed states and parameters after being corrupted with the errors that might occur in a real-life scenario, as well as with noise specific to those sensors. All faults are injected from the simulation control GUI. Any system for which ISHM-specific monitoring and prediction capabilities are to be validated and verified needs to be modelled with a high level of detail. This enables the realistic simulation of failures to support the validation of diagnostic and prognostic functions. The respective controller model simulates Built-in Test (BIT) and Reactive Health Assessment (RHA) of the sub-system.
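Sensor corruption of this kind can be sketched in a few lines; the bias, noise level and fault modes below are hypothetical examples, not the framework's actual sensor models:

```python
import random

def simulate_sensor(true_value, bias=0.0, noise_sigma=0.5, fault=None):
    """Return one corrupted sensor sample from a computed physical state.

    `fault` is a hypothetical injected failure mode: None, "stuck"
    (sensor frozen at its offset) or "drift" (constant additive drift).
    """
    if fault == "stuck":
        return bias
    reading = true_value + bias + random.gauss(0.0, noise_sigma)
    if fault == "drift":
        reading += 5.0  # illustrative drift magnitude
    return reading

random.seed(42)
# Healthy sensor samples of a 100.0-unit physical state with a small bias.
samples = [simulate_sensor(100.0, bias=0.2) for _ in range(1000)]
mean = sum(samples) / len(samples)
stuck_reading = simulate_sensor(50.0, bias=1.0, fault="stuck")
```

Diagnostic algorithms under test then see only `samples`, never the true state, which is exactly the situation on the real aircraft.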
3.1.2. On-board ISHM
The on-board ISHM function includes a central ISHM data processor. Sensors push their data to the data processor via an OSA-CBM implementation; the underlying message protocol is optimized for embedded systems. The ISHM data processor calculates ISHM information according to the OSA-CBM layer specifications, up to the health assessment layer. OSA-CBM defines seven functional layers. The central ISHM data processor has the following functions:
Figure 5. Fault simulation concept for Simulation Framework
• The first four functions of OSA-CBM:
  o Data Acquisition
  o Data Manipulation
  o State Detection
  o Health Assessment
• High Level Reasoning
• BIT Function
• Storage of on-board health data
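A minimal sketch of the first four OSA-CBM layers as a processing chain follows; the moving-average feature and the limits are illustrative choices, not part of the OSA-CBM standard:

```python
def data_acquisition(raw_samples):
    # Layer 1: ingest raw sensor samples.
    return list(raw_samples)

def data_manipulation(samples, window=5):
    # Layer 2: derive a feature (moving average) from the raw data.
    return [sum(samples[max(0, i - window + 1):i + 1]) /
            len(samples[max(0, i - window + 1):i + 1])
            for i in range(len(samples))]

def state_detection(features, limit=80.0):
    # Layer 3: compare each feature against an operating limit.
    return ["exceedance" if f > limit else "normal" for f in features]

def health_assessment(states):
    # Layer 4: summarize component health from the detected states.
    return "degraded" if states.count("exceedance") > 2 else "healthy"

readings = [70, 72, 71, 90, 92, 95, 96, 94]
health = health_assessment(
    state_detection(data_manipulation(data_acquisition(readings))))
```

Keeping the layers as separate functions mirrors the OSA-CBM idea that each layer can be verified and replaced independently.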
Several seeded fault tests under fixed conditions are sufficient to enable the model-based development of diagnostic functions. The development of prognostic functions (to be part of the ground-based ISHM) also needs to cover the development of suitable failure-mode-specific degradation models. Once the degradation models have been developed, it is possible to verify the diagnostic and prognostic functions through Monte Carlo simulations. These simulations should include stochastic fault insertion for so-called "hard faults" (stochastically occurring failures with no impact on observable system parameters before the specified failure threshold is exceeded) and the use of degradation models for "soft faults" (stochastically occurring degradations with impacts on observable system parameters before the specified failure threshold is exceeded). This concept is illustrated in Figure 5.
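The hard/soft fault insertion concept can be sketched as a Monte Carlo mission simulation; all rates and thresholds below are illustrative assumptions, not values from the framework:

```python
import random

def run_mission(degradation_rate, hard_fault_prob, threshold=1.0, hours=100):
    """Simulate one mission history and return the hour of failure, or
    None if the component survives the whole mission.

    Hard faults occur stochastically with no prior observable degradation;
    soft faults accumulate damage until the threshold is exceeded.
    """
    damage = 0.0
    for hour in range(hours):
        if random.random() < hard_fault_prob:  # hard fault: sudden
            return hour
        damage += degradation_rate             # soft fault: gradual wear
        if damage >= threshold:
            return hour
    return None

random.seed(1)
# With this wear rate the soft-fault path always crosses the threshold
# by hour 83 unless a hard fault strikes first.
failures = [run_mission(0.012, 0.001) for _ in range(200)]
failed = sum(1 for f in failures if f is not None)
```

Sweeping the two fault parameters over many such runs is what lets the diagnostic and prognostic functions be verified against both failure classes.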
3.1.3. Ground based ISHM
The major ISHM functionalities for enhancing availability, maintainability and mission capabilities are realized by the ground-based sub-systems. The on-board ISHM function includes only data acquisition and diagnosis of equipment health, along with intermediate processing of data. The ground-based ISHM system performs a significant amount of processing related to the following prime functions:
• On-Ground Health Management function
• Operational Risk Assessment / Fleet High Level Reasoning
• Maintenance Management
• Maintenance Planner
• Resource / Logistic Management
• Mission Planner
• Learning Agent
• Simulation of Enterprise System
• Presentation Layer
The ground-based ISHM functionalities are enhanced from the core concept provided by Fatih Camci et al. (2006). On-Ground Health Management function: The on-ground health management function consists of advanced diagnostics, advanced prognostics and predictive analysis. Advanced diagnostics further validates the on-board diagnostic result against historical data from the same aircraft and a fleet-wide fault database, and refines the diagnostic decision. Advanced prognostics computes the RUL and confidence for each CBM candidate. Predictive analysis (trend analysis) identifies impending failure using trends in historically collected data, but does not predict when failure will occur.
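A minimal trend-based sketch of the RUL computation follows, assuming a linear degradation trend fitted by least squares; the paper does not prescribe a specific prognostic algorithm, so this is only one plausible realisation:

```python
def estimate_rul(history, failure_threshold):
    """Fit a least-squares line to a degradation history (one sample per
    flight hour) and extrapolate to the failure threshold.  Returns the
    remaining hours, or None if no degrading trend is present."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    if slope <= 0:
        return None  # no degradation trend: nothing to extrapolate
    intercept = y_mean - slope * x_mean
    hours_to_threshold = (failure_threshold - intercept) / slope
    return max(0.0, hours_to_threshold - (n - 1))

# Hypothetical crack-growth indicator sampled over five flight hours.
rul = estimate_rul([0.10, 0.12, 0.14, 0.16, 0.18], failure_threshold=0.5)
```

An advanced prognostic would additionally attach a confidence interval to this point estimate, for example from the residuals of the fit or from Monte Carlo resampling of the history.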
Maintenance Management: The Maintenance Management function selects one of the following maintenance solutions for a sub-system, depending on the RCM process:
• Run-to-Fail
• Reactive
• Preventive (calendar based)
• Predictive
• CBM
Maintenance Management executes the following functions:
• Identification of the maintenance task corresponding to a sub-system / functional failure
• Ranking of the optimal maintenance task, computed as a function of maintenance effectiveness for the failure mode, maintenance downtime and cost
• Execution of maintenance (work order generation, tracking of maintenance actions, receiving feedback and closing the work order) as per the approved maintenance plan
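The ranking step above can be sketched as a weighted score over effectiveness, downtime and cost; the weights, the normalisation scheme and the example tasks are illustrative assumptions, not prescribed by the paper:

```python
def rank_tasks(tasks, w_eff=0.5, w_down=0.3, w_cost=0.2):
    """Rank candidate maintenance tasks for a failure mode.

    Each task is (name, effectiveness in [0, 1], downtime_hours, cost).
    Higher effectiveness raises the score; downtime and cost, normalised
    to the worst candidate, lower it.
    """
    max_down = max(t[2] for t in tasks)
    max_cost = max(t[3] for t in tasks)
    def score(task):
        _, eff, down, cost = task
        return w_eff * eff - w_down * down / max_down - w_cost * cost / max_cost
    return sorted(tasks, key=score, reverse=True)

# Hypothetical candidate tasks for a single failure mode.
tasks = [("replace pump", 0.95, 8.0, 12000.0),
         ("overhaul seal", 0.80, 3.0, 2500.0),
         ("lubricate only", 0.40, 0.5, 200.0)]
best = rank_tasks(tasks)[0][0]
```

In practice the weights themselves are candidates for the Learning Agent described below, which tunes such coefficients from maintainer feedback.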
Maintenance Planner: The opportunistic maintenance agent finds opportunistic maintenance times and tasks using the ranking of maintenance tasks, the mission capability of the sub-system/function for future missions, and the RUL for future missions. The maintenance planner schedules the intelligent maintenance plan, validates it against resource management feedback, and publishes the maintenance plan after approval from the decision support system. Resource / Logistic Management: This function tracks the availability, along with the configuration parameters, of LRUs, tools, parts, consumables, personnel, etc. (configurable items). On receipt of a maintenance plan, the resource/logistic management function sends feedback on the validity of the maintenance plan to the maintenance planner on the basis of resource availability. It finally generates a plan for resources and inventory, and generates orders for parts or LRUs from OEMs or suppliers as per the present and projected status of the inventory. Mission Planner: Mission plans and flying programmes are entered using a digital map and editing GUI. The mission planner instructs the user to reschedule the mission plan if the entered plan exceeds the performance of the aircraft. Flying programmes must be rescheduled if the approved maintenance plan overlaps with the mission plan. The applicability of the mission segments for a particular aircraft is further checked against the operational capabilities of the aircraft for each segment, computed by Operational Risk Assessment (ORA). If the capability for a flight segment, or for the complete mission, is below a critical threshold, the mission planner instructs the user to reschedule or cancel the mission for that particular aircraft. Learning Agent: As experience is accumulated, some of the parameters within the model can be learned automatically by analyzing feedback from the maintainer, the OEM, the mission commander and the resource manager. The parameters to be learned include the opportunistic maintenance threshold, the required maintenance threshold, resource lead times, maintenance effectiveness and various coefficients related to diagnostics and prognostics.
Simulation of Enterprise System: This module simulates the supply of specific LRUs or parts from OEMs, service/industry support organizations and wholesale stock points, accounting for the accumulated delays attributable to the order process of the resource management function, manufacturing (if applicable), shipping, and other aspects of supply chain management.
Presentation Layer: Decision support personnel interact through the presentation layer, which consists of the following GUIs distributed across different terminals:
• Health Management & Monitoring
• Interactive GUI for Maintenance Management
• Resource Management & Monitoring
• Maintenance Planner
• Mission Planner
High Level Reasoning / Operational Risk Assessment: High Level Reasoning (HLR) is the capability to estimate an aircraft's (or vehicle's) functional availability. The HLR concept estimates the functional availability of a vehicle based on the health assessment results from lower-level systems and subsystems. Both concepts are part of the HLR development and its integration into the simulation framework. The RUL and confidence are recomputed for each component failure for all future missions and used by HLR. ORA finally determines and quantifies the remaining functional/operational availability at the subsystem, vehicle and mission levels.
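The HLR aggregation can be sketched as a worst-case roll-up from subsystem health to function and mission availability; the dependency map, health values and thresholds below are hypothetical:

```python
def functional_availability(subsystem_health, mission_requirements):
    """HLR sketch: the availability of each vehicle function is the worst
    health of the subsystems it depends on, and a mission segment is
    available only if every function it requires meets its threshold.

    `mission_requirements` maps a function name to
    (list of subsystem dependencies, minimum required health level).
    """
    availability = {}
    for function, (deps, threshold) in mission_requirements.items():
        level = min(subsystem_health[d] for d in deps)
        availability[function] = (level, level >= threshold)
    return availability

# Hypothetical health-assessment outputs from the lower OSA-CBM layers.
health = {"hydraulics": 0.9, "fuel": 0.7, "landing_gear": 0.95}
requirements = {"takeoff": (["hydraulics", "fuel", "landing_gear"], 0.8),
                "cruise": (["hydraulics", "fuel"], 0.6)}
avail = functional_availability(health, requirements)
```

A real ORA would replace the min() roll-up with mission-segment risk models, but the structure of the computation is the same: subsystem health in, per-function and per-segment availability out.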
4. CONCLUSIONS
From the above discussion, it is evident that the nature of the challenges in V&V and certification of ISHM differs from that of a standard stand-alone system. One of the major challenges in the certification of ISHM systems is the non-availability of comprehensive regulatory standards for ISHM. V&V also poses challenges, mainly because ISHM has to handle a large number of off-nominal scenarios, has to ensure performance, safety and reliability across the entire performance envelope, and has to reliably avoid false alarms. Moreover, V&V has to deal with the multidisciplinary aspects of ISHM. The most prominent aspect is direct evidence gathering for fault effects, which is demanding for the V&V of diagnostics and much more difficult for prognostics. To handle these issues, the key aspects of ISHM V&V mentioned above are summarized here:
• V&V maturity starts from the concept refinement and technology development phase.
• If a specific sub-system/function of ISHM is classified as Hazardous/Severe-Major, then direct evidence must be gathered (FAA Advisory Circular AC 29-2C MG-15).
• If a specific sub-system/function of ISHM is classified as Major or lower, then indirect evidence is sufficient (FAA Advisory Circular AC 29-2C MG-15).
• During controlled introduction to service, the CBM maintenance credit is treated as a maintenance benefit, i.e. the CBM output is compared with the maintenance instructions suggested by the conventional RCM process.
• After maturation of the algorithms and certification, CBM obtains the maintenance credit.
• An appropriate sequence of the V&V process across the ISHM functional layers is to be considered.
• It must be noted that V&V of ISHM functionalities in the Simulation Framework does not completely address defects created by the designer. As is evident from Figure 3 (V&V roadmap with increasing TRL), the subsequent V&V phases (i.e. V&V in the integration rig, integrated HILS, V&V during controlled introduction to service and ICA) are required in order to achieve the maintenance credit.
• Since the ISHM simulation framework plays a vital role in the V&V process, the simulation framework itself has to be qualified (Robert G. Sargent, 1998).
The survey of work towards ISHM certification, the suggested customizations, and the experience of using a simulation framework for V&V give the impression that certification of ISHM, while not an easy job, is not impossible. This study should give the ISHM community confidence in achieving maintenance credit through the implementation of this technology.
NOMENCLATURE
AC Advisory Circular
AMC Acceptable Means of Compliance
ARP Aerospace Recommended Practice
AWR Airworthiness Report
BIT Built-In Test
CBM Condition Based Maintenance
CC Certification Coordinator
CS Certification Specification
EAI Enterprise Application Integration
FHA Functional Hazard Analysis
FMECA Failure Modes, Effects, and Criticality Analysis
GUI Graphical User Interface
HILS Hardware-in-Loop Simulation
HLR High Level Reasoning
HUMS Health and Usage Monitoring System
IA Integrity Assessment
ICA Instructions for Continued Airworthiness
ISHM Integrated System Health Monitoring
IVHM Integrated Vehicle Health Management
LRU Line Replaceable Unit
OEM Original Equipment Manufacturer
ORA Operational Risk Assessment
OSA Open System Architecture
PHM Prognostic Health Management
RCM Reliability Centered Maintenance
RUL Remaining Useful Life
SHM Structural Health Monitoring
TRL Technology Readiness Level
TSO Technical Standard Order
REFERENCES
Hess, A., Calvello, G., & Dabney, T. (2004). PHM: a key enabler for the JSF autonomic logistics support concept. IEEE Aerospace Conference.
Larder, B. D., & Davis, M. W. (2007). HUMS condition based maintenance credit validation. American Helicopter Society 63rd Annual Forum, Virginia Beach, VA.
Gorinevsky, D., Smotrich, A., Mah, R., Srivastava, A., Keller, K., & Felke, T. (2010). Open architecture for integrated vehicle health management. AIAA Infotech@Aerospace, Atlanta, GA.
FAA Advisory Circular 29-2C MG 15, Airworthiness Standards, Transport Category Rotorcraft.
Camci, F., Valentine, G. S., & Navarra, K. (2006). Methodologies for integration of PHM systems with maintenance data. IEEEAC paper #1191, Version 1.
HAHN Spring Limited. (2011). Development, Validation, Qualification and Certification of Structural Health Monitoring Systems. HAHN Spring Report 1/B002.
Dzakowic, J. E., & Valentine, G. S. (2004). Advanced techniques for the verification and validation of prognostics & health management capabilities. Impact Technologies, LLC.
Wheeler, K. R., Kurtoglu, T., & Poll, S. D. (2010). A survey of health management user objectives related to diagnostic and prognostic metrics.
Feather, M. S., & Markosian, L. Z. (2005). Emerging technologies for V&V of ISHM software for space exploration. IEEE Aerospace Conference paper #1441, V-2.
Benedettini, O., Baines, T. S., Lightfoot, H. W., & Greenough, R. M. (2008). State-of-the-art in integrated vehicle health management.
Menon, P., Robinson, B., August, M., Larchuk, T., & Zhao, J. (2011). Verification and validation process for CBM maintenance credits. American Helicopter Society 67th Annual Forum.
Sargent, R. G. (1998). Verification and validation of simulation models. Simulation Research Group, Syracuse University.
SAE International. (2010). Aerospace Standards Newsletter, Volume II, Issue 1.
US Army. (2011). ADS-79B-HDBK, Aeronautical Design Standard Handbook for Condition Based Maintenance for US Army Aircraft.
BIOGRAPHIES
Matthias Buderath - Aeronautical engineer with more than 25 years of experience in structural design, system engineering, and product and service support. His main expertise and competence relate to system integrity management, service solution architecture and integrated system health monitoring and management. Today he is head of technology development at CASSIDIAN. He is a member of international working groups covering Through Life Cycle Management, Integrated System Health Management and Structural Health Management. He has published more than 50 papers in the fields of Structural Health Management, Integrated Health Monitoring and Management, Structural Integrity Programme Management, and Maintenance and Fleet Information Management Systems. Partha Pratim Adhikari - has more than thirteen years of experience in the field of avionics and aerospace systems. Partha worked with RCI, DRDO; the Aeronautical Development Agency (Ministry of Defence); and CAE Simulation Technology before joining Cassidian, CAIE, Bangalore, where he currently leads the Integrated System Health Monitoring (ISHM) programme from the Bangalore centre. Partha holds a Bachelor's degree in Physics (Honours) and a B.Tech in Opto-electronics from Calcutta University, and a Master's degree in Computer Science from Bengal Engineering and Science University. In his tenure across various aerospace organizations, Partha made significant contributions in the fields of navigation systems, avionics and simulation technologies. He has published several papers in the fields of estimation, signal processing and simulation of flight systems in national as well as international conferences and journals. In his current role as Tech Lead, Avionics at Cassidian, CAIE, Bangalore, Partha is working on devising ISHM technologies for aviation systems with a focus on complete vehicle health, robust implementation and certification of the developed technologies.
First European Conference of the Prognostics and Health Management Society, 2012
Theoretical and Experimental Evaluation of a Real-Time Corrosion Monitoring System for Measuring Pitting in Aircraft Structures
Douglas Brown1, Duane Darr2, Jefferey Morse3, and Bernard Laskowski4
1,2,3,4 Analatom, Inc., 562 E. Weddell Dr. Suite 4, Sunnyvale, CA 94089-2108, [email protected]@[email protected]
ABSTRACT
This paper presents the theory and experimental validation of Analatom's Structural Health Management (SHM) system for monitoring corrosion. Corrosion measurements are acquired using a micro-sized Linear Polarization Resistance (µLPR) sensor. The µLPR sensor is based on conventional macro-sized Linear Polarization Resistance (LPR) sensors, with the additional benefit of a reduced form factor making it a viable and economical candidate for remote corrosion monitoring of high value structures, such as buildings, bridges, or aircraft.

A series of experiments were conducted to evaluate the µLPR sensor for AA 7075-T6. Test coupons were placed alongside Analatom's µLPR sensors in a series of accelerated tests. LPR measurements were sampled at a rate of once per minute and converted to a corrosion rate using the algorithms presented in this paper. At the end of the experiment, pit-depth due to corrosion was computed for each sensor from the recorded LPR measurements and compared to the average pit-depth measured on the control coupons. The results demonstrate the effectiveness of the sensor as an efficient and practical approach to measuring pit-depth for AA 7075-T6.
1. INTRODUCTION
Recent studies have exposed the generally poor state of our nation's critical infrastructure systems that has resulted from wear and tear under excessive operational loads and environmental conditions. SHM (Structural Health Monitoring) systems aim at reducing the cost of maintaining high value structures by moving from SBM (Scheduled Based Maintenance) to CBM (Condition Based Maintenance) schemes (Huston, 2010). These systems must be low-cost and simple to install, with a user interface designed to be easy to operate. To reduce the cost and complexity of such a system, a generic interface node that uses low-powered wireless communications has been developed by Analatom. This node can communicate with a myriad of common sensors used in SHM. In this manner a structure such as a bridge, aircraft or ship can be fitted with sensors in any desired or designated location and format without the need for communications and power lines that are inherently expensive and complex to route. Data from these nodes is transmitted to a central communications Personal Computer (PC) for data analysis. An example of this is provided in Figure 1, showing Analatom's AN101 SHM system installed in the rear fuel-bay bulkhead of a commercial aircraft.

Douglas Brown et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Figure 1. Analatom AN101 SHM system installed in the rear fuel-bay bulkhead of a commercial aircraft.
A variety of methods such as electrical resistance, gravimetric-based mass loss, quartz crystal micro-balance-based mass loss, electrochemical, and solution analysis methods enable the determination of corrosion rates of metals. The focus of this paper is on Linear Polarization Resistance (LPR), a method based on electrochemical concepts to determine instantaneous interfacial reaction rates, such as corrosion rates and exchange current densities, from a single experiment. There are a variety of methods capable of experimentally determining instantaneous polarization resistances, such as potential step or sweep, current step or sweep, impedance spectroscopy, as well as statistical and spectral noise methods (Scully, 2000). The remainder of this paper will focus on the former, as the AN101 SHM system uses the potential step (or sweep) approach to measure LPR.
The remainder of the paper is organized as follows. Section 2 describes the general theory governing LPR. Section 3 presents Analatom's µLPR, discussing the benefits of miniaturizing the sensor from a macro-scaled LPR. Section 4 outlines the experimental setup and procedure used to validate the µLPR sensor. Section 5 presents the experimental measurements with the accompanying analysis, which demonstrates the effectiveness of the µLPR sensor. Finally, the paper is concluded in Section 6 with a summary of the findings and future work.
2. LPR THEORY
The corrosion of metals takes place when the metal dissolves due to oxidation and reduction (electrochemical) reactions at the interface of the metal and the (aqueous) electrolyte solution. Atmospheric water vapor is an example of an electrolyte that corrodes exposed metal surfaces, and wet concrete is another example of an electrolyte that can cause corrosion of reinforcing rods in bridges. Corrosion usually proceeds through a combination of electrochemical reactions: (1) anodic (oxidation) reactions involving dissolution of metals in the electrolyte and release of electrons, and (2) cathodic (reduction) reactions involving gain of electrons by the electrolyte species like atmospheric oxygen O2, moisture H2O, or H+ ions in an acid (Bockris, Reddy, & Gambola-Aldeco, 2000). The flow of electrons from the anodic reaction sites to the cathodic reaction sites constitutes the corrosion current and is used to estimate the corrosion rate. When the two reactions are in equilibrium at the equilibrium corrosion potential, Ecorr, the net current on the metal surface is zero without an external source of current. The anodic reactions proceed more rapidly at more positive potentials, and the cathodic reactions proceed more rapidly at more negative potentials. Since the corrosion current from the unstable anodic and cathodic sites is too small to measure, an external activation potential is applied across the metal surface and the current is measured for electrochemical calculations. The resulting Ea vs. Ia curve is called the polarization curve. Under an external activation potential, the anodic and cathodic currents increase exponentially, so when log10 Ia is plotted against Ea (a Tafel plot), the linear regions on the anodic and cathodic curves correspond to regions where either the anodic or cathodic reactions dominate and represent the rate of the electrochemical process. The extrapolation of the Tafel linear regions to the corrosion potential gives the corrosion current, Icorr, which is then used to calculate the rate of corrosion (Burstein, 2005).
2.1. Anodic and Cathodic Reactions
The electrochemical technique of Linear Polarization Resistance (LPR) is used to study corrosion processes, since the corrosion reactions are electrochemical reactions occurring on the metal surface. Modern corrosion studies are based on the concept of mixed potential theory postulated by Wagner and Traud, which states that the net corrosion reaction is the result of two or more partial electrochemical reactions that proceed independently of each other (Wagner & Traud, 1938). For the case of metallic corrosion in the presence of an aqueous medium, the corrosion process can be written as,

$$\mathrm{M} + z\,\mathrm{H_2O} \rightleftharpoons \mathrm{M}^{z+} + \tfrac{z}{2}\mathrm{H_2} + z\,\mathrm{OH^-}, \qquad (1)$$
where z is the number of electrons lost per atom of the metal. This reaction is the result of an anodic (oxidation) reaction,

$$\mathrm{M} \rightleftharpoons \mathrm{M}^{z+} + z e^-, \qquad (2)$$
and a cathodic (reduction) reaction,
$$z\,\mathrm{H_2O} + z e^- \rightleftharpoons \tfrac{z}{2}\mathrm{H_2} + z\,\mathrm{OH^-}. \qquad (3)$$
It is assumed that the anodic and cathodic reactions occur at a number of sites on a metal surface and that these sites change in a dynamic statistical distribution with respect to location and time. Thus, during corrosion of a metal surface, metal ions are formed at anodic sites with the loss of electrons, and these electrons are then consumed by water molecules to form hydrogen molecules. The interaction between the anodic and cathodic sites as described on the basis of mixed potential theory is represented by well-known relationships using current (reaction rate) and potential (driving force). For the above pair of electrochemical reactions (anodic (2) and cathodic (3)), the relationship between the applied current Ia and potential Ea follows the Butler-Volmer equation,

$$I_a = I_{corr}\left[\exp\!\left(\frac{2.303\,(E_a - E_{corr})}{\beta_a}\right) - \exp\!\left(\frac{-2.303\,(E_a - E_{corr})}{\beta_c}\right)\right], \qquad (4)$$
where βa and βc are the anodic and cathodic Tafel parameters given by the slopes of the polarization curves ∂Ea/∂log10 Ia in the anodic and cathodic Tafel regimes, respectively, and Ecorr is the corrosion potential (Bockris et al., 2000).
2.2. Electrode Configuration
An electrode is a (semi-)conductive solid that interfaces with an electrolytic solution. The most common electrode configuration is the three-electrode configuration. The common designations are: working, reference and counter electrodes. The working electrode is the designation for the electrode being studied. In corrosion experiments, this is the material that is corroding. The counter electrode is the electrode that completes the current path. All electrochemistry experiments contain a working–counter pair. In most experiments the counter electrode is simply the current source/sink comprised of inert materials like graphite or platinum. Finally, the reference electrode serves as an experimental reference point, specifically for potential (sense) measurements. The reference electrode is positioned so that it measures a point very close to the working electrode.
The three-electrode setup has a distinct experimental advantage over a two-electrode setup: only one half of the cell is measured. That is, potential changes of the working electrode are measured independently of changes that may occur at the counter electrode. This configuration also reduces the effect of measuring potential drops across the solution resistance when measuring between the working and counter electrodes.
2.3. Polarization Resistance
The corrosion current, Icorr, cannot be measured directly. However, a-priori knowledge of βa and βc, along with a small signal analysis technique known as polarization resistance, can be used to indirectly compute Icorr. The polarization resistance technique, also referred to as "linear polarization", is an experimental electrochemical technique that estimates the small signal changes in Ia when Ea is perturbed by Ecorr ± 10 mV (G102, 1994). The slope of the resulting curve over this range is the polarization resistance,

$$R_p \triangleq \left.\frac{\partial E_a}{\partial I_a}\right|_{|E_a - E_{corr}| \le 10\,\mathrm{mV}}. \qquad (5)$$
Note, the applied current, Ia, is the total applied current and is not multiplied by the electrode area, so Rp as defined in (5) has units of Ω. Provided that |Ea − Ecorr|/βa ≤ 0.1 and |Ea − Ecorr|/βc ≤ 0.1, the first order Taylor series expansion exp(x) ≈ 1 + x can be applied to (4) and (5) to arrive at,

$$R_p = \frac{1}{2.303\,I_{corr}}\left(\frac{\beta_a \beta_c}{\beta_a + \beta_c}\right). \qquad (6)$$
Finally, this expression can be re-written for Icorr to arrive at the Stern-Geary equation,

$$I_{corr} = \frac{B}{R_p}, \qquad (7)$$

where $B = \frac{1}{2.303}\left[\beta_a\beta_c/(\beta_a + \beta_c)\right]$ is a constant of proportionality.
2.4. Pit Depth
The pit depth due to corrosion is calculated by computing the pitting current density, ipit,

$$i_{pit}(t) = \frac{i_{corr} - i_{pv}}{N_{pit}}, \qquad (8)$$
where icorr = Icorr/Asen is the corrosion current density, ipv is the passive current density, Npit is the pit density for the alloy (derived empirically), and Asen is the effective surface area of the LPR sensor. One critical assumption is that the pH is in the range of 6-8. If this cannot be assumed, then a measurement of pH is required and ipv is needed over the range of pH values. Next, Faraday's law is used to relate the total pitting charge to the molar mass loss. Let the equivalent weight (EW) represent the weight of the metal that reacts with 1 C of charge, thus contributing to the corrosion and overall loss of material in the anodic (oxidation) reaction given in (2). The total pitting charge, Qpit, and molar mass loss, M, can be related by the following,
$$Q_{pit}(t) = zF \cdot M(t), \qquad (9)$$

where F = 9.650 × 10⁴ C/mol is Faraday's constant, and z is the number of electrons lost per atom in the metal in the reduction-oxidation reaction. The EW is calculated from the known Atomic Weight (AW) of the metal,

$$EW = \frac{AW}{z}. \qquad (10)$$
Next, the number of moles of the metal reacting can be converted to an equivalent mass loss, mloss,

$$m_{loss}(t) = M(t) \cdot AW. \qquad (11)$$

Combining (9) through (11), the mass loss mloss is related to Qpit by,

$$m_{loss}(t) = \frac{EW \cdot Q_{pit}(t)}{F}. \qquad (12)$$
With the mass loss calculated and knowing the density ρ, the pit-depth, modeled using a semi-spherical volume with a depth (or radius) d, is expressed as,

$$d(t) = \left(\frac{3\,m_{loss}(t)}{2\pi\rho}\right)^{1/3}. \qquad (13)$$
Now, note that Qpit can be found by integrating ipit over the total time,

$$Q_{pit}(t) = \int_0^t i_{pit}(\tau)\,d\tau. \qquad (14)$$
Substituting (12) and (14) into (13) gives,

$$d(t) = \sqrt[3]{\frac{3\,EW}{2\pi\rho F}\int_0^t i_{pit}(\tau)\,d\tau}. \qquad (15)$$
Next, by substituting (7) and (8) into (15), the expression for d can be rewritten as,

$$d(t) = \sqrt[3]{\frac{3\,EW}{2\pi\rho N_{pit} F}\int_0^t \left(\frac{B}{A_{sen} R_p(\tau)} - i_{pv}\right) d\tau}. \qquad (16)$$
In practice, Rp is not measured continuously; rather, periodic measurements are taken every Ts seconds. If it is assumed that over this interval the Rp value changes linearly, then the mean value theorem for integrals can be applied to arrive at an alternative expression for d,
$$d(t) = \sqrt[3]{\frac{3\,T_s\,EW}{2\pi\rho N_{pit} F}\sum_{k=0}^{N-1}\left(\frac{B}{A_{sen} R_p(kT_s)} - i_{pv}\right)}. \qquad (17)$$
2.5. Standard Measurements
Capacitive current can result in hysteresis in small amplitude cyclic voltammogram Ea vs. Ia plots. High capacitance, multiplied by a rapid voltage scan rate, causes a high capacitive current that results in hysteresis in cyclic Ea vs. Ia data (Scully, 2000). This effect can be reduced by making measurements at a slow scan rate. The maximum scan rate allowed to obtain accurate measurements has been addressed by Mansfeld and Kendig (Mansfeld & Kendig, 1981). The maximum applied frequency allowed to obtain the solution resistance, Rs, and the polarization resistance, Rp, from a Bode plot can be approximated by,
$$f_{max} < f_{bp} \approx \frac{1}{2\pi C\,(R_p + R_s)}, \qquad (18)$$

where fbp is an approximation of the lower break-point frequency, fmax is the maximum test frequency, and C is the capacitance that arises whenever an electrochemical interface exists between the electronic and ionic phases.
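For intuition, eq. (18) can be evaluated directly. The interface values below (C, Rp, Rs) are assumed round numbers chosen for illustration, not measurements from the paper:

```python
import math

C = 1e-6   # interfacial capacitance [F] (assumed)
Rp = 1e4   # polarization resistance [ohm] (assumed)
Rs = 1e2   # solution resistance [ohm] (assumed)

# Lower break-point frequency, eq. (18); any scan must stay well below this
f_bp = 1.0 / (2.0 * math.pi * C * (Rp + Rs))   # ~15.8 Hz for these values
```

Note that fbp falls as either the capacitance or the total resistance grows, which is why high-capacitance interfaces force slower scan rates.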
2.5.1. Linear Polarization Resistance (LPR)
ASTM standards D2776 and G59 describe standard procedures for conducting polarization resistance measurements. Potentiodynamic, potential step, and current-step methods can be used to compute Rp (D2776, 1994; G59, 1994). The potentiodynamic sweep method is the most common method for acquiring Rp. For conventional macro-LPR measurements, a potentiodynamic sweep is conducted by applying Ea between Ecorr ± 10 mV at a slow scan rate, typically 0.125 mV/s. A linear fit of the resulting Ea vs. Ia curve is used to compute Rp. Performing this operation takes 160 seconds to complete.
3. µLPR CORROSION SENSOR
In this section, a micro-LPR (µLPR) sensor is presented which uses the potential step-sweep method to compute polarization resistance. The µLPR works on the same principle as macro-sized LPR sensors and is designed to corrode at the same rate as the structure on which it is placed. Although LPR theory is well established and accepted as a viable corrosion monitoring technique, conventional macro-sized LPR sensor systems are expensive and highly intrusive. The µLPR is a micro-scaled LPR sensor inspired by the macro-sized version discussed in the previous section. Scaling the LPR sensor into a micro-sized package provides several advantages, which include:
• Miniature form factor
• Two-pair electrode configuration
• Faster LPR measurements
Figure 2. Thin film µLPR sensor (a) exposed and (b) quasi-exposed with the lower-half underneath a coating.
3.1. Form Factor
Expertise in semiconductor manufacturing is used to micro-machine the µLPR. Using photolithography it is possible to manufacture the µLPR sensor from a variety of standard engineering construction materials, varying from steels for buildings and bridges through to novel alloys for airframes. The micro sensor is made up of two micro-machined electrodes that are interdigitated at 150 µm spacing. The µLPR sensor is made from shim stock of the source/sample material that is pressure and thermally bonded to Kapton tape. The shim is prepared using photolithographic techniques and Electro Chemical Etching (ECM). It is further machined on the Kapton to produce a highly ductile and mechanically robust micro sensor that is very sensitive to corrosion. Images of the µLPR shown bare and as a fitted sensor underneath a coating are shown in Figure 2.
3.2. Electrode Configuration
The µLPR differs from conventional macro-sized LPR sensors in two major ways. First, the µLPR consists of only two electrodes. The need for the reference electrode is eliminated, as the separation distance between the working and counter electrodes, typically 150 µm, minimizes any voltage drop due to the solution resistance, Rs. Second, both electrodes are composed of the same working metal. This is uncommon in most electrochemical cells, where the counter electrode is made of an inert material. The benefit is that the electrodes provide a more direct measurement of corrosion than techniques which use electrodes made of different metals (e.g., gold). The sensor consists of multiple plates made from the material of interest which form the two electrodes. The electrodes are used in conjunction with a potentiostat for conducting LPR measurements. The use of a relatively large counter electrode minimizes polarization effects at the counter electrode to ensure that a stable reference potential is maintained throughout the experiments.
3.3. LPR Measurements
Potential step-sweeps are performed by applying a series of 30 steps over a range of ±10 mV spanning a period of 2.6 s. This allows eight µLPR sensors to be measured in less than 30 s. However, the effective scan-rate of 7.7 mV/s generates an additional current, Idl, due to rapid charging and discharging of the capacitance, referred to as the double-layer capacitance Cdl, at the electrode-electrolyte interface,

$$I_{dl} = C_{dl}\,\frac{dE_a}{dt}. \qquad (19)$$
Let the resulting polarization resistance that is computed when Idl is non-zero be represented by R̃p. It can be shown that R̃p is related to Rp by the following,

$$\tilde{R}_p^{-1} = R_p^{-1} + Y_{dl}, \qquad (20)$$
such that Ydl is defined by the admittance,

$$Y_{dl} = \left(\frac{C_{dl}}{20\,\mathrm{mV}}\right)\frac{dE_a}{dt}, \qquad (21)$$
where dEa/dt is the scan rate. An example of this relationship is provided in Figure 3. In this example, Cdl/20 mV and Rp⁻¹ correspond to the slope and y-intercept; these values were computed as 5.466 × 10⁻⁸ Ω⁻¹·s/mV and 3.624 × 10⁻⁶ Ω⁻¹, respectively. For a scan rate of dEa/dt = 7.7 mV/s, Ydl is computed as 4.209 × 10⁻⁷ Ω⁻¹. Finally, for a given solution, Rp can be compensated by,

$$R_p = \frac{\tilde{R}_p}{1 - Y_{dl}\tilde{R}_p} \quad \text{for } Y_{dl}\tilde{R}_p < 1. \qquad (22)$$
A plot of the actual LPR, Rp, vs. the measured LPR, R̃p, for a µLPR sensor made from AA 7075-T6 at a scan-rate of 7.7 mV/s is provided in Figure 4(a). Note, as R̃p decreases, the error between Rp and R̃p also decreases, as shown in Figure 4(b). This is significant for the following reasons:

• Better accuracy is necessary for smaller values of Rp, as the corrosion rate increases with Rp⁻¹.

• When Rp is large, the corrosion rate approaches zero. Therefore, even as the error in R̃p increases substantially, the error in the corrosion rate becomes negligible.

• The corrosion rate computed using R̃p will over-estimate the actual corrosion rate computed from Rp.

Due to these reasons, and the fact that Analatom's AN101 has an upper limit of 5 MΩ for measuring Rp, no compensation is performed when computing corrosion rates.
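The compensation of (20)-(22) can be checked numerically using the slope and scan rate quoted above; the measured resistance R_meas is an assumed example value, not a figure from the paper:

```python
slope = 5.466e-8   # C_dl / 20 mV, from the linear fit in Figure 3 [ohm^-1 * s/mV]
scan_rate = 7.7    # dE_a/dt [mV/s]

Y_dl = slope * scan_rate   # eq. (21): ~4.209e-7 ohm^-1, matching the text

R_meas = 1e5                           # example measured LPR [ohm] (assumed)
R_p = R_meas / (1.0 - Y_dl * R_meas)   # eq. (22): compensated LPR

# The compensated R_p exceeds R_meas, consistent with the note that
# uncompensated values over-estimate the corrosion rate.
```

For this example the correction is only about 4%, which helps explain why skipping compensation is acceptable in practice.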
Figure 3. Plot of inverse polarization resistance vs. scan-rate for a µLPR sensor made from AA 7075-T6 submersed in tap water.
Figure 4. Plot of the (a) actual LPR, Rp, vs. the measured LPR, R̃p, and (b) corresponding measurement error for a µLPR sensor made from AA 7075-T6 at a scan-rate of 7.7 mV/s.
3.4. Maximum Scan Rate
The maximum measurement speed for conventional macro-sized LPR systems is restricted by the combination of resistance (solution/polarization) and capacitance at the electrochemical interface. From (18), fmax can be determined graphically by estimating fbp from a Bode plot. A Bode plot of the magnitude and phase response of a µLPR sensor constructed from AA 7075-T6 submersed in distilled water is shown in Figure 5. The data was generated using a potentiostat over the frequency range 0.1 Hz − 1 MHz. The magnitude response can be used to measure fbp > 100 Hz. In practice, the µLPR sensor applies a scan rate of 7.7 mV/s with a step-size of 0.67 mV between samples. This is equivalent to a sampling rate of 11.5 Hz, which is a factor of ten less than fbp.

Figure 5. Bode plot showing the (a) magnitude and (b) phase for a µLPR sensor constructed from AA 7075-T6 with distilled water as the electrolyte.
4. EXPERIMENT
4.1. Setup
The experiment consisted of twenty-four (24) µLPR sensors and twelve (12) control coupons. The coupons and µLPR sensors were made from AA 7075-T6. Each coupon was placed next to a pair of µLPR sensors. Each sensor was held in place using a non-reactive polycarbonate clamp with a nylon fitting. All the sensors and coupons were mounted on an acrylic plexiglass base with the embedded hardware placed on the opposite side of the frame, shown in Figure 6. An electronic precision balance (Tree HRB-203) with a calibrated range of 0 − 200 g (±0.001 g) was used to weigh the coupons before and after the experiment. Finally, a weathering chamber (Q-Lab QUV/spray) promoted corrosion on the coupons and µLPR sensors by applying a controlled stream of tap water for 10 seconds every five minutes.
Figure 6. Experimental setup showing (a) all 24 µLPR sensors, 12 coupons and three AN101 instrumentation boards and (b) a close-up view of one of the panels used in the experiment. Note: only the first six coupons were used in the analysis performed in this paper.
4.2. Procedure
First, the surface of each coupon was cleaned using sand-blasting. Then, each coupon was weighed using the analytical balance. The entire panel of coupons and µLPR sensors was placed in the weathering chamber for accelerated testing. The experiment ran for approximately 35 days. During the experiment, a set of coupons was periodically removed from the weathering chamber. Throughout the experiment, Analatom's embedded hardware was logging Rp from each µLPR sensor. The sample rate was set at one sample per minute. Once accelerated testing was finished, the coupons were removed and the LPR data was downloaded and archived for analysis. The corrosion byproducts were removed from each coupon by applying micro-bead blasting to the coupon surface. Finally, the cleaned coupons were weighed using the analytical scale to compute the relative corrosion depth during the experiment.
5. RESULTS
5.1. Coupon Corrosion
Figure 7. Image of the three AA 7075-T6 coupons (ID 2.01, 2.03 and 2.04) after approximately 17 days of corrosion testing, showing (a) the condition of the coupons before cleaning and (b) after cleaning using micro-bead blasting.

The corrosion byproducts were carefully removed using micro-bead blasting. The pitting depth, d, of each coupon was calculated using the formula,

$$d = \sqrt[3]{\frac{3\,m_{loss}}{2\pi\rho N_{pit} A_{exp}}}, \qquad (23)$$
where values for the mass loss mloss, exposed surface area Aexp, resulting pit depth d, and total time of exposure of each coupon are provided in Table 1. Values for the pitting density and ρ were set at Npit = 10 cm⁻² and ρ = 2.810 g/cm³, respectively. The pitting density was computed by counting the average number of pits over the surface for coupons 2.06 and 2.08. The measurement uncertainty in the pit-depth due to uncertainty in the mass loss, ∆mloss, and pit density, ∆Npit, is approximately,
$$\Delta d \approx \frac{d}{3}\left(\frac{\Delta m_{loss}}{m_{loss}} + \frac{\Delta N_{pit}}{N_{pit}}\right), \qquad (24)$$
where ∆mloss = ±0.001 g is the minimum resolution of the scale and ∆Npit = ±3 cm⁻² was the standard deviation of the measured pit density over 1 cm² sample areas for coupons 2.06 and 2.08.
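Equations (23) and (24) can be reproduced with the values reported for coupon 2.01 in Table 1 (mloss = 0.038 g, Aexp = 58.05 cm²). The result, roughly 0.22 mm, is close to the 0.216 mm listed in the table; the small difference presumably reflects rounding in the reported values:

```python
import math

m_loss = 0.038   # mass loss [g], coupon 2.01 (Table 1)
A_exp = 58.05    # exposed area [cm^2]
rho = 2.810      # density [g/cm^3]
N_pit = 10.0     # pit density [1/cm^2]

# Pit depth, eq. (23), converted from cm to mm
d_mm = 10.0 * (3.0 * m_loss / (2.0 * math.pi * rho * N_pit * A_exp)) ** (1.0 / 3.0)

# Uncertainty, eq. (24), from scale resolution and pit-density spread
dm, dN = 0.001, 3.0
delta_d_mm = (d_mm / 3.0) * (dm / m_loss + dN / N_pit)
```

The pit-density term dominates the uncertainty here, since ∆Npit/Npit = 0.3 dwarfs the scale's relative resolution of about 0.026.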
5.2. µLPR Corrosion
The linear polarization resistance measurements were used to compute the corrosion pit depth for each µLPR sensor. The computed pit-depth for each of the 24 µLPR sensors is provided in Figure 8.

Figure 8. Comparison of the measured and computed pit depth over a period of approximately 35 days for (a) each µLPR sensor and (b) the average of all µLPR sensors.

6. SUMMARY

A micro-sized LPR (µLPR) sensor was presented for corrosion monitoring in Structural Health Management (SHM) applications. An experimental test was performed to compare corrosion measurements from twenty-four µLPR sensors with twelve coupons. Both the coupons and sensors were constructed from the same material, AA 7075-T6. According to the results, the pit-depth measured on the coupons fell within the 95% confidence interval computed from the pit-depth measured on the µLPR sensors. The results indicate multiple µLPR sensors can be used to provide an accurate measurement of corrosion. Future work includes testing additional alloys and performing in-flight testing on a C-130 legacy aircraft.
ACKNOWLEDGMENT
All funding and development of the sensors and systems in the project has been part of the US government's SBIR programs. In particular: 1) in preparing the initial system design and development, funding was provided by the US Air Force under SBIR Phase II contract # F33615-01-C-5612, monitored by Dr. James Mazza; 2) funding for the development and experimental set-up was provided by the US Navy under SBIR Phase II contract # N68335-06-C-0317, monitored by Dr. Paul Kulowitch; and 3) funding for further improvements and scheduled field installations was provided by the US Air Force under SBIR Phase II contract # FA8501-11-C-0012, monitored by Mr. Feraidoon Zahiri.
Table 1. Experimental measurements of coupon corrosion.
Coupon ID | Time Exposed [min] | Area [cm²] | Initial Mass [g] | Final Mass [g] | Mass Loss [g] | Pit Depth [mm]
Control   |      0 | 58.01 | 76.870 | 76.869 | 0.001 | N/A
2.01      | 21,198 | 58.05 | 77.253 | 77.215 | 0.038 | 0.2162
2.02      | 11,160 | 57.98 | 76.842 | 76.818 | 0.024 | 0.1855
2.03      | 21,198 | 57.99 | 76.927 | 76.896 | 0.031 | 0.2020
2.04      | 21,198 | 58.05 | 76.897 | 76.869 | 0.028 | 0.1953
2.06      | 38,510 | 57.98 | 76.884 | 76.828 | 0.056 | 0.2461
2.08      | 38,510 | 58.03 | 76.921 | 76.810 | 0.054 | 0.2431
NOMENCLATURE
βa      Anodic Tafel slope                 V/dec
βc      Cathodic Tafel slope               V/dec
ρ       Density                            g/mm³
d       Corrosion depth                    cm
k       LPR sample index                   –
fbp     Break-point frequency              Hz
fmax    Maximum test frequency             Hz
icorr   Corrosion current density          A/cm²
ipit    Pitting current density            A/cm²
ipv     Passive current density            A/cm²
mloss   Mass loss due to corrosion         g
z       Number of electrons lost per atom  –
∆d      Corrosion depth uncertainty        cm
∆mloss  Mass loss uncertainty              g
∆Npit   Pit density uncertainty            cm⁻²
Asen    Effective sensor area              cm²
Aexp    Exposed coupon area                cm²
AW      Atomic weight                      g/mol
B       Proportionality constant           V/dec
Cdl     Double-layer capacitance           F
Ea      Applied potential                  V
Ecorr   Corrosion potential                V
EW      Equivalent weight                  g/mol
F       Faraday's constant                 C/mol
Ia      Applied current                    A
Idl     Scanning current from Cdl          A
Icorr   Corrosion current                  A
M       Number of moles reacting           mol
N       Total number of µLPR samples       –
Npit    Pit density                        cm⁻²
Qpit    Charge from oxidation reaction     C
Rp      Polarization resistance            Ω
R̃p      Measured polarization resistance   Ω
Rs      Solution resistance                Ω
Ts      Sampling period                    s
Ydl     Scanning admittance from Cdl       S
REFERENCES
Bockris, J. O., Reddy, A. K. N., & Gambola-Aldeco, M. (2000). Modern Electrochemistry 2A: Fundamentals of Electrodics (2nd ed.). New York: Kluwer Academic/Plenum Publishers.

Burstein, G. T. (2005, December). A Century of Tafel's Equation: 1905-2005. Corrosion Science, 47(12), 2858-2870.

D2776, A. S. (1994). Test Methods for Corrosivity of Water in the Absence of Heat Transfer. Annual Book of ASTM Standards, 03.02.

G102, A. S. (1994). Standard Practice for Calculation of Corrosion Rates and Related Information from Electrochemical Measurements. Annual Book of ASTM Standards, 03.02.

G59, A. S. (1994). Standard Practice for Conducting Potentiodynamic Polarization Resistance Measurements. Annual Book of ASTM Standards, 03.02.

Huston, D. (2010). Structural Sensing, Health Monitoring, and Performance Evaluation (B. Jones & W. B. S. J. Jnr., Eds.). Taylor and Francis.

Mansfeld, F., & Kendig, M. (1981). Corrosion, 37, 545.

Scully, J. R. (2000, February). Polarization Resistance Method for Determination of Instantaneous Corrosion Rates. Corrosion, 56(2), 199-218.

Wagner, C., & Traud, W. (1938). Elektrochem, 44, 391.
Douglas W. Brown is the senior systems engineer at Analatom, with eight years of experience developing and maturing PHM and fault-tolerant control systems in avionics applications. He received the B.S. degree in electrical engineering from the Rochester Institute of Technology in 2006 and the M.S./Ph.D. degrees in electrical engineering from the Georgia Institute of Technology in 2008 and 2011, respectively. Dr. Brown is a recipient of the National Defense Science and Engineering Graduate Fellowship and has received several best-paper awards for his work in prognostics and fault-tolerant control.
Duane Darr is the senior embedded hardware engineer at Analatom, with over 30 years of experience in the software and firmware engineering fields. He completed his undergraduate work in physics, and graduate work in electrical engineering and computer science, at Santa Clara University. Mr. Darr's previous work at Epson Imaging Technology Center, San Jose, California, as Senior Software Engineer; Data Technology Corporation, San Jose, California, as Senior Firmware Engineer; and Qume Inc., San Jose, California, as Member of Engineering Staff/Senior Firmware Engineer, focused on generation and refinement of software and firmware solutions for imaging core technologies, as well as digital servo controller research, development, and commercialization.
Jefferey Morse has been the director of advanced technology at Analatom since 2007. Prior to this, he was a senior scientist in the Center for Micro and Nano Technology at Lawrence Livermore National Laboratory. He received the B.S. and M.S. degrees in electrical engineering from the University of Massachusetts Amherst in 1983 and 1985, respectively, and a Ph.D. in electrical engineering from Stanford University in 1992. Dr. Morse has over 40 publications, including 12 journal papers, and 15 patents in the areas of advanced materials, nanofabrication, sensors and energy conversion technologies. He has managed numerous projects in various multidisciplinary technical areas, including electrochemical sensors and power sources, vacuum devices, and microfluidic systems.
Bernard Laskowski has been the president and senior research scientist at Analatom since 1981. He received the Licentiaat and Ph.D. degrees in Physics from the University of Brussels in 1969 and 1974, respectively. Dr. Laskowski has published over 30 papers in international refereed journals in the fields of micro physics and micro chemistry. As president of Analatom, Dr. Laskowski has managed 93 university, government, and private industry contracts, receiving a U.S. Small Business Administration Administrator's Award for Excellence.
First European Conference of the Prognostics and Health Management Society, 2012
Uncertainty of performance requirements for IVHM tools according to business targets
Manuel Esperon-Miguez1, Philip John2, and Ian K. Jennions3
1,3 IVHM Centre, Cranfield, Bedfordshire, MK43 0FQ, United Kingdom
[email protected], [email protected]
2 Cranfield University, Cranfield, Bedfordshire, MK43 0AL, United Kingdom
ABSTRACT
Operators and maintainers are faced with the task of selecting which health monitoring tools are to be acquired or developed in order to increase the availability and reduce the operational costs of a vehicle. Since these decisions will affect the strength of the business case, choices must be based on a cost-benefit analysis. The methodology presented here takes advantage of the historical maintenance data available for legacy platforms to determine the performance requirements that diagnostic and prognostic tools must meet to achieve a certain reduction in maintenance costs and time. The effect of these tools on the maintenance process is studied using Event Tree Analysis, from which the equations are derived. However, many of the parameters included in the formulas are not constant and tend to vary randomly around a mean value (e.g. shipping costs of parts, repair times), introducing uncertainties in the results. As a consequence, the equations are modified to take into account the variance of all variables. Additionally, the reliability of the information generated using diagnostic and prognostic tools can be affected by multiple characteristics of the fault, which are never exactly the same, meaning the performance of these tools might not be constant either. To tackle this issue, formulas to determine the acceptable variance in the performance of a health monitoring tool are derived under the assumption that the variables considered follow Gaussian distributions. An example of the application of this methodology using synthetic data is included.
1. INTRODUCTION
The objective of Integrated Vehicle Health Management (IVHM) is to increase platform availability and reduce maintenance costs through the use of health monitoring on key systems. The information generated using condition monitoring algorithms can be used to reduce maintenance times, improve the management of the support process and operate the fleet more efficiently. Although IVHM can include the use of tools to improve the management of logistics, maintenance and operations (Khalak & Tierno, 2006), this methodology focuses on diagnostic and prognostic tools.
In order to run the algorithms it is necessary to read a set of parameters with a given accuracy and enough resolution to generate trustworthy information for the maintainer. Additionally, the data generated by sensors have to be transmitted, post-processed, stored and analyzed. Although it is possible to carry out part of this process off-board, legacy vehicles rarely have the sensors, data buses, memory or computing power required on-board. Moreover, legacy platforms are expensive to modify to accommodate new hardware, especially if the modifications have to be certified. Therefore, it is not always possible to use the best hardware available for every tool, and its performance will not reach its full potential. Furthermore, the implementation of the new health monitoring tools must have the lowest possible impact on the normal operation of the fleet, a problem not found in vehicles which are still being designed or manufactured. Thus, health monitoring tools for legacy platforms have a lower performance, a higher cost and a shorter payback period than if they were used on new vehicles.
On the other hand, the historical maintenance data generated by fleets provide information that can be used to select the components on which to retrofit health monitoring tools, validate diagnostic and prognostic algorithms, and carry out Cost-Benefit Analyses (CBA). This is an important advantage, since the expectations regarding the performance of the tools and their impact on operational costs and availability are much more accurate for legacy platforms. Additionally, FMECAs, which are widely used for the design of health monitoring tools and to perform CBAs (Banks, Reichard, Crow and Nickell, 2009; Kacprzynski, Roemer and Hess, 2002; Ashby & Byer, 2002), become easier to populate and more precise. Even the experience of maintenance personnel and operators on qualitative aspects has huge value for the development of IVHM tools.
_____________________
Esperon-Miguez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This information can be used to define the performance requirements of any diagnostic or prognostic tool. Since the main objective of retrofitting IVHM is the reduction of maintenance cost and time, these are the constraints used in the methodology presented here. Teams in charge of developing health monitoring algorithms need to know not only the performance expected from their tools, but also the budget constraints that make them profitable. These data can be used to calculate the performance expected from a diagnostic or prognostic tool if it is to achieve a certain reduction of the cost and downtime associated with the maintenance of the component it monitors. It is important to note that the criticalities of different costs and maintenance operations vary for each stakeholder (Wheeler, Kurtoglu and Poll, 2009) and depend on whether the vehicle is operated in a civilian or a military environment (Williams, 2006).
In some cases it is possible to generate mathematical expressions to relate the return on investment to certain design parameters (Kacprzynski et al., 2002; Hoyle, Mehr, Turner and Chen, 2007; Banks & Merenich, 2007), but this approach restricts major changes in the design and the equations are not applicable to other monitoring systems.
Working with historical maintenance data involves using average values of many recorded parameters which are really random variables. Therefore, there is a certain degree of uncertainty in any calculation of the performance requirements which must be taken into account to avoid arriving at overconfident results. Furthermore, the reliability of an IVHM tool varies depending on the characteristics of the fault, which are different on every occasion, and this translates into uncertainty about its performance (Lopez & Sarigul-Klijn, 2010). As a result, the acceptable standard deviations of the performance parameters of each tool have to be calculated to ensure the targets are met.
2. PERFORMANCE OF IVHM TOOLS
IVHM is enabled by the use of sensors to gather data on a component and on those systems that interact with it, in order to detect malfunctions (diagnostic tools) or to predict the failure of the part (prognostic tools). Diagnostic tools help to identify the component responsible for the malfunction of a system, reducing the diagnosis and localization times. Additionally, they can prevent the vehicle from continuing to run with an unnoticed fault.
If a diagnostic tool is too sensitive it can trigger false alarms which could result in unnecessary checks, waste of resources and, in some cases, aborting the mission. On the other hand, if the sensitivity is too low and faults are not detected, the investment in the tool will not produce any benefits. Therefore, the main performance parameters of a diagnostic tool in an analysis of its effect on maintenance cost and time are the probability of triggering a false alarm, PFA, and the probability of producing a false negative, PFN.
Prognostic tools calculate the RUL of a component at a given moment, providing maintainers with a lead time to accommodate the replacement or repair of that part in the future. If the lead time is long and accurate enough, the maintenance of the component can be carried out along with other scheduled tasks (long-term prognosis). Otherwise, the part will have to be replaced between missions (short-term prognosis), but this approach is still safer, cheaper and less time-consuming than running the component until failure. While long-term prognostic tools enable the deferral of the maintenance action until the next scheduled service, short-term prognostic tools can affect the availability of the vehicle if the time available for maintenance between missions is shorter than the time necessary to repair the fault.
The performance of a prognostic tool is determined by the reliability of the information it provides and how it is used, in other words, by the probability of the component failing before it was planned to be replaced (PLP for long-term tools and PSP for short-term tools). As shown in Figure 1, it is necessary to define a maximum admissible probability of failure, Pmax, to determine how long the component can remain in service, tmax. This requires choosing a degradation curve from those generated by the prognostic tool, from which tmax is estimated. The probability of the component failing is a function of the average life of the components removed, tm, which depends on the period between scheduled services (long-term tools) or the mean time between missions (short-term tools).
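The construction of tmax described above can be sketched numerically: given a failure-probability curve P(t) produced by a prognostic tool, find the longest time the component may remain in service before P(t) reaches Pmax. The exponential curve and all numerical values below are illustrative assumptions, not the paper's model.

```python
import math

def t_max(failure_prob, p_max, horizon, step=1e-3):
    """Largest t in [0, horizon] such that failure_prob(t) <= p_max,
    assuming failure_prob is non-decreasing (a degradation curve)."""
    t = 0.0
    while t + step <= horizon and failure_prob(t + step) <= p_max:
        t += step
    return t

# Illustrative degradation curve for a component with a 250 h characteristic life.
curve = lambda t: 1.0 - math.exp(-t / 250.0)
remaining_service_time = t_max(curve, p_max=0.05, horizon=1000.0)
```

With these assumptions the component would be withdrawn well before its characteristic life, reflecting the low admissible probability of failure Pmax.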
Figure 1. Degradation curves generated by a prognostic tool used to estimate the probability of failure of a component before it has been replaced.
3. EVENT TREE ANALYSIS
The failure of a component has a different cost and repair time depending on whether an IVHM tool has performed its function correctly or not. This can be studied using Event Tree Analysis (ETA), where the probability of the failure of the component, PF, is the triggering event and each tool introduces a fork in the diagram, as shown in Figure 2. A correct prognosis prevents the need for a diagnosis and, if it is incorrect, a diagnostic tool can still be used. For the same reason, long-term prognostic tools are further to the left on the diagram than short-term tools. It is important to remark that this is not a representation of the way the algorithms work, but of how the performance of each tool leads to different outcomes.
In case a component presents different failure modes that need to be monitored by different tools, costs and downtimes need to be estimated independently for each mode. This is not a problem since most algorithms for diagnostic and prognostic tools track specific failure modes.
The tree shows six possible outcomes or maintenance scenarios, including the lack of need to replace a healthy component. Maintenance cost and time are calculated for each scenario according to how the use (or malfunction) of a health monitoring tool affects the maintenance process. In case a prognostic tool is used, it is necessary to take into account factors such as the reduction of delays, the value of the RUL of the component, the lower operational costs of scheduled operations, and the avoidance of secondary failures. The use of diagnostic tools can help to reduce the maintenance time as well as the use of resources and personnel, since searching for the cause of the malfunction is no longer necessary. However, false alarms, or false positives, can lead to unnecessary checks or even the removal of healthy components which could be disposed of (Trichy, Sandborn, Raghavan and Sahasrabudhe, 2001). Techniques necessary to calculate some of these parameters were described by Leao, Fitzgibbon, Puttini and de Melo (2008) as well as Prabhakar and Sandborn (2010).
Since the event tree can be used to calculate the probability of each outcome, the resulting total maintenance cost, C, and time, T, can be calculated using the following expressions:
C = PF [(1 - PLP) CLP + PLP (1 - PSP) CSP + PLP PSP (1 - PFN) CD + PLP PSP PFN CFN] + (1 - PF) PFA CFA (1)

T = PF [(1 - PLP) tLP + PLP (1 - PSP) tSP + PLP PSP (1 - PFN) tD + PLP PSP PFN tFN] + (1 - PF) PFA tFA (2)
These polynomial functions can be used to calculate the sensitivities of the maintenance cost and time to the performance of health monitoring tools. Additionally, it must be noted that the data used to calculate the cost and downtime of each scenario are not constant and vary around average values (e.g. time to repair or shipping costs), and these equations can be used as the basis to calculate the standard deviation of the resulting maintenance costs and times.
(Figure 2 shows the event tree: the triggering event is the probability of failure PF; its branches pass through long-term prognosis (success: CLP, tLP), short-term prognosis (success: CSP, tSP) and diagnosis (success: CD, tD; false negative with probability PFN: CFN, tFN); for a healthy component (1 - PF), no alarm has zero cost and time, while a false alarm with probability PFA incurs CFA, tFA.)

Figure 2. ETA for the use of health monitoring tools on a single component.
4. PERFORMANCE REQUIREMENTS WITH EXACT DATA
The performance of an IVHM tool must guarantee that the maintenance cost and time associated with the component it monitors are below C* and T* respectively.
Prognostic tools can be used to monitor a system which already has some diagnostic capability, in order to combine the benefits of estimating its RUL and of being able to identify the source of a malfunction if the component fails before it was expected. However, it is difficult to imagine developing a diagnostic algorithm for a part which is no longer run until failure thanks to the use of prognostics. Therefore, the equations for the probability of false negative and false alarm only take into consideration the parameters of scenarios in which diagnostic tools are used.
PF [(1 - PFN) CD + PFN CFN] + (1 - PF) PFA CFA <= C* (3)

PF [(1 - PFN) tD + PFN tFN] + (1 - PF) PFA tFA <= T* (4)

0 <= PFA <= 1; 0 <= PFN <= 1 (5; 6)

PFA <= [C* - PF ((1 - PFN) CD + PFN CFN)] / [(1 - PF) CFA] (7)

PFA <= [T* - PF ((1 - PFN) tD + PFN tFN)] / [(1 - PF) tFA] (8)
First European Conference of the Prognostics and Health Management Society, 2012
194
European Conference of Prognostics and Health Management Society 2012
4
Equations (5-8) define a space which encloses all the possible solutions that comply with the requirements. This space can be represented as shown in Figure 3.
The following expressions can be used to determine the acceptable probability of failure for a long-term prognostic tool given the time and cost constraints. The equations for short-term tools are obtained in the same way.
PF [(1 - PLP) CLP + PLP ((1 - PSP) CSP + PSP (1 - PFN) CD + PSP PFN CFN)] + (1 - PF) PFA CFA <= C* (9)

PF [(1 - PLP) tLP + PLP ((1 - PSP) tSP + PSP (1 - PFN) tD + PSP PFN tFN)] + (1 - PF) PFA tFA <= T* (10)

0 <= PLP <= 1 (11)

PLP <= [C* - PF CLP - (1 - PF) PFA CFA] / {PF [(1 - PSP) CSP + PSP (1 - PFN) CD + PSP PFN CFN - CLP]} (12)

PLP <= [T* - PF tLP - (1 - PF) PFA tFA] / {PF [(1 - PSP) tSP + PSP (1 - PFN) tD + PSP PFN tFN - tLP]} (13)
Since the system is overdetermined, the most stringent solution must be selected.
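Selecting the most stringent solution can be sketched as below for the probability of false alarm: evaluate the cost bound (7) and the time bound (8), keep the smaller, and clip to the feasible interval of (5)-(6). The function and every numerical input are illustrative assumptions, not the paper's data.

```python
def p_fa_limit(C_star, T_star, P_F, P_FN,
               C_D, C_FN, C_FA, t_D, t_FN, t_FA):
    """Most stringent admissible probability of false alarm, eqs. (7)-(8),
    clipped to the feasible interval [0, 1] of eqs. (5)-(6)."""
    cost_bound = (C_star - P_F * ((1 - P_FN) * C_D + P_FN * C_FN)) / ((1 - P_F) * C_FA)
    time_bound = (T_star - P_F * ((1 - P_FN) * t_D + P_FN * t_FN)) / ((1 - P_F) * t_FA)
    return max(0.0, min(cost_bound, time_bound, 1.0))

# Illustrative targets and scenario data (not the paper's case study).
limit = p_fa_limit(C_star=10.0, T_star=1.0, P_F=0.1, P_FN=0.0,
                   C_D=20.0, C_FN=50.0, C_FA=5.0, t_D=2.0, t_FN=5.0, t_FA=1.0)
```

In this example the time target is the binding constraint, so it sets the admissible PFA.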
5. UNCERTAINTY
Most parameters used to perform a CBA are not constant, since the conditions under which each job is carried out are different. Costs of personnel and parts can change depending on the location or the shift. Active maintenance times, delays and the time dedicated to the diagnosis and localization of a fault are never exactly the same. Consequently, the variables used to define a maintenance activity are approximated to average values. This also affects the frequency of failure of the component, which is approximated to the Mean Time Between Failures (MTBF) for most quantitative analyses, despite being extremely variable for those components that can benefit the most from IVHM. Additionally, the performance of health monitoring tools over a fixed period can also vary, increasing the uncertainty of the cost and downtime calculated in the previous sections.
Although the total maintenance time dedicated to a single component can be broken down into several steps, including delays, repair time and checkout time (British Standard, 1991), these tend to be poorly recorded. Since the whole process involves different teams, it is difficult to keep track of the exact amount of time dedicated to each component (especially for delays and diagnosis). In addition, technicians tend to focus on the task at hand and register approximate values once the job is finished.
Therefore, there are uncertainties associated with the results of a CBA, and this affects the definition of the performance requirements for IVHM tools. To avoid overstating the benefits of using diagnostic and prognostic tools it is necessary to include the standard deviation of every parameter that does not remain constant. It is also necessary to determine the acceptable standard deviation of the performance of the algorithms to ensure the maintenance costs and times will remain below acceptable levels.
Taking into account the effects of uncertainties means that for every performance parameter mentioned above an additional variable has to be calculated. At the same time, it is necessary to define the probability of the maintenance cost and downtime being below the limits imposed; in other words, how confident we are that the costs and times will remain below the limits. As a consequence, two additional constraints are introduced: the confidence to comply with cost requirements, RC, and the confidence to comply with time requirements, RT.
The maintenance costs and times of different scenarios can be considered independent, since numerous factors included in their calculation are random and uncorrelated. These assumptions allow analytical expressions to be formulated using the standard deviation of such random factors. In order to simplify the mathematical operations, variance is used instead of standard deviation. Therefore, the following properties apply:
Var(X + Y) = Var(X) + Var(Y) (14)

Var(aX) = a^2 Var(X) (15)
Since the variations in costs and maintenance times are due to numerous random factors, it has been assumed that both the total maintenance time and the total maintenance cost per component follow Gaussian distributions.
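The two variance properties can be checked numerically on independent Gaussian samples; the sketch below verifies equations (14) and (15) and is purely illustrative (the distributions and sample size are assumptions).

```python
import random

random.seed(0)
n = 100_000
x = [random.gauss(3.0, 2.0) for _ in range(n)]   # Var(X) = 4
y = [random.gauss(-1.0, 0.5) for _ in range(n)]  # Var(Y) = 0.25

def var(samples):
    """Population variance of a list of samples."""
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

# Eq. (14): for independent variables, Var(X + Y) = Var(X) + Var(Y).
sum_gap = abs(var([a + b for a, b in zip(x, y)]) - (var(x) + var(y)))

# Eq. (15): Var(aX) = a^2 Var(X).
a = 2.5
scale_gap = abs(var([a * s for s in x]) - a ** 2 * var(x))
```

The residual `sum_gap` is the sampling covariance between x and y, which vanishes as n grows.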
Figure 3. Region of acceptable performance of a diagnostic tool.
(The figure plots PFA against PFN; the cost and time constraints bound the region of possible solutions.)
First European Conference of the Prognostics and Health Management Society, 2012
195
European Conference of Prognostics and Health Management Society 2012
5
Diagnostic tools are now defined by four parameters: the probability of false alarm, PFA; the probability of false negative, PFN; and their variances, Var(PFA) and Var(PFN) respectively. The limits of these variables are defined by the following functions:
P(C <= C*) >= RC (16)

P(T <= T*) >= RT (17)

0 <= PFA <= 1; 0 <= PFN <= 1 (18)

where

E[C] = PF [(1 - PFN) CD + PFN CFN] + (1 - PF) PFA CFA (19)

E[T] = PF [(1 - PFN) tD + PFN tFN] + (1 - PF) PFA tFA (20)

Var(C) = PF^2 (CFN - CD)^2 Var(PFN) + (1 - PF)^2 CFA^2 Var(PFA) + PF^2 [(1 - PFN)^2 Var(CD) + PFN^2 Var(CFN)] + (1 - PF)^2 PFA^2 Var(CFA) (21)

Var(T) = PF^2 (tFN - tD)^2 Var(PFN) + (1 - PF)^2 tFA^2 Var(PFA) + PF^2 [(1 - PFN)^2 Var(tD) + PFN^2 Var(tFN)] + (1 - PF)^2 PFA^2 Var(tFA) (22)
From equation (16), and given that C is Gaussian,

Var(C) <= (C* - E[C])^2 / [Φ^-1(RC)]^2 (23)

where Φ^-1 is the inverse of the standard normal cumulative distribution function.
Additionally,

Var(C) = k1 Var(PFN) + k2 Var(PFA) + k3 (24)

where

k1 = PF^2 (CFN - CD)^2 (25)

k2 = (1 - PF)^2 CFA^2 (26)

k3 = PF^2 [(1 - PFN)^2 Var(CD) + PFN^2 Var(CFN)] + (1 - PF)^2 PFA^2 Var(CFA) (27)
As a result,

k1 Var(PFN) + k2 Var(PFA) <= {C* - PF [(1 - PFN) CD + PFN CFN] - (1 - PF) PFA CFA}^2 / [Φ^-1(RC)]^2 - k3 (28)
Following the same steps for the maintenance time requirements from equation (17), the second condition is

k4 Var(PFN) + k5 Var(PFA) <= {T* - PF [(1 - PFN) tD + PFN tFN] - (1 - PF) PFA tFA}^2 / [Φ^-1(RT)]^2 - k6 (29)

where

k4 = PF^2 (tFN - tD)^2 (30)

k5 = (1 - PF)^2 tFA^2 (31)

k6 = PF^2 [(1 - PFN)^2 Var(tD) + PFN^2 Var(tFN)] + (1 - PF)^2 PFA^2 Var(tFA) (32)
Therefore, any diagnostic tool that satisfies the requirements and can generate the projected savings with the expected accuracy must comply with equations (18), (28), and (29).
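As an illustration only, a compliance check against the cost condition can be sketched as below, following the forms of equations (19) and (24)-(28) as reconstructed in this text; the function name and all inputs are assumptions, and Φ^-1 is taken from the Python standard library.

```python
from statistics import NormalDist

def meets_cost_target(C_star, R_C, P_F, P_FN, P_FA, var_P_FN, var_P_FA,
                      C_D, C_FN, C_FA, var_C_D, var_C_FN, var_C_FA):
    """True if a diagnostic tool keeps the cost below C* with confidence R_C."""
    # Eq. (19): expected maintenance cost.
    mean_C = P_F * ((1 - P_FN) * C_D + P_FN * C_FN) + (1 - P_F) * P_FA * C_FA
    # Eqs. (25)-(27): coefficients of the cost variance.
    k1 = P_F ** 2 * (C_FN - C_D) ** 2
    k2 = (1 - P_F) ** 2 * C_FA ** 2
    k3 = (P_F ** 2 * ((1 - P_FN) ** 2 * var_C_D + P_FN ** 2 * var_C_FN)
          + (1 - P_F) ** 2 * P_FA ** 2 * var_C_FA)
    # Eq. (24): total variance of the maintenance cost.
    var_C = k1 * var_P_FN + k2 * var_P_FA + k3
    # Eq. (28) rearranged: C* must exceed the mean by Φ^-1(R_C) standard deviations.
    return mean_C + NormalDist().inv_cdf(R_C) * var_C ** 0.5 <= C_star
```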
Prognostic tools are now defined by the probability of the component failing before it is replaced and its variance. The following formulas define the constraints for a prognostic tool to comply with the cost and support requirements. To keep the equations manageable, the parameters of diagnostic tools are not included. In case they were necessary, the full equations can be obtained in a similar manner. As for diagnostic tools:
P(C <= C*) >= RC (33)

P(T <= T*) >= RT (34)

The difference being that only the prognostic branches of the event tree remain:

0 <= PLP <= 1 (35)

E[C] = PF [(1 - PLP) CLP + PLP CFN] (36)

E[T] = PF [(1 - PLP) tLP + PLP tFN] (37)

Var(C) = PF^2 (CFN - CLP)^2 Var(PLP) + PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)] (38)

Var(T) = PF^2 (tFN - tLP)^2 Var(PLP) + PF^2 [(1 - PLP)^2 Var(tLP) + PLP^2 Var(tFN)] (39)
From equation (33),

Var(C) <= (C* - E[C])^2 / [Φ^-1(RC)]^2 (40)
Combining equations (36), (38) and (40),

PF^2 (CFN - CLP)^2 Var(PLP) <= {C* - PF [(1 - PLP) CLP + PLP CFN]}^2 / [Φ^-1(RC)]^2 - PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)] (41)
Using the properties described in equations (14) and (15), and following the same steps with the equations for the maintenance time constraints, the results are:
Var(PLP) <= ({C* - PF [(1 - PLP) CLP + PLP CFN]}^2 / [Φ^-1(RC)]^2 - PF^2 [(1 - PLP)^2 Var(CLP) + PLP^2 Var(CFN)]) / [PF^2 (CFN - CLP)^2] (42)

Var(PLP) <= ({T* - PF [(1 - PLP) tLP + PLP tFN]}^2 / [Φ^-1(RT)]^2 - PF^2 [(1 - PLP)^2 Var(tLP) + PLP^2 Var(tFN)]) / [PF^2 (tFN - tLP)^2] (43)
These parabolas define the limits of the performance requirements of any prognostic tool, as shown in Figure 4. These expressions are for long-term prognostic tools. To obtain the formulas for short-term tools, replace CLP and tLP by CSP and tSP respectively.
These formulas can be applied to any component of a vehicle to quantify the performance requirements for continuous monitoring tools. These requirements will then be communicated to the internal teams in charge of developing IVHM tools, the supplier of the component, or independent developers of health monitoring technology, or can even be used to call an open tender. Since the performance parameters are determined based on economic objectives, it is possible to calculate the maximum acceptable cost for each tool based on the remaining useful life of the fleet.
Additionally, this set of equations provides a framework to include risk analysis in a CBA and strengthen the business case for installing IVHM on the aircraft.
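The cost-side limit of equation (42), as reconstructed above, can be sketched as a function that returns the largest admissible Var(PLP) for a given mean PLP; sweeping PLP traces the cost branch of the parabola in Figure 4. Every numerical input is an illustrative assumption.

```python
from statistics import NormalDist

def var_plp_cost_limit(C_star, R_C, P_F, P_LP, C_LP, C_FN, var_C_LP, var_C_FN):
    """Upper bound on Var(P_LP) from the cost target, eq. (42); 0 if infeasible."""
    mean_C = P_F * ((1 - P_LP) * C_LP + P_LP * C_FN)
    if mean_C >= C_star:
        return 0.0  # even a perfectly repeatable tool misses the target
    z = NormalDist().inv_cdf(R_C)
    residual = ((C_star - mean_C) ** 2 / z ** 2
                - P_F ** 2 * ((1 - P_LP) ** 2 * var_C_LP + P_LP ** 2 * var_C_FN))
    return max(0.0, residual / (P_F ** 2 * (C_FN - C_LP) ** 2))

# Sweep P_LP from 0 to 1 with illustrative data to trace the constraint curve.
limits = [var_plp_cost_limit(5.0, 0.95, 0.1, p / 10.0, 10.0, 40.0, 0.0, 0.0)
          for p in range(11)]
```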
6. CASE STUDY
The following example is based on synthetic data for a generic component that fails every 250 flying hours. Although the values chosen for the parameters used in this case do not belong to a specific real component, they are representative of the costs and maintenance times of many parts currently run until failure. All the factors taken into account to calculate the maintenance cost and time of each scenario, as well as their values, are listed in Table 1. Standard deviations were chosen to ensure the uncertainties would vary between ±5% and ±20% (assuming all parameters follow Gaussian distributions, so 99.7% of the outcomes are within ±3σ of the mean). The results for each scenario are shown in Figure 5.
The objective is to reduce the maintenance costs per flying hour for this component by 15% and the maintenance time by 40%. These goals must be met with, at least, 95% confidence. The resulting performance requirements for long and short term prognostic tools are shown in Figure 6.
Since the performance of diagnostic tools is described by four variables, it is not possible to represent the limits of the requirements graphically. To provide some guidance, the graphs for diagnostic tools shown in Figure 6c represent the relation between the probability of false alarm and the probability of false negative, assuming there is no uncertainty about the performance of the tool (i.e. zero variance). To check whether the performance of a given tool complies with the requirements it is necessary to use the equations previously shown.
Scenario (probability)           Cost (£) [Var]          Time (h) [Var]
Long-term prognosis (1-PLP)      773.5 [2.95E+02]        1.35 [9.00E-04]
Short-term prognosis (1-PSP)     906.1 [1.88E+02]        1.35 [9.00E-04]
Diagnosis (1-PFN)                1021.7 [1.86E+02]       1.35 [3.16E-03]
False negative (PFN)             1319.825 [3.10E+02]     3.375 [6.46E-03]
No alarm (1-PFA)                 0                       0
False alarm (PFA)                330 [3.03E+01]          2 [2.27E-03]
Total                            5.279 [6.82E-02]        0.0135 [5.17E-07]

Figure 5. Costs, times and their variances (in brackets) for each maintenance scenario.
(The figure plots Var(PLP) against PLP; the cost and time constraints bound the region of possible solutions.)
Figure 4. Region of acceptable performance and variance of performance of a long-term prognostic tool.
Figure 6. Graphs of possible solutions for a) long-term and b) short-term prognostic tools and c) diagnostic tools.
(Panel a plots Var(PLP) against PLP, "Constraints for LT prognostic tools"; panel b plots Var(PSP) against PSP, "Constraints for ST prognostic tools"; panel c plots PFA against PFN, "Constraints for diagnostic tools (no variance of PFA or PFN)"; each panel shows the cost and time constraint curves.)
First European Conference of the Prognostics and Health Management Society, 2012
198
Figure 7. PDF of maintenance a) cost and b) time for the different IVHM tools proposed.
The probability density functions (PDFs) of the new maintenance cost and time are calculated and compared to the targets to verify whether a diagnostic tool with a given performance is capable of achieving the necessary improvements. Figure 7 shows the PDFs for three possible IVHM tools (one of each kind) that reach the targets, compared to the original distributions. It also illustrates how changing the probabilities of different maintenance scenarios, with different variances, affects the standard deviation of the final maintenance cost and time, which can be reduced (diagnostic tool) or increased (long-term prognostic tool).
Only the shaded area on the left side of the graphs comprises those tools that achieve the expected reduction in cost and downtime. The area on the right is for those which match the requirements with a confidence complementary to what is expected (i.e. 5%), as illustrated in Figure 8.
The requirements for diagnostic and short-term prognostic tools illustrate an interesting phenomenon: in some cases meeting one of the targets can result in any possible solution overperforming in other areas. In this example, a diagnostic tool that barely reaches the expected cost reduction will improve maintenance times by much more than is required. The opposite happens for short-term prognostic tools.
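The kind of comparison shown in Figure 7 can be reproduced by Monte Carlo sampling from the event tree: draw per-event maintenance costs and estimate the probability of staying under the target. Everything below (probabilities, costs, the 1000 £ target) is synthetic and illustrative, not the paper's case-study data.

```python
import random

random.seed(1)

def sample_cost(P_F, P_LP, C_LP, sd_C_LP, C_FN, sd_C_FN):
    """One realisation of the per-event cost with a long-term prognostic tool."""
    if random.random() >= P_F:
        return 0.0                       # no failure event
    if random.random() < P_LP:           # prognosis ineffective: run to failure
        return random.gauss(C_FN, sd_C_FN)
    return random.gauss(C_LP, sd_C_LP)   # effective long-term prognosis

costs = [sample_cost(0.4, 0.1, 700.0, 60.0, 1300.0, 100.0) for _ in range(50_000)]
# Empirical confidence of staying below an illustrative 1000 GBP target.
confidence = sum(c <= 1000.0 for c in costs) / len(costs)
```

The empirical confidence can then be compared against the required RC, exactly as the analytical PDFs are compared against the limits in Figure 7.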
PF                       0.004
Cost of component (£):   Scheduled M. 525; Unscheduled M. 628.9; False Alarm 65
Cost of labor (£):       Scheduled M. 90; Unscheduled M. 132.5
Value of RUL (£):        Long Term Prog. 68.5; Short Term Prog. 12.2
Other costs (£):         Compensation 0; Secondary damage 127.8; Flight Test 0; Loss Income 0
Warranty (%):            Parts 0; Labor 0
Time (h):                MTTR 2; Check-out 0.25; MTTD 2; Localization 0.25; Technical delay 0.33; Administrative delay 1; Logistic delay 0

Table 1. List of parameters used in the case study and their values.
(The Figure 7 panels plot the PDF of the maintenance cost per flying hour (£/h) and the PDF of the maintenance time per flying hour (h/h), comparing the current distributions and the target limits with the distributions obtained with long-term prognostic, short-term prognostic and diagnostic tools.)
7. CONCLUSIONS
This methodology represents a reliable way to define the requirements of individual tools based on the expectations of improving the maintenance of specific components and the uncertainty of the available data. Since the equations allow a quantitative risk analysis to be carried out, business cases that use this methodology are more robust and less likely to overstate the benefits of installing the selected combination of IVHM tools.
It is not always possible to obtain reliable data to determine the standard deviation or variance of some of the variables used to calculate the costs or maintenance times. In some cases these variables are poorly recorded or not recorded at all. To tackle this problem, personnel with experience maintaining the aircraft should be interviewed to obtain approximate values. This will always be a better option than ignoring the effect of these uncertainties.
Quantifying the uncertainty of the expected revenue is critical to estimating the present value of an investment in IVHM technology, given its long return period. For that purpose, techniques like real options can be combined with the methodology presented here.
IVHM tools can significantly affect the uncertainty, or standard deviation, of the resulting maintenance costs and times, either reducing it or increasing it. Since the predictability of these factors is sometimes as important as decreasing their value, this effect must be analyzed carefully in a CBA.
Further work is necessary to study how the diagnoses and prognoses from several algorithms interact. If this new information enables grouping maintenance activities, the total downtime can be reduced, increasing the availability of the vehicle and generating additional savings.
ACKNOWLEDGEMENT
This work has been supported by the IVHM Centre at Cranfield University. The authors also want to thank the partners of the IVHM Centre for their support in this project.
NOMENCLATURE
C Maintenance cost of component per flying hour
C* Target cost per flying hour
CD Maintenance cost of an effective automated diagnosis
CFA Maintenance cost of a false alarm
CFN Maintenance cost of a false negative
CLP Maintenance cost of an effective long term prognosis
CSP Maintenance cost of an effective short term prognosis
PF Probability of failure of the component per flying hour
PFA Probability of false alarm
PFN Probability of false negative
PLP Probability of long term prognosis being ineffective
PSP Probability of short term prognosis being ineffective
RC Expected confidence to comply with cost requirements
RT Expected confidence to comply with time requirements
T Maintenance time of component per flying hour
T* Target maintenance time per flying hour
tD Maintenance time of an effective automated diagnosis
tFA Maintenance time of a false alarm
tFN Maintenance time of a false negative
tLP Maintenance time of an effective long term prognosis
tm Average life of components replaced following the indication of a prognostic tool
tmax Maximum time a component is run before its probability of failure reaches a predetermined limit
tSP Maintenance time of an effective short term prognosis
REFERENCES
Ashby, M. J. and Byer, R. J. (2002), "An approach for conducting a cost benefit analysis of aircraft engine prognostics and health management functions", Aerospace Conference Proceedings, 2002. IEEE, Vol. 6, pp. 6-2847.
Figure 8. Region of acceptable performance and variance of performance of a long-term prognostic tool.
(The figure plots Var(P) against P; the area A under the limit curve meets the targets with the expected confidence, while the area 1-A corresponds to the complementary confidence.)
Banks, J. and Merenich, J. (2007), "Cost Benefit Analysis for Asset Health Management Technology", Reliability and Maintainability Symposium, 2007. RAMS '07. Annual, pp. 95.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009), "How engineers can conduct cost-benefit analysis for PHM systems", Aerospace and Electronic Systems Magazine, IEEE, vol. 24, no. 3, pp. 22.
British Standard (1991), Quality vocabulary - Part 3: Availability, reliability and maintainability terms.
Hoyle, C., Mehr, A., Turner, I. and Chen, W. (2007), "On Quantifying Cost-Benefit of ISHM in Aerospace Systems", Aerospace Conference, 2007 IEEE.
Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002), "Health management system design: Development, simulation and cost/benefit optimization", Aerospace Conference Proceedings, 2002. IEEE, pp. 3065.
Khalak, A. and Tierno, J. (2006), "Influence of prognostic health management on logistic supply chain", American Control Conference, 2006.
Leao, B. P., Fitzgibbon, K. T., Puttini, L. C. and de Melo, G. P. B. (2008), "Cost-benefit analysis methodology for PHM applied to legacy commercial aircraft", Aerospace Conference, 2008 IEEE.
Lopez, I. and Sarigul-Klijn, N. (2010), "A review of uncertainty in flight vehicle structural damage monitoring, diagnosis and control: Challenges and opportunities", Progress in Aerospace Sciences, vol. 46, no. 7, pp. 247-273.
Prabhakar, V. J. and Sandborn, P. (2010), "A part total cost of ownership model for long life cycle electronic systems", International Journal of Computer Integrated Manufacturing, pp. 1.
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe, S. (2001), "A new test/diagnosis/rework model for use in technical cost modeling of electronic systems assembly", Test Conference, 2001. Proceedings. International.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey of Health Management User Objectives Related to Diagnostic and Prognostic Metrics".
Williams, Z. (2006), "Benefits of IVHM: An analytical approach", Aerospace Conference, 2006 IEEE.

BIOGRAPHIES
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronicassembly", Test Conference, 2001. Proceedings.International, pp. 1108.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
stic and Prognostic Metrics"Williams, Z. (2006), "Benefits of I
approach", Aerospace Conference, 2006 IEEE,
IOGRAPHIES
Manuel Esperonresearchinglegacy platforms at Cranfield IVHMCentre since 2010.worked on R&D fordevices and their implementation on landvehicles. He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit Analysisfor Asset Health Management Technology",and Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),"How engineers can conduct cost-benefit analysis for
Aerospace and Electronic Systemsvol. 24, no. 3, pp. 221991), BS 4778-
Part 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
Aerospace Conference, 2007 IEEE,Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,nd cost/benefit optimization",
Conference Proceedings, 2002. IEEE,
Khalak, A. and Tierno, J. (2006), "Influence of prognostichealth management on logistic supply chain",Control Conference, 2006, pp. 6 pp.
P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,G. P. B. (2008), "Cost-Benefit Analysis Methodologyfor PHM Applied to Legacy Commercial Aircraft",Aerospace Conference, 2008 IEEE, pp. 1.
Klijn, N. (2010), "A review ofy in flight vehicle structural damage
monitoring, diagnosis and control: Challenges andProgress in Aerospace Sciences,
273.Prabhakar, V. J. and Sandborn, P. (2010), "A part total cost
of ownership model for long life cycle electronicInternational Journal of Computer Integrated
-14.Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,
S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronic
Test Conference, 2001. Proceedings.pp. 1108.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
stic and Prognostic Metrics"Williams, Z. (2006), "Benefits of IVHM: an analytical
Aerospace Conference, 2006 IEEE,
Manuel Esperon-Miguezresearching on retrofitting IVHM tools onlegacy platforms at Cranfield IVHMCentre since 2010.worked on R&D for highdevices and their implementation on land
He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit Analysisfor Asset Health Management Technology", Reliabilityand Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),benefit analysis for
Aerospace and Electronic Systemsvol. 24, no. 3, pp. 22-30.
-3.1:1991 QualityPart 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
Aerospace Conference, 2007 IEEE, pp. 1.Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,nd cost/benefit optimization", Aerospace
Conference Proceedings, 2002. IEEE, Vol. 6, pp. 6
Khalak, A. and Tierno, J. (2006), "Influence of prognostichealth management on logistic supply chain", American
pp. 6 pp.P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,
Benefit Analysis Methodologyfor PHM Applied to Legacy Commercial Aircraft",
pp. 1.Klijn, N. (2010), "A review of
y in flight vehicle structural damagemonitoring, diagnosis and control: Challenges and
Progress in Aerospace Sciences,
Prabhakar, V. J. and Sandborn, P. (2010), "A part total costife cycle electronic
International Journal of Computer Integrated
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for usein technical cost modeling of electronic systems
Test Conference, 2001. Proceedings.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
VHM: an analyticalAerospace Conference, 2006 IEEE,
Miguez has beenon retrofitting IVHM tools on
legacy platforms at Cranfield IVHMCentre since 2010. Manuel has also
high energy storagedevices and their implementation on land
He holds a Master in Mechanical
European Conference of Prognostics and Health Management
ks, J. and Merenich, J. (2007), "Cost Benefit AnalysisReliability
and Maintainability Symposium, 2007. RAMS '07.
Banks, J., Reichard, K., Crow, E. and Nickell, K. (2009),benefit analysis for
Aerospace and Electronic Systems
3.1:1991 QualityPart 3 Availability, reliability and
, A., Turner, I. and Chen, W. (2007), "OnBenefit of ISHM in Aerospace
pp. 1.Kacprzynski, G. J., Roemer, M. J. and Hess, A. J. (2002),
"Health management system design: Development,Aerospace
Vol. 6, pp. 6-
Khalak, A. and Tierno, J. (2006), "Influence of prognosticAmerican
P., Fitzgibbon, K. T., Puttini, L. C. and de Melo,Benefit Analysis Methodology
for PHM Applied to Legacy Commercial Aircraft",
Klijn, N. (2010), "A review ofy in flight vehicle structural damage
monitoring, diagnosis and control: Challenges andProgress in Aerospace Sciences, vol.
Prabhakar, V. J. and Sandborn, P. (2010), "A part total costife cycle electronic
International Journal of Computer Integrated
Trichy, T., Sandborn, P., Raghavan, R. and Sahasrabudhe,S. (2001), "A new test/diagnosis/rework model for use
systemsTest Conference, 2001. Proceedings.
Wheeler, K., Kurtoglu, T. and Poll, S. (2009), "A Survey ofHealth Management User Objectives Related to
VHM: an analyticalAerospace Conference, 2006 IEEE, pp. 9
has beenon retrofitting IVHM tools on
legacy platforms at Cranfield IVHMManuel has also
ergy storagedevices and their implementation on land
He holds a Master in Mechanical
Engineeringand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the InterEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and CSystems
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 asProfessor and DThe Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andeducation, over the last three years. The Centre offers ashort course in IVHM and the world’s first IVHM MSc,begun in 2011.
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,contributingHMASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
European Conference of Prognostics and Health Management
Engineering from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the InterEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and CSystems
Ianyears, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He hasworkeElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 asProfessor and Director of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,begun in 2011.
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,contributing member of the SAE IVHM Steering Group andHM-1 IVHM committee, a Fellow of IMechE, RAeS andASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
European Conference of Prognostics and Health Management Society 2012
from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM atCranfield University.
Philip JohnEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engijoining in 1999.Imperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served asthe President of the International Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life CapabilityManagement; and Coping with Uncertainty and Change in
Ian K. Jennionsyears, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He hasworked for RollsElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, RollsMeggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
y 2012
from Madrid Polytechnic Universityand an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
Philip John is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engijoining in 1999. Following his PhD atImperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineeringRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
Jennions. Ian’s career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
d for Rolls-Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,including Boeing, BAe Systems, Rolls-Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHMPerspectives on an Emerging Field.
from Madrid Polytechnic University, Spain,and an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University'sProfessor of Systems Engineering since
Following his PhD atImperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassedthe whole scope of systems engineering, includingRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,
-Royce, Thales,Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM
10
, Spain,and an MSc in Aerospace Engineering from BrunelUniversity, UK. He is currently pursuing a PhD in IVHM at
is the Head of the School ofEngineering at Cranfield University inthe UK and has been the University's
neering sinceFollowing his PhD at
Imperial College, London he spent 18years in industry, holding a wide range of
systems engineering and management roles, including Headof Systems Engineering for a major multinational company.His experience and responsibilities in industry encompassed
, includingRequirements Engineering, System Design, ILS, ARM,Human Factors, Safety, Systems Proving & Simulation andModelling. He is a member of several National AdvisoryCommittees and Industrial Steering Boards and served as
national Council on SystemsEngineering (INCOSE) in the UK from 2003 to 2004. Hisresearch interests include: Understanding Complex Systemsand Systems of Systems (SoS); Managing ComplexSystems Projects and Risks; Through Life Capability
oping with Uncertainty and Change in
career spans over 30years, working mostly for a variety of gasturbine companies. He has a MechanicalEngineering degree and a PhD in CFD bothfrom Imperial College, London. He has
Royce (twice), GeneralElectric and Alstom in a number of
technical roles, gaining experience in aerodynamics, heattransfer, fluid systems, mechanical design, combustion,services and IVHM. He moved to Cranfield in July 2008 as
irector of the newly formed IVHM Centre.The Centre is funded by a number of industrial companies,
Royce, Thales,Meggitt, MOD and Alstom Transport. He has led thedevelopment and growth of the Centre, in research andducation, over the last three years. The Centre offers a
short course in IVHM and the world’s first IVHM MSc,
Ian is on the editorial Board for the International Journal ofCondition Monitoring, a Director of the PHM Society,
member of the SAE IVHM Steering Group and1 IVHM committee, a Fellow of IMechE, RAeS and
ASME. He is the editor of the recent SAE book: IVHM –
First European Conference of the Prognostics and Health Management Society, 2012
201
Unscented Kalman Filter with Gaussian Process Degradation Modelfor Bearing Fault Prognosis
Christoph Anger B.Sc.1, Dipl.-Ing. Robert Schrader1, and Prof. Dr.-Ing. Uwe Klingauf1
1 Institute of Flight Systems and Automatic Control, Darmstadt, 64287
[email protected]
[email protected]@fsr.tu-darmstadt.de
ABSTRACT
The degradation of rolling-element bearings is mainly stochastic due to unforeseeable influences like short-term overstraining, which hampers the prediction of the remaining useful lifetime. This stochastic behaviour is hard to describe with the parametric degradation models used in the past. Therefore, the two prognostic concepts presented and examined in this paper introduce a nonparametric approach through the application of a dynamic Gaussian Process (GP). The GP offers the opportunity to reproduce a damage course according to a set of training data and thereby also estimates the uncertainty of this approach by means of the GP's covariance. The training data is generated by a stochastic degradation model that simulates the aforementioned highly stochastic degradation of a bearing fault. For prediction and state estimation of the feature, the trained dynamic GP is combined with the Unscented Kalman Filter (UKF) and evaluated in the context of a case study. Since this prognostic approach has shown drawbacks during the evaluation, a multiple model approach based on the GP-UKF is introduced and evaluated. It is shown that this combination offers an increased prognostic performance for bearing fault prediction.
1. INTRODUCTION
Forecasting the remaining useful lifetime (RUL) of line-replaceable units (LRUs) with high accuracy is one of the main issues in aviation to avoid unnecessary maintenance cycles and, therefore, to reduce aircraft life cycle costs. One component of those LRUs can be rolling-element bearings, whose RUL is of great interest and which are therefore the focus of this enquiry. Rolling-element bearings ensure the functionality of rotating assembly parts under varying loading and frequency. During their life cycle, bearings degrade in two different ways
Dipl.-Ing. Robert Schrader et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
according to the amount, duration and nature of loading and other influences, e.g. contaminants or constructive incertitudes that can affect the tribological system of a bearing. Overstraining and solid friction, caused by high cyclic stress and a lack of lubricant respectively, lead to a rapid degradation of the bearing, whereas in the case of calculated strains like wear and tear or fatigue the course of damage increases continuously. A degradation process of a real bearing results from both kinds of strains and, therefore, has a strongly stochastic character (Sturm, 1986).
To simulate this behaviour, several degradation models (DMs) have been developed in the past. Most of them are based on the Paris-Erdogan law, which describes a relation between the crack growth rate and the effective stresses in the examined material. By adjusting this law to existing test results of real bearings, several enhancements were formulated and evaluated, as Choi et al. did in (Choi & Liu, 2006b) and (Choi & Liu, 2006a), respectively. Other DMs are based on the Lundberg-Palmgren model, which describes the correlation between the probability of survival and, among others, the maximum shearing stress. Yu et al. refined this approach by adding a more precise geometrical description of the contact surface in (Yu & Harris, 2001).
All aforementioned models describe the degradation as depending on the applied external load difference, as a non-loaded bearing would not degrade at all. In reality, the degradation is also a function of the current degradation, since detached particles can lead to solid friction. The DM applied in this paper to generate reliable degradation courses considers both the degradation rate due to loading and that due to the state of degradation itself.
Most of these DMs are used as prognostic models (PMs) in combination with state estimation.
Usually particle filters based on the aforementioned Paris-Erdogan model are implemented, as done in (Orchard & Vachtsevanos, 2009). Other prognostic concepts are based on the Archard wear equation. Daigle et al. presented a model-based prognostic approach by estimating the RUL of a pneumatic valve with a fixed-lag particle filter (Daigle & Goebel, 2010). The applied model relates the current degradation to the wear of material based on the Archard equation.
Besides a DM that accounts for the current degradation state, Orsagh et al. presented a prognosis approach for a rolling-element bearing (Orsagh, Sheldon, & Klenke, 2003). By measuring several features, e.g. the oil debris of the bearing or the vibration signal, they predicted the RUL depending on the measured fused features by correlation with the current state of degradation. The RUL was then forecast according to the applied PM.
The prognostic concept at hand attempts another approach, as it is not based on a physical model. Instead, a dynamic Gaussian Process (GP) model is trained on a degradation process and combined with the Unscented Kalman Filter (UKF) for state estimation. Ko et al. analysed this dynamic prognostic model in (Ko, Klein, Fox, & Haehnel, 2007) by tracking an autonomous micro-blimp. Additionally, the expected benefits of the GP-UKF concept in combination with a multiple model approach are examined.
This paper is divided into four parts. In Section 2 the applied DM of a rolling-element bearing is presented. The prognostic approach, with a short introduction to the two components UKF and GP, and the multiple model approach are described in Section 3, and in Section 4 the two concepts are tested and evaluated in the context of a case study.
2. BEARING FAULT DEGRADATION
As the objective of this paper is to forecast the degradation of a bearing, a feature has to be identified that directly corresponds to the current state of degradation. One such variable is the surface of pitting A, i.e. the excavation of macroscopic particles caused by material fatigue, either in the rolling elements or in the inner or outer race. As one pitting does not immediately lead to the failure of a bearing, its functionality remains. However, the geometrical irregularity of this fault produces single impacts on the assembly group directly contacting the bearing. Depending on the location of the fault, these impacts appear with certain frequencies as a function of the rotation speed Ω of the shaft, the number of rolling elements n and geometric magnitudes, summarised by Antoni et al. in (Antoni, 2007) and depicted in Table 1.
Inner-race fault:        (n/2) Ω (1 + (d/D) cos θ)
Outer-race fault:        (n/2) Ω (1 − (d/D) cos θ)
Rolling-element fault:   (D/d) Ω (1 − ((d/D) cos θ)²)
Cage fault:              (Ω/2) (1 − (d/D) cos θ)
Inner-race modulation:   Ω
Cage modulation:         (Ω/2) (1 − (d/D) cos θ)

Table 1. Ω = speed of shaft; d = bearing roller diameter; D = pitch circle diameter; n = number of rolling elements; θ = contact angle
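The expressions in Table 1 translate directly into code. The following sketch is an illustration only; the function name and the example geometry are invented here, not taken from the paper:

```python
import math

def fault_frequencies(omega, d, D, n, theta):
    """Characteristic bearing fault frequencies from Table 1.

    omega: shaft speed, d: roller diameter, D: pitch circle diameter,
    n: number of rolling elements, theta: contact angle (radians).
    """
    r = (d / D) * math.cos(theta)  # recurring geometric ratio
    return {
        "inner_race": n / 2 * omega * (1 + r),
        "outer_race": n / 2 * omega * (1 - r),
        "rolling_element": D * omega / d * (1 - r ** 2),
        "cage": omega / 2 * (1 - r),
        "inner_race_modulation": omega,
        "cage_modulation": omega / 2 * (1 - r),
    }

# Example (assumed geometry): 8 rollers, d = 8 mm, D = 40 mm,
# zero contact angle, 25 Hz shaft speed
f = fault_frequencies(omega=25.0, d=8.0, D=40.0, n=8, theta=0.0)
```

For this example geometry the inner-race fault frequency is (8/2)(1 + 0.2) = 4.8 times the shaft speed, i.e. the spectral line one would look for in the structure-borne noise signal described below.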
These impacts produce a structure-borne noise, and by using an acoustic emission sensor the frequency and the amplitude of the impacts generated by the rotating fault can be detected. From these, the location of the fault (according to the frequency and Table 1) and the degree of degradation can be determined, as the acceleration amplitude is assumed to correlate with the pitting surface and, therefore, the current condition of the bearing.
The degradation of a real rolling-element bearing results from two different courses of damage, as described in the introduction: a continuously rising damage caused by material fatigue that is interrupted by abrupt steps as a result of overstraining. Both effects can be described mathematically in the applied DM by the following difference equation for the degradation, i.e. the pitting surface A:
∆Ai = kA ·Ai−1 + ku ·∆ui−1, (1)
where ∆ui is the external loading difference of the bearing during one cycle and kA ∼ W(λ′(Ai−1), k′(Ai−1)) is a factor drawn from a Weibull distribution, whose scale parameter λ′ and shape parameter k′ are expected to be functions of the previous degradation Ai−1. The product kA · Ai−1 represents the influence of the degradation on the transition rate. Analogously, ku · ∆ui−1 stands for the increased degradation rate caused by loading, as ku ∼ E(µ(Ai−1)) is drawn from an exponential distribution whose mean µ is also a function of Ai−1. Both coefficients kA and ku realise the stochastic character of the degradation. Therefore, the current degradation in cycle i can be calculated as
Ai = Ai−1 + ∆Ai. (2)
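Equations (1) and (2) can be sketched as a simple Monte-Carlo recursion. The paper does not specify how λ′, k′ and µ depend on A_{i−1}, so the parameter functions below are illustrative assumptions, as is taking the absolute value of the loading difference:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_degradation(load, a0=1.0, cycles=250):
    """One pitting-surface course per Eqs. (1)-(2).

    The dependence of the Weibull/exponential parameters on the
    previous degradation A_{i-1} is not given in the paper; the
    choices below are illustrative placeholders.
    """
    A = [a0]
    for i in range(1, cycles):
        a_prev = A[-1]
        du = load[i] - load[i - 1]            # external loading difference
        shape = 2.0                           # k'(A): assumed constant shape
        scale = 0.002 * (1 + 0.05 * a_prev)   # lambda'(A): assumed to grow with damage
        k_A = rng.weibull(shape) * scale      # state-dependent Weibull factor
        k_u = rng.exponential(0.5)            # mu(A): assumed constant mean here
        dA = k_A * a_prev + k_u * abs(du)     # Eq. (1)
        A.append(a_prev + dA)                 # Eq. (2)
    return np.array(A)

# Assumed normalised load profile (cf. Figure 1b)
load = np.clip(np.sin(np.linspace(0, 6 * np.pi, 250)), 0, None)
course = simulate_degradation(load)
```

Because both random coefficients are non-negative, each simulated course is monotonically non-decreasing: continuous growth from the kA · A term, abrupt steps where the load difference is large, exactly the two damage mechanisms described above.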
Figure 1. (a) Three different degradation courses as the result of Equation (1); (b) applied normalised load spectrum.
In Figure 1, three different damage courses generated by the DM and the applied normalised loading are depicted. Figure 1a clearly shows the stochastic character of a real degradation, as the RULs of the courses differ strongly and the mainly continuous course is interrupted by steps in case of high strain. The correlation between the applied loading in Figure 1b and the degradation rate is obvious, as the load difference between cycles 100 and 125 is zero and the degradation in this range is quite flat. Thus, the applied DM is assumed to reproduce the damage course of a faulty bearing for the purposes of this paper, instead of real test rig measurements.
3. PROGNOSTIC APPROACH
The applied prognostic concepts are introduced in this section. The UKF is used for state estimation and prediction of the degradation. Instead of a parametric model, the UKF is founded on a trained dynamic GP. The basics of both prognostic tools are presented in the following subsections. Subsection 3.1 is based on (Ko et al., 2007), (Ko & Fox, 2011) and (Rasmussen & Williams, 2006). In Subsection 3.3, a multiple model approach that promises an increased prognostic performance is explained.
3.1. Dynamic Gaussian Process for Fault Degradation
The GP offers the possibility of learning a regression function from sample data without any parametric model. Rasmussen et al. describe the GP in (Rasmussen & Williams, 2006) as defining a Gaussian distribution over functions. In other words, the GP establishes a function f out of a given training data set D = {(x1, y1), (x2, y2), ..., (xn, yn)} according to a given noisy process
y = f(X) + ε, (3)
where X = [x1, x2, ..., xn] is an n × m input matrix, with n the number of training inputs and m the length of a single input vector xi, y is an n-dimensional vector of scalar outputs, and ε represents a noise term drawn from a Gaussian distribution N(0, σ²).
A Gaussian distribution is fully defined by its mean µ and covariance Σ. The GP defines a zero-mean joint Gaussian distribution over the given outputs y of the training data D, as follows:
p(y) = N(0, K(X,X) + σn²I). (4)
The covariance of this joint distribution consists of the kernel matrix K(X,X), which represents the deviation of the inputs among each other, and the term σn²I for the Gaussian noise caused by ε. The entries of K are the kernel functions k(xi, xj), where the squared exponential

k(xi, xj) = σf² exp(−(1/2)(xi − xj) W (xi − xj)ᵀ) (5)

is a standard kernel function. Here, σf² is the signal variance and W is a diagonal matrix that contains the distance measure of every input.
To calculate the mean GPµ and the covariance GPΣ for a given test input x∗ and test output y∗ w.r.t. the training data D, the following expressions can be applied:
GPµ(x∗, D) = k∗ᵀ [K + σn²I]⁻¹ y (6)
for the mean and
GP_Σ(x∗, D) = k(x∗, x∗) − k∗^T [K + σ_n^2 I]^{−1} k∗ (7)
for the covariance. Here, the compact form K = K(X, X) is used, and k∗ denotes the covariance function between the test input x∗ and the training inputs X. Obviously, the mean prediction in Equation (6) is a linear combination of the training outputs y, weighted by the correlation between test and training inputs. The covariance is the prior covariance k(x∗, x∗) of the test input, reduced by the information gained from the observations. The GP possesses three so-called hyperparameters θ = [W, σ_f, σ_n] from the kernel function and the process noise. Optimal hyperparameters θ can be found by maximising the log likelihood
θ_max = arg max_θ log p(y | X, θ). (8)
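The prediction equations (6) and (7) translate almost directly into code. The following is a minimal numerical sketch (not the authors' implementation), assuming numpy; the toy 1-D data, the diagonal weight matrix W and the noise level are illustrative:

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, W=None):
    """Squared-exponential kernel of Eq. (5): sigma_f^2 exp(-0.5 (xi-xj) W (xi-xj)^T)."""
    if W is None:
        W = np.eye(A.shape[1])
    d = A[:, None, :] - B[None, :, :]                       # pairwise input differences
    return sigma_f**2 * np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', d, W, d))

def gp_predict(X, y, x_star, sigma_f=1.0, W=None, sigma_n=0.1):
    """GP posterior mean (Eq. 6) and covariance (Eq. 7) at test inputs x_star."""
    K = se_kernel(X, X, sigma_f, W)
    k_star = se_kernel(X, x_star, sigma_f, W)               # train/test covariances
    Kinv = np.linalg.inv(K + sigma_n**2 * np.eye(len(X)))   # [K + sigma_n^2 I]^{-1}
    mu = k_star.T @ Kinv @ y
    cov = se_kernel(x_star, x_star, sigma_f, W) - k_star.T @ Kinv @ k_star
    return mu, cov

# toy 1-D example: with small noise, the posterior mean interpolates the data
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
W = np.array([[50.0]])      # distance weight, i.e. length scale 1/sqrt(50) ~ 0.14
mu, cov = gp_predict(X, y, X, W=W, sigma_n=0.01)
```

In practice the hyperparameters θ of Equation (8) would be fitted by maximising the log marginal likelihood rather than fixed by hand as here.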
Considering a stochastic dynamic degradation process, Equation (3) can be written as
r_{k+1} = r_k + ∆r_k + ε_k. (9)
Therefore, the state transition ∆r_k is trained with the GP. The generated training data set D_r = (X, X′) consists of the inputs X = [(r_1, ∆u_1), (r_2, ∆u_2), ..., (r_n, ∆u_n)] and the state transitions X′ = [∆r_1, ∆r_2, ..., ∆r_n], which are calculated as
∆r_k = r_k − r_{k−1} (10)
or, w.r.t. the mean of the dynamic GP of Equation (6),

r_k = r_{k−1} + GP_µ(u_{k−1}, r_{k−1}, D_r). (11)

Together with the covariance GP_Σ(u_{k−1}, r_{k−1}, D_r), this fully describes the Gaussian distribution of the GP. The additional benefit of this approach is the time invariance resulting from the transition from a static to a dynamic system, and the ability to capture different kinds of degradation processes without physical knowledge of the actual process.
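As a sketch of how the training set D_r of Equations (9)-(11) could be assembled and used for a one-step propagation, the snippet below uses a toy degradation history and a minimal GP mean; all numerical values (the r and ∆u sequences, length scale and noise level) are assumed for illustration:

```python
import numpy as np

# toy monitored degradation history r_1..r_n and loading differences du (assumed values)
r = np.array([0.05, 0.08, 0.12, 0.18, 0.26, 0.36, 0.50])
du = np.full(len(r) - 1, 0.1)

# training set D_r of Eq. (10): inputs (r_k, du_k), targets dr_k = r_{k+1} - r_k
X = np.column_stack([r[:-1], du])
dX = np.diff(r)

def gp_mean(Xtr, ytr, x, sigma_f=1.0, ell=0.05, sigma_n=0.01):
    """GP posterior mean GP_mu (Eq. 6) with an isotropic squared-exponential kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)
    K = k(Xtr, Xtr) + sigma_n**2 * np.eye(len(Xtr))
    return k(np.atleast_2d(np.asarray(x, float)), Xtr) @ np.linalg.solve(K, ytr)

# one-step propagation of Eq. (11): r_k = r_{k-1} + GP_mu(u_{k-1}, r_{k-1}, D_r)
r_next = r[-1] + gp_mean(X, dX, [0.50, 0.1])[0]
```

Rolling this one-step prediction forward until the feature crosses the failure threshold yields the RUL estimate.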
3.2. Combining GP and Unscented Kalman Filter
In the case of a nonlinear dynamic system, the application of the UKF is the appropriate choice, because it estimates the state of nonlinear systems by means of observations z and system inputs u. As the presented prognostic approach intends to omit a physical degradation model, the Extended Kalman Filter is also inapplicable, since an analytic model is required for its linearisation step. In general, a nonlinear dynamic system at the kth time step can be described as
x_k = G(x_{k−1}, u_{k−1}) + ε_k (12)
First European Conference of the Prognostics and Health Management Society, 2012
with the state transition function G, the n-dimensional state vector x, the input vector u and an additive Gaussian noise term ε drawn from a zero-mean Gaussian distribution ε ∼ N(0, Q_k), with the process noise Q_k as covariance. An analogous description of the observation z_k can be formulated as
z_k = H(x_k) + δ_k. (13)
Here, H relates the state to the observation and δ is also an additive noise term, δ ∼ N(0, R_k), where R_k is the measurement noise. Through the scaled unscented transformation by Julier (Julier, 2002), sigma points χ^[i] are defined according to the covariance Σ and the mean µ of the previous time step
χ^[0] = µ
χ^[i] = µ + (√((n + λ)Σ))_i for i = 1, ..., n
χ^[i] = µ − (√((n + λ)Σ))_{i−n} for i = n + 1, ..., 2n, (14)

where λ is a scaling parameter that, in the case of the scaled unscented transformation, is defined as

λ = α′^2 (n + κ) − n. (15)
Here, α′ and κ are further scaling parameters that determine the spread of the sigma points. In the standard UKF, these sigma points are transformed by the function G to generate a new distribution with mean and covariance. As the applied UKF contains the dynamic GP, this state transition function G is replaced by the Gaussian predictive distribution of Equation (6), which thereby defines a new set of sigma points
χ_k^[i] = GP_µ(χ^[i], D). (16)
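The sigma-point construction of Equations (14)-(15) can be sketched as follows (numpy; the values of α′, κ and the test mean/covariance are illustrative, and the weights shown are the standard mean-recovery weights):

```python
import numpy as np

def sigma_points(mu, Sigma, alpha=1e-1, kappa=0.0):
    """Scaled unscented transform: sigma points of Eq. (14) with lambda from Eq. (15)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n                  # Eq. (15)
    S = np.linalg.cholesky((n + lam) * Sigma)         # matrix square root, used column-wise
    chi = np.empty((2 * n + 1, n))
    chi[0] = mu                                       # chi^[0] = mu
    for i in range(n):
        chi[i + 1] = mu + S[:, i]                     # mu + (sqrt((n+lam) Sigma))_i
        chi[n + i + 1] = mu - S[:, i]                 # mu - (sqrt((n+lam) Sigma))_{i-n}
    # standard weights so that the weighted sigma points recover the mean
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wm[0] = lam / (n + lam)
    return chi, wm

mu = np.array([1.0, 2.0])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
chi, wm = sigma_points(mu, Sigma)
```

The symmetric ± pairs cancel in the weighted sum, so the sigma points reproduce µ exactly, which is the defining property exploited in lines 4 and 5 of Table 2.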
Similarly, the process noise Q_k is defined by Equation (7). With this information, the a priori mean and covariance can be generated by

µ̄ = \sum_{i=0}^{2n} w_m^[i] χ_k^[i]
Σ̄ = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄)(χ_k^[i] − µ̄)^T + GP_Σ(x∗, D) (17)
with weights w_m and w_c set up in (Julier, 2002). The whole applied GP-UKF algorithm is depicted in Table 2. In comparison to Equation (16), the new sigma points in line 3 are generated by χ_k^[i] = χ_{k−1}^[i] + GP_µ(u_{k−1}, χ_{k−1}^[i], D_G), as the applied GP is trained according to Equation (9). The prediction of the mean and covariance at time step k described in Equations (14) to (17) takes place in lines 1 to 5. The a priori estimation is corrected according to the measured observation z_k in lines 7 to 13. This correction step proceeds similarly to the prediction. In line 6 the transformed sigma points of line 3, χ_k, are used as observation Z_k^[i]. Line 8 is comparable to line 5 and in line
1: Inputs: µ_{k−1}, Σ_{k−1}, u_{k−1}, z_k, R_k
2: χ_{k−1} = (µ_{k−1}, µ_{k−1} + γ√Σ_{k−1}, µ_{k−1} − γ√Σ_{k−1})
3: for i = 0, ..., 2n:
   χ_k^[i] = χ_{k−1}^[i] + GP_µ(u_{k−1}, χ_{k−1}^[i], D_g)
   Q_k = GP_Σ(u_{k−1}, µ_{k−1}, D_g)
4: µ̄_k = \sum_{i=0}^{2n} w_m^[i] χ_k^[i]
5: Σ̄_k = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄_k)(χ_k^[i] − µ̄_k)^T + Q_k
6: Z_k^[i] = χ_k^[i]
7: ẑ_k = \sum_{i=0}^{2n} w_m^[i] Z_k^[i]
8: S_k = \sum_{i=0}^{2n} w_c^[i] (Z_k^[i] − ẑ_k)(Z_k^[i] − ẑ_k)^T + R_k
9: Σ_k^{x,z} = \sum_{i=0}^{2n} w_c^[i] (χ_k^[i] − µ̄_k)(Z_k^[i] − ẑ_k)^T
10: K_k = Σ_k^{x,z} S_k^{−1}
11: µ_k = µ̄_k + K_k (z_k − ẑ_k)
12: Σ_k = Σ̄_k − K_k S_k K_k^T
13: Outputs: µ_k, Σ_k

Table 2. Applied GP-UKF Algorithm
9 the cross-covariance of prediction and observation is determined. Depending on both, the Kalman gain K_k is generated in line 10 and, based on this, the new mean and covariance at time step k are defined in lines 11 and 12, respectively.
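One predict-correct cycle of Table 2 can be sketched as below. This is not the authors' code: the state is scalar, the observation model is the identity, and the trained dynamic GP is stood in for by two hypothetical callables gp_mu and gp_sigma returning the values of Equations (6) and (7):

```python
import numpy as np

def gp_ukf_step(mu, Sigma, u, z, R, gp_mu, gp_sigma, alpha=1.0, kappa=2.0):
    """One GP-UKF cycle following Table 2 (wc = wm for brevity, H = identity)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    gamma = np.sqrt(n + lam)
    S = np.linalg.cholesky(Sigma)
    chi = np.vstack([mu, mu + gamma * S.T, mu - gamma * S.T])       # line 2
    chi_pred = np.array([c + gp_mu(u, c) for c in chi])             # line 3: GP mean
    Q = gp_sigma(u, mu)                                             # line 3: GP variance
    wm = np.full(2 * n + 1, 1 / (2 * (n + lam))); wm[0] = lam / (n + lam)
    wc = wm.copy()
    mu_bar = wm @ chi_pred                                          # line 4
    d = chi_pred - mu_bar
    Sig_bar = (wc[:, None] * d).T @ d + Q                           # line 5
    Z = chi_pred                                                    # line 6
    z_hat = wm @ Z                                                  # line 7
    dz = Z - z_hat
    Sk = (wc[:, None] * dz).T @ dz + R                              # line 8
    Sxz = (wc[:, None] * d).T @ dz                                  # line 9
    K = Sxz @ np.linalg.inv(Sk)                                     # line 10
    mu_new = mu_bar + K @ (z - z_hat)                               # line 11
    Sig_new = Sig_bar - K @ Sk @ K.T                                # line 12
    return mu_new, Sig_new

# toy run: degradation grows by ~0.1 per cycle with a small GP variance (assumed numbers)
gp_mu = lambda u, r: np.array([0.1])
gp_sigma = lambda u, r: np.array([[1e-4]])
mu, Sigma = gp_ukf_step(np.array([0.5]), np.array([[0.01]]),
                        0.0, np.array([0.62]), np.array([[1e-3]]),
                        gp_mu, gp_sigma)
```

For prediction beyond the last observation, the correction (lines 6-13) is skipped and the a priori mean and covariance are iterated forward until the failure threshold is reached.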
3.3. Multiple Model Approach
Selecting one model to predict the RUL of bearing faults ignores the uncertainty due to the stochastic nature of the degradation process. To take these uncertainties into account, several prognostic models (PMs) are needed to improve the prediction. A Bayesian formalism is used to combine the knowledge of a set M of PMs by weighting the probability of each model being the correct one, as proposed by Li and Jilkov in (Li & Jilkov, 2003). Therefore, the Interacting Multiple Model (IMM) estimator, which is based on the Autonomous Multiple Model (AMM), is applied. In contrast to the latter, the IMM belongs to the group of cooperating multiple model approaches, since every model m_i ∈ M interacts with the others. Thus, the multiple model filters are reinitialised at every time step k according to information from the previous time step. Consider Equations (12) and (13) with one PM. Then the extension to the multiple model approach follows as
x_k = G(x_{k−1}, u_{k−1}, m_i) + ε_k
z_k = H(x_k, m_i) + δ_k (18)
according to (Schaab, 2011). The first steps of the IMM algorithm consist of a reinitialising step with a calculation of the mode probability of every ith model
µ_{k|k−1}^{(i)} = P(m_k^{(i)} | y_{1:k−1}) = \sum_{j=1}^{n_z} h_{ij} µ_{k−1}^{(j)} for i = 1, ..., n_z (19)
with the entries h_{ij} = P(m_k = m_j | m_{k−1} = m_i) of the transition matrix H according to Markov. The application of
the transition matrix H prevents the prognostic approach from insisting on one model, as it offers the possibility of a change from model i to j at every time step. The transition matrix H therefore describes a Markov chain, where H is assumed to be time invariant. By using the information of the previous time step and µ_{k|k−1}^{(i)}, a weighting factor according to
µ_{k−1}^{j|i} = P(m_{k−1}^{(j)} | y_{1:k−1}, m_k^{(i)}) = h_{ji} µ_{k−1}^{(j)} / µ_{k|k−1}^{(i)} (20)
is calculated. With this, an individual reinitialising value for every filter
x_{k−1|k−1}^{(i)} = E[x_{k−1} | y_{1:k−1}, m_k^{(i)}] = \sum_{j=1}^{n_z} x_{k−1|k−1}^{(j)} µ_{k−1}^{j|i} (21)
and similarly a covariance P_{k−1|k−1}^{(i)} (see Table 3) are computed. After the reinitialising of the models, these initial values are provided to the applied filters, which, in the case of the proposed prognostic approach, are GP-UKFs with different PMs. According to the likelihood L_k^{(i)}, which depends on the residuum e_k^{(i)} = z_k^{(i)} − ẑ_k^{(i)} and indicates the probability that i is the correct model, the state probability of model i is calculated as
µ_k^{(i)} = µ_{k|k−1}^{(i)} L_k^{(i)} / \sum_{j=1}^{n_z} µ_{k|k−1}^{(j)} L_k^{(j)}. (22)
Finally, the results of the single filters i are fused into the state estimate x_{k|k} and covariance estimate P_{k|k} by means of the minimum mean squared error (MMSE), weighted with the state probability of Equation (22)
x_{k|k} = \sum_{i=1}^{n_z} x_{k|k}^{(i)} µ_k^{(i)}
P_{k|k} = \sum_{i=1}^{n_z} [P_{k|k}^{(i)} + (x_{k|k} − x_{k|k}^{(i)})(x_{k|k} − x_{k|k}^{(i)})^T] µ_k^{(i)}. (23)
The entire algorithm is depicted in Table 3.
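The reinitialisation, filtering and fusion steps of Equations (19)-(23) can be sketched for scalar states as follows. This is an illustrative stand-in, not the authors' code: the row-stochastic convention H_trans[i, j] = P(m_k = m_j | m_{k−1} = m_i) is used, and the per-model GP-UKFs are replaced by a hypothetical callback filter_outputs:

```python
import numpy as np

def imm_step(mu_mode, x, P, H_trans, filter_outputs):
    """One IMM cycle (Table 3) for scalar states.
    mu_mode: previous mode probabilities; x, P: per-model estimates;
    filter_outputs(x0, P0, i) -> (x_post, P_post, likelihood) runs model i's filter."""
    n = len(mu_mode)
    mu_pred = H_trans.T @ mu_mode                      # Eq. (19), predicted mode prob.
    x0 = np.empty(n); P0 = np.empty(n)
    for i in range(n):
        w = H_trans[:, i] * mu_mode / mu_pred[i]       # Eq. (20), mixing weights
        x0[i] = w @ x                                  # Eq. (21), reinitialised state
        P0[i] = w @ (P + (x0[i] - x) ** 2)             # reinitialised covariance
    xf = np.empty(n); Pf = np.empty(n); L = np.empty(n)
    for i in range(n):                                 # run each model's filter
        xf[i], Pf[i], L[i] = filter_outputs(x0[i], P0[i], i)
    mu_new = mu_pred * L / (mu_pred @ L)               # Eq. (22), state probabilities
    x_fused = mu_new @ xf                              # Eq. (23), MMSE fusion
    P_fused = mu_new @ (Pf + (x_fused - xf) ** 2)
    return mu_new, x_fused, P_fused

# toy example: model 1 explains the data much better (higher likelihood)
H_trans = np.array([[0.95, 0.05], [0.05, 0.95]])
def filt(x0, P0, i):
    return x0 + (0.1 if i == 0 else 0.3), P0 * 0.9, (5.0 if i == 0 else 0.5)
mu_new, x_fused, P_fused = imm_step(np.array([0.5, 0.5]),
                                    np.array([1.0, 1.0]),
                                    np.array([0.1, 0.1]),
                                    H_trans, filt)
```

The mode probabilities shift towards the model with the higher likelihood, while the off-diagonal entries of the transition matrix keep the discarded model available for later time steps.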
4. CASE STUDY
The previously defined prognostic concepts are tested and evaluated for the case of a degrading rolling-element bearing according to the DM of Section 2. Since the training data of the GP are computer-generated, the corresponding vibration model (VM) is defined and described first. Afterwards, the results and problems of the GP-UKF approach and the IMM prognostic approach are presented.
1: Inputs: µ_{k−1}^{(i)}, x_{k−1|k−1}, P_{k−1|k−1}
2: µ_{k|k−1}^{(i)} = \sum_{j=1}^{n_z} h_{ij} µ_{k−1}^{(j)}, for i = 1, ..., n_z
3: µ_{k−1}^{j|i} = h_{ji} µ_{k−1}^{(j)} / µ_{k|k−1}^{(i)}
4: x_{k−1|k−1}^{(i)} = \sum_{j=1}^{n_z} x_{k−1|k−1}^{(j)} µ_{k−1}^{j|i}
5: P_{k−1|k−1}^{(i)} = \sum_{j=1}^{n_z} [P_{k−1|k−1}^{(j)} + (x_{k−1|k−1}^{(i)} − x_{k−1|k−1}^{(j)})(x_{k−1|k−1}^{(i)} − x_{k−1|k−1}^{(j)})^T] µ_{k−1}^{j|i}
6: see Table 2, Inputs: x_{k−1|k−1}^{(i)}, P_{k−1|k−1}^{(i)}, y_k, R^{(i)}; Outputs: e_k^{(i)} = z_k^{(i)} − ẑ_k^{(i)}, x_{k|k}^{(i)}, P_{k|k}^{(i)}
7: L_k^{(i)} = P(e_k^{(i)} | m_k^{(i)}, y_{1:k}) = N(e_k^{(i)}; 0, S_k^{(i)})
8: µ_k^{(i)} = µ_{k|k−1}^{(i)} L_k^{(i)} / \sum_{j=1}^{n_z} µ_{k|k−1}^{(j)} L_k^{(j)}
9: x_{k|k} = \sum_{i=1}^{n_z} x_{k|k}^{(i)} µ_k^{(i)}
10: P_{k|k} = \sum_{i=1}^{n_z} [P_{k|k}^{(i)} + (x_{k|k} − x_{k|k}^{(i)})(x_{k|k} − x_{k|k}^{(i)})^T] µ_k^{(i)}
11: Outputs: µ_k^{(i)}, x_{k|k}, P_{k|k}

Table 3. IMM Algorithm
4.1. Simulation of Structure-borne Noise
The aim of this subsection is to generate a vibration signal of a faulty bearing as it could be measured in reality. Therefore, a combination of the VM set up below and a DM is required: the latter creates a monotonically rising acceleration amplitude of impulses, as described in Section 2, which are modulated by the VM. By this means, a vibration signal is generated that can be evaluated in the frequency range to detect the state of degradation. The impulses in the area of the bearing can be measured by an acoustic emission sensor. Antoni determined in (Antoni, 2007) that this measured vibration signal consists of several modulations of the initial impulses and can be summarised in a VM in the time domain as
x(t) = \sum_{i=−∞}^{+∞} h(t − iT − τ_i) q(iT) A_i + n(t). (24)
Here, τ_i and A_i represent the uncertainties of the measured signal in the arrival time and the amplitude of the ith impact, respectively, as e.g. the penetration of the rolling element into the pitting of an inner race is a stochastic process. The transmission behaviour of the surrounding machine parts up to the acoustic emission sensor is considered by the impulse response function h(t), with the inter-arrival time iT of two consecutive impulses. The amplitude modulation q(t) is caused by the cage frequency, and n(t) represents the additive background noise. The applied VM is based on Equation (24). For a more realistic signal, it is assumed that there is another additive impulse sequence caused by other mechanisms in the LRU. For example, a rotating system with its rotation frequency is also measured by the acoustic emission sensor, and its amplitudes outclass those of the fault. The resulting vibration signal x(t) in the time domain and its power spectral density (PSD) are depicted
Figure 2. (a) vibration signal x(t) in the time domain, (b) PSD of x(t) with the marked fault frequency f_f = 127 Hz, (c) PSD of the envelope of x(t)
in Figures 2a and 2b, respectively. In Figure 2b, there is a mark at the fault frequency f_f = 127 Hz, as a fault was assumed to be localised at the inner race. The vibration signal in the time domain is dominated by the background noise and the impact sequence of other mechanisms with a frequency of f_o = 20 Hz. In Figure 2b, the assumed system behaviour of a second-order lag element with an eigenfrequency of f_SB ≈ 1600 Hz, representing the path between the bearing and the sensor, is clearly visible, in contrast to the impulses caused by the fault, which are almost overshadowed by the background noise. By applying an envelope x_env to the original vibration signal, the influence of the system behaviour is reduced, as depicted in Figure 2c. Besides the impulse sequence and its modes, the amplitude at the fault frequency can be scanned. The amplitude of the PSD at the fault frequency is related to the amplitude given by the DM and is therefore an appropriate feature for the prognostic process, as it determines the current state of degradation. In addition, the sidebands caused by the cage modulation q(t) at the frequencies f = f_f ± f_c, with an expected cage frequency f_c = 20 Hz, become visible.
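A minimal simulation of Equation (24) and the envelope extraction might look as follows; f_f = 127 Hz, f_c = 20 Hz and f_SB ≈ 1600 Hz follow the text, while the sample rate, damping, jitter and noise levels are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 20_000                         # sample rate in Hz (assumed)
ff, fc, f_sb = 127.0, 20.0, 1600.0  # fault, cage and structural frequencies from the text
t = np.arange(0, 0.5, 1 / fs)

# impulse response h(t): lightly damped second-order lag element (path bearing -> sensor)
zeta = 0.05
wn = 2 * np.pi * f_sb
h = (np.exp(-zeta * wn * t) * np.sin(wn * np.sqrt(1 - zeta**2) * t))[: int(0.01 * fs)]

# Eq. (24): impulse train at period T = 1/ff with arrival jitter tau_i,
# amplitude uncertainty A_i and cage modulation q(iT)
x = np.zeros_like(t)
T = 1 / ff
for i in range(int(t[-1] / T)):
    k = max(0, int((i * T + rng.normal(0.0, 1e-4)) * fs))   # arrival time iT + tau_i
    A_i = 1.0 + 0.1 * rng.normal()                          # amplitude A_i
    q = 1.0 + 0.5 * np.cos(2 * np.pi * fc * i * T)          # cage modulation q(iT)
    x[k] += A_i * q
x = np.convolve(x, h)[: len(t)] + 0.05 * rng.normal(size=len(t))  # + n(t)

# envelope x_env via the magnitude of the analytic signal (FFT-based Hilbert transform)
spec = np.fft.fft(x)
spec[np.fft.fftfreq(len(x)) < 0] = 0.0
x_env = np.abs(2 * np.fft.ifft(spec))
```

Sampling the PSD of x_env at f_f then yields the degradation feature used in the following subsections.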
Figure 3. simulated degradation of a rolling-element bearing: (a) real degradation as the result of Equation (1), (b) measured normed degradation obtained by sampling the PSD generated by the previously set up VM at the fault frequency
The comparison of the real degradation of the applied DM in Equation (1) and the feature r_feat is depicted in Figure 3, where the applied loading is given in Figure 1b. The measured degradation is quite noisy due to the background noise that dominates the PSD of the vibration signal in the lower frequency range. Due to the frequency analysis, it indicates a course different from the real degradation, but as it also exhibits the monotonically rising character, the measured amplitude directly correlates with the pitting surface in Figure 3a, i.e. the current degradation. The prognostic range is set within the normed feature boundary r_feat ∈ [0.001, 1], which is related to a pitting surface range of A ∈ [5 µm², 70 µm²]. This measured degradation course is applied for training and testing the GP-UKF prognostic concepts.
4.2. Applied Performance Metrics
To analyse the prognostic performance, several performance metrics have to be applied. These metrics can be a single analytical characteristic for the entire prediction or a graphical depiction of every prediction step. Suitable performance metrics are summarised by Saxena et al. in (Saxena et al., 2008); a few of these are used for the evaluation of the proposed prognostic concept in the following sections. Some notation of the metrics domain is given in the following glossary:
UUT   Unit under test
EOL   End of life
EOP        End of prediction (predicted failure feature crossed threshold)
i          Time index
l          UUT index
P          Time index of the first prediction
L          Total number of predictions
λ          Normed time range of the entire prediction
r^l(i)     Estimated RUL at time step t_i for the lth UUT
r^{l∗}(i)  Real RUL at time step t_i
In the following subsections the applied performance metricsare defined.
4.2.1 Error
The error ∆^l(i) indicates the difference between the true RUL and the predicted RUL at time step i

∆^l(i) = r^{l∗}(i) − r^l(i). (25)
The error is one of the basic accuracy indicators and is therefore included, directly or indirectly, in most of the selected metrics.
4.2.2 Average Bias
By averaging the error over the entire prediction range, the average bias AB of the lth UUT is defined as

AB_l = (\sum_{i=P}^{EOP} ∆^l(i)) / (EOP_l − P_l + 1). (26)
Thus, the perfect score of ABl is zero.
4.2.3 Mean absolute percentage error
The Mean Absolute Percentage Error (MAPE) can be written as

MAPE = (1/L) \sum_{i=1}^{L} |100 ∆^l(i) / r^{l∗}(i)|. (27)
As it relates the error to the actual RUL, deviations in early stages of the prediction are not weighted as heavily as those near the EOP.
4.2.4 Mean squared error
One of the most commonly used metrics is the Mean Squared Error (MSE), since it averages the squared error over the number of predictions L:

MSE = (1/L) \sum_{i=1}^{L} ∆^l(i)^2. (28)
An advantage in comparison to the average bias is that the MSE considers both negative and positive errors, whereas the average bias decreases when positive and negative deviations occur within one prediction.
4.2.5 Prognostic horizon
The prognostic horizon PH describes the difference between the EOP and the current time step i
PH(i) = EOP − i, (29)
where the PH can be required to fulfill certain specifications, e.g. to remain within a given constant error bound depending on an accuracy value α, i.e.
[1 − α] · r^{l∗} ≤ r^l(t) ≤ [1 + α] · r^{l∗}, (30)
comparable to the metric in the following, last subsection. Throughout all evaluations, the accuracy value is α = 0.05.
4.2.6 α - λ Performance
Similarly to the PH, the α-λ performance describes the time span during which the predicted RUL remains within a given error bound. In comparison to the PH, the bound decreases linearly according to
[1 − α] · r^{l∗}(t) ≤ r^l(t) ≤ [1 + α] · r^{l∗}(t). (31)
Like the MAPE of Equation (27), this metric favours early predictions at λ ≈ 0 and tightens the demands for predictions near the EOP (λ ≈ 1). Throughout all evaluations, the accuracy value α of the α-λ accuracy is α = 0.20.
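The metrics of Equations (25)-(28) and (31) for a single UUT can be computed as sketched below (numpy; the toy RUL sequences are illustrative, and the α-λ score is reported here as the fraction of predictions inside the shrinking band):

```python
import numpy as np

def prognostic_metrics(rul_true, rul_pred, alpha_al=0.20):
    """Error-based metrics of Eqs. (25)-(28) and an alpha-lambda score (Eq. 31)
    for one UUT, evaluated over all prediction steps."""
    rul_true = np.asarray(rul_true, float)
    rul_pred = np.asarray(rul_pred, float)
    err = rul_true - rul_pred                          # Eq. (25), error
    ab = err.mean()                                    # Eq. (26), average bias
    mape = np.mean(np.abs(100 * err / rul_true))       # Eq. (27)
    mse = np.mean(err**2)                              # Eq. (28)
    inside = np.abs(err) <= alpha_al * rul_true        # Eq. (31), shrinking band
    return {"AB": ab, "MAPE": mape, "MSE": mse, "alpha_lambda": inside.mean()}

# toy check: predictions 10% low everywhere (conservative, negative prediction error)
true_rul = np.array([100.0, 80.0, 60.0, 40.0, 20.0])
pred_rul = 0.9 * true_rul
m = prognostic_metrics(true_rul, pred_rul)
```

Because the band of Equation (31) shrinks with the true RUL, a constant relative error stays inside it, whereas a constant absolute error eventually violates it near the EOP.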
4.3. Prognostic Results of GP-UKF Approach
The aim of this section is to evaluate the GP-UKF approach. In Figure 4, four different test trials are depicted; all courses were generated by the DM in Equation (1). The difference, especially between trials 1 and 4, in terms of RUL at the beginning of the observation is obvious. In contrast, trial 2 has a course similar to trial 3, with nearly the same life cycle. In Figure 4b, the corresponding features scanned at the fault frequency f_f = 127 Hz in the PSD are given, which show a slightly different character in comparison to the real degradation. The noise increases at higher degradation and is filtered by a low-pass filter with regard to the GP training. In Figure 5, the result of the GP training using trial 3 is shown, where the estimated training and the real degradation are overlaid by the estimated test degradation. The results of both the state estimation and the prediction of data sets 2 and 3 using the UKF are presented in Figure 6. To compare the state estimation performance, the real feature course is also plotted. The first prediction started at cycle 2, and the following predictions began every 10 cycles thereafter. The state estimation of data set 2 matches the real course recognizably, and the predictions show a high accuracy with an initial error ∆(2)(i = 2) ≈ 2 cycles. In sum, all predictions represent the behaviour of the damage course of trial 2. When the GP-UKF is trained and tested with data set 3, the prediction performance becomes slightly worse, as the error
Figure 4. four applied data sets: (a) real degradation, (b) feature
of the forecast RUL of early predictions is about 20 cycles lower than the real RUL. However, later predictions (≈ 6th) match the real degradation with an accuracy allowing the prediction of the RUL. The same behaviour is reflected in several performance metrics, summarised in Table 4. The two similar metrics PH and α-λ accuracy are given as fractions of the normed prediction range λ to indicate the time range in which the predictions fulfill the specifications until the EOP. Additionally, the RUL is normed to allow comparison of the four trials with different RUL.

Figure 5. training of the GP with data set 3

Figure 6. the state estimation and prediction of (a) data set 2 and (b) data set 3 in comparison to the real degradation course

Performance Metric   AB      MSE     MAPE    α-λ   PH
Data Set 1          −0.21    21.79   19.52   1     0.86
Data Set 2           0       10.87   13.11   1     1
Data Set 3          −3.48    78.71   24.55   1     0.81
Data Set 4          −2.28    56.44   23.86   1     0.84

Table 4. performance metrics of the four trials tested with themselves

In general, the predictions of all four trials show a high accuracy, since every trial remains within the given α-λ error bound during the entire prediction range and also satisfies the tighter specification of the PH after 20% of the normed prediction range λ, as displayed in Figure 7. Every trial converges to the actual RUL with only slight deviations at 0.3 λ. Additionally, all predictions indicate a rather conservative character, since the AB of all trials is mainly negative. In sum, the selected metrics correspond with the graphical results of Figure 6. The proposed GP-UKF prognostic concept offers a high accuracy for long-term prediction of a rolling-element bearing, in case the degradation follows the model the filter was trained with.
4.4. Generalisation of the Prognostic Approach
Now the prognostic results of a GP-UKF tested with a degradation course that differs from the training set are discussed, as this occurs in real applications. Figures 8a and 8b
Figure 7. (a) prognostic horizon of the four given trials at α = 5%, (b) α-λ accuracy at α = 20%
show the results of the state estimation and prediction of trials 1 and 4, respectively, when the prognostic model is trained with the data of trial 2. Additionally, the real degradation is plotted. The state estimation of both sets is satisfactory, since there are only slight deviations over the whole prognostic range. The predictions generally indicate the course of the training data set 2, with a progressive degradation at the beginning and a flat degradation rate at the end of the life cycle; both characteristics differ from the tested sets. Therefore, as expected, the forecast degradations are not as convincing as those in Section 4.3.
Performance Metric   AB      MSE       MAPE     α-λ    PH
Data Tr 2 Test 1     44.64   3522.93   242.65   0      0.07
Data Tr 2 Test 4      8.64    544.8    116.60   0      0.04
Data Tr 3 Test 1     22.79   1514.79   151.34   0      0.071
Data Tr 3 Test 4    −28.12   1172.28   149.59   0      0.28

Table 5. performance metrics of different test data sets, when the GP is trained with data set 2 or 3
The performance is analysed again by means of the metrics in Table 5. Additionally, the prognosis performance in the case of training data set 3 is depicted. Compared to the results of
Figure 8. the state estimation and prediction of (a) data set 1 and (b) data set 4 with the GP trained with data set 2
the previous section, all metrics increased considerably, due both to the aforementioned different degradation behaviour of test and training sets and to the slightly inaccurate state estimation. Here, the predictions from cycle 60 to 90 (corresponding to λ ≈ 0.5) of test data set 1 and at the end of data set 4 are not able to predict the real damage course. These deviations are also depicted in the graphical metrics in Figure 9, where neither the specifications of the α-λ accuracy nor those of the PH are fulfilled satisfactorily. In comparison to training data set 3, the predictions with a dynamic GP trained with data set 2 show beneficial prognostic results in the case of test data set 4 according to Table 5, whereas w.r.t. test data set 1 the third data set is advantageous. Therefore, a combination of both training data sets through a multiple model approach is assumed to exhibit benefits in comparison to the GP-UKF with only one set of training data.
4.5. Improvements by means of Multiple Model Approach
The prediction performance of an IMM approach with two different GP-UKFs is discussed in this section. The two models are the GP-UKFs trained with trials 2 and 3, respectively, as the combination of both is supposed to indicate the benefits of the IMM approach.

Figure 9. (a) prognostic horizon of test data sets 1 and 4 with training data sets 2 and 3 (α = 5%), (b) α-λ accuracy at α = 20%

In Figure 10, the state estimation and prediction results of test sets 1 and 4 are depicted. The real degradation of both test cases is estimated very accurately, with only a slight deviation in Figure 10a between cycles 50 and 60. At first sight, the prediction results show a performance comparable to the GP-UKFs of Section 4.4. Especially the first predictions determine the RUL rather inaccurately, since the error |∆(i = 2)| amounts to about 60 cycles in both cases. As described in the previous section, both test sets differ from the applied training sets, which causes a poor reproduction of the damage course. The mode probability of trial 1 in Figure 10b indicates the same reason: especially at the beginning of the prediction, neither of the two training sets replicates the real degradation satisfactorily, and therefore the model probabilities µ_k^{(1)} and µ_k^{(2)} are about 0.5. In comparison to Figure 8a, the later predictions of the MM approach in Figure 10a indicate a more accurate RUL estimation due to the domination of training set 3, which reproduces test set 1 more precisely. In Table 6, the performance metrics of the IMM approach are shown. Due to the inaccurate forecasts at the beginning of the prognosis, the metric values are comparable to Table 5. To identify the advantages of the IMM approach, the net diagrams in Figure 12 give an overview of the collected results. They show the metrics normalised to the largest value within a test trial. Since most metrics describe an inaccurate RUL
Figure 10. (a) prediction results of training sets 2 and 3 tested with trial 1, (b) mode probability of training sets 2 and 3 during prediction of trial 1
Performance Metric    AB    MSE       MAPE     α-λ    PH
Data Tr 2,3 Test 1    25    1841.14   160.54   0      0.07
Data Tr 2,3 Test 4   −7      895.32   132.99   0.04   0.12

Table 6. performance metrics of the IMM with training sets 2 and 3, testing sets 1 and 4
estimation with large values, the time range in which the predictions fulfill the specifications of the α-λ and the PH error bounds is exchanged for the time range in which those specifications are not met, i.e. PH_net = 1 − PH. The diagrams show the advantage of the IMM approach, as all measured performance metrics lie between the metrics of the GP-UKFs trained with one data set. That means this approach increases the robustness of a bearing's RUL prediction, as it provides the possibility of discarding inaccurate models depending on the mode probability. However, since the MM approach consists of the two PMs, which both differ from the test sets, the prognostic performance is still hardly able to outperform the results of both GP-UKFs, with reference to Tables 5 and 6. Especially the α-λ accuracy and PH of trial 1 (Figure 12a) of both GP-UKFs indicate the same behaviour, and thus an IMM approach based on those models is not able to increase this performance. The great benefit of the increased robustness is assumed to
Figure 11. prediction results of training trials 2 and 3 tested with trial 4
rise by including more models with different degradation courses. Especially a progressive degradation rate at the beginning of the prediction range, in the case of forecasting test set 1, is expected to be beneficial to the prognosis performance.
Figure 12. comparison of the single performance metrics of (a) test data 1 and (b) test data 4
5. CONCLUSION
Two prognostic concepts based on the GP-UKF approach to predict the RUL of a rolling-element bearing were examined in the context of a case study. The results showed that a dynamic GP in combination with a UKF estimates the RUL of a bearing very accurately when the applied training data is equal to the trial data. If the training data differs from the trial data, the GP-UKF is not able to forecast the degradation precisely, but mainly insists on the characteristics of the trained damage course. To solve this problem, an IMM approach based on two different GP-UKF models has been evaluated. It was assumed that the IMM algorithm, comprising several prognostic models, is more likely to forecast a damage course of an unknown trial. The results confirmed these expectations, since the robustness of the predictions was greatly increased by the approach.

By incorporating more prognostic models into the IMM approach, which should mainly differ from the applied GP-UKFs, this approach is expected to even outperform the prognostic results of a single GP-UKF. This will be the focus of further research.
NOMENCLATURE
Symbols
A                  Pitting surface
∆A_i               Increase of surface during cycle i
∆u_i               Loading difference during cycle i
k_A, k_u           Coefficients of applied degradation model
λ′, k′             Shape/scaling parameters of Weibull distribution
µ′                 Expected value of exponential distribution
D                  Data set
x_n                Inputs of GP
X                  Input matrix
y_n                Outputs of GP
y                  Output matrix
ε                  Noise term
µ                  Mean of model
Σ                  Covariance of model
K                  Kernel matrix
σ_n                Noise term
k(x_i, x_j)        Kernel function
σ_f                Signal variance of kernel function
W                  Distance measure weighting matrix
GP_µ               Mean of GP
GP_Σ               Covariance of GP
x∗                 Test input
θ                  Hyperparameters of GP
r_k                Degradation at kth time step
∆r_k               Degradation rate at kth time step
X′                 Degradation rate matrix
D_r                Training data set
G                  Transition function
Q_k                Process noise
H                  Observation function
z_k                Observation at kth time step
R_k                Measurement noise
δ_k                Noise term
χ^[i]              Sigma points
α′, κ              Scaling parameters of UKF
µ̄, Σ̄              A priori mean/covariance
w_m, w_c           Weights of mean/covariance
Z_k^[i]            Observation sigma points
K_k                Kalman gain
M                  Prognostic model set
m_i                Prognostic model i
µ_{k|k−1}^{(i)}    Mode probability
h_{ij}             Entries of transition matrix H
x_{k−1|k−1}^{(i)}  Reinitialising state of IMM
P_{k−1|k−1}^{(i)}  Reinitialising covariance of IMM
L_k^{(i)}          Likelihood of model i
e_k^{(i)}          Residuum
µ_k^{(i)}          State probability of model i
x(t)               Total vibration signal
h(t)               Impulse response
τ_i, A_i           Uncertainties of arriving impulse response
q(t)               Amplitude modulation
n(t)               Background noise
f_f                Fault frequency
f_o                Frequency of other mechanisms
f_SB               Eigenfrequency of system behaviour
x_env              Envelope of x(t)
f_c                Cage frequency
r_feat             Degradation feature
∆^l(i)             Error of RUL prediction
AB_l               Average bias of lth UUT
MAPE               Mean absolute percentage error
MSE                Mean squared error
PH(i)              Prognostic horizon
α                  Accuracy value
Shortcuts
RUL   Remaining Useful Lifetime
LRU   Line-replaceable Unit
DM    Degradation Model
PM    Prognostic Model
GP    Gaussian Process
UKF   Unscented Kalman Filter
IMM   Interacting Multiple Model
VM    Vibration Model
PSD   Power Spectral Density
Tr    Training
(See also Glossary in Section 4.2)
REFERENCES
Antoni, J. (2007). Cyclic spectral analysis of rolling-element bearing signals: Facts and fictions. Journal of Sound and Vibration, 304(3-5), 497–529.
Choi, Y., & Liu, C. R. (2006a). Rolling contact fatigue lifeof finish hard machined surfaces - Part 1. Model devel-opment. Wear, 261(5-6), 485–491.
Choi, Y., & Liu, C. R. (2006b). Rolling contact fatigue lifeof finish hard machined surfaces - Part 2. Experimentalverification. Wear, 261(5-6), 492–499.
Daigle, M., & Goebel, K. (2010). Model-Based Prognosticsunder Limited Sensing.
Julier, S. (2002). The scaled unscented transformation. InAmerican Control Conference, 2002. Proceedings ofthe 2002 (Vol. 6, pp. 4555–4559).
Ko, J., & Fox, D. (2011). Learning GP-BayesFilters viaGaussian process latent variable models. AutonomousRobots, 30(1), 3–23.
Ko, J., Klein, D., Fox, D., & Haehnel, D. (2007). GP-UKF: Unscented Kalman filters with Gaussian pro-cess prediction and observation models. In IntelligentRobots and Systems, 2007. IROS 2007. IEEE/RSJ In-ternational Conference on (pp. 1901–1907).
Li, X., & Jilkov, V. (2003). A survey of maneuvering targettracking—Part V: Multiple-model methods. In Proc.SPIE Conf. on Signal and Data Processing of SmallTargets (pp. 559–581).
Orchard, M. E., & Vachtsevanos, G. J. (2009). A particle-filtering approach for on-line fault diagnosis and failureprognosis. Transactions of the Institute of Measure-ment and Control, 31(3-4), 221–246.
Orsagh, R., Sheldon, J., & Klenke, C. (2003). Prognos-tics/diagnostics for gas turbine engine bearings. In Pro-ceedings of IEEE Aerospace Conference.
Rasmussen, C. E., & Williams, C. K. I. (2006). GaussianProcesses for Machine Learning.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B.,Saha, S., et al. (2008). Metrics for evaluating perfor-mance of prognostic techniques. In Prognostics andHealth Management, 2008. PHM 2008. InternationalConference on (pp. 1–17).
Schaab, J. (2011). Trusted health assessment of dynamicsystems based on hybrid joint estimation (Als Ms. gedr.ed.). Dusseldorf: VDI-Verl.
Sturm, A. (1986). Walzlagerdiagnose an Maschinen undAnlagen. Koln: TUV Rheinland.
Yu, W. K., & Harris, T. A. (2001). A New Stress-Based Fa-tigue Life Model for Ball Bearings. Tribology Trans-actions, 44(1), 11–18.
Using structural decomposition methods to design gray-box models for fault diagnosis of complex industrial systems: a beet sugar factory case study

Belarmino Pulido1, Jesus Maria Zamarreno2, Alejandro Merino3, and Anibal Bregon4
1,4 Dept. de Informatica, University of Valladolid, Valladolid, [email protected], [email protected]
2 Depto. Ingeniería de Sistemas y Automática, University of Valladolid, Valladolid, [email protected]
3 Depto. Ingeniería Electromecánica, University of Burgos, Burgos, [email protected]
ABSTRACT
Reliable and timely fault detection and isolation are necessary tasks to guarantee continuous performance in complex industrial systems, avoiding failure propagation in the system and helping to minimize downtime. Model-based diagnosis fulfils those requirements and has the additional advantage of using reusable models. However, reusing existing complex non-linear models for diagnosis in large industrial systems is not straightforward. Most of the time the models have been created for purposes other than diagnosis, and the available analytical redundancy is often small. In this work we propose to use Possible Conflicts, a model decomposition technique, to provide the structure (equations, inputs, outputs, and state variables) of minimal models able to perform fault detection and isolation. Such structural information can be used to design a gray-box model by means of state space neural networks. We demonstrate the feasibility of the approach on an evaporator of a beet sugar factory using real data.
1. INTRODUCTION
Prognostics and Health Management are very important tasks for continuous operation and for complying with the safety requirements of large industrial systems. In such systems, monitoring and early fault diagnosis are also fundamental tasks to avoid the propagation of fault effects, to prevent failures, and to minimize downtime. Hence, reliable and fast fault detection and isolation are needed, additionally providing an accurate
Belarmino Pulido et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
input for the prognostic stage.
Model-based reasoning provides different kinds of methods to fulfil those requirements. Model-based diagnosis uses a model of the system to estimate the proper behaviour and compares it with current observations in order to detect anomalies. In the last three decades, model-based diagnosis has been approached by two different communities: DX (Hamscher, Console, & Kleer (Eds.), 1992), using Artificial Intelligence techniques, and FDI (Gertler, 1998; Blanke, Kinnaert, Lunze, & Staroswiecki, 2006; Patton, Frank, & Clark, 2000), based on Systems Theory and Control. Both communities provide different but complementary techniques, as demonstrated by recent works (Cordier, Dague, Levy, Montmain, & Trave-Massuyes, 2004).
Our proposal elaborates on the similarities of both approaches and focuses on consistency-based diagnosis using numerical models (Pulido, Alonso-Gonzalez, & Acebes, 2001). Consistency-based diagnosis proceeds in three stages: first, fault detection is performed by detecting minimal conflicts in the system (the minimal set of equations or components involved in predicting a discrepancy); second, fault isolation is achieved by computing the minimal hitting sets of the conflicts; third, fault identification requires using fault models to predict the faulty behaviour (Reiter, 1987; Dressler & Struss, 1996), rejecting those fault modes that are not consistent with current observations. In this work, we use Possible Conflicts (Pulido & Alonso-Gonzalez, 2004), PCs for short, which are computed off-line and are the complete set of minimal redundant models that can become conflicts. PCs provide the structural model (equations, input, output, and state variables) that can be used for fault detection and isolation, or to simplify the fault prognostics
stage (Daigle, Bregon, & Roychoudhury, 2011).
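The second stage (computing minimal hitting sets of the conflicts) can be illustrated with a small brute-force sketch; the conflict sets below are invented for the example and the naive enumeration is ours, not a production diagnosis engine:

```python
from itertools import combinations

# Each conflict is a set of components, at least one of which must be faulty.
conflicts = [{"valve", "pump"}, {"pump", "sensor"}]

def minimal_hitting_sets(conflicts):
    """Brute-force minimal hitting sets: candidate diagnoses that intersect
    every conflict, such that no smaller candidate also does."""
    universe = sorted(set().union(*conflicts))
    hitting = []
    # Enumerate candidates by increasing size, so earlier hits are minimal.
    for r in range(1, len(universe) + 1):
        for cand in combinations(universe, r):
            s = set(cand)
            if all(s & c for c in conflicts):
                if not any(h <= s for h in hitting):
                    hitting.append(s)
    return hitting
```

For the two toy conflicts this yields the diagnoses {pump} (single fault) and {sensor, valve} (double fault), mirroring the hitting-set step of consistency-based diagnosis.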
While using PCs, we need to build off-line simulation or state-observer models (Pulido, Bregon, & Alonso-Gonzalez, 2010) to track the subsystem behaviour. This step requires analysis of the model, and sometimes rewriting the original equations for diagnosis purposes. The main advantage of model-based diagnosis is reusing existing models, but this is also its main difficulty. Frequently the models were created for purposes other than diagnosis, and the analytical redundancy available in the system is small, due to the price of additional sensors and because their allocation is driven by process control. Both problems exist in large industrial systems, where complexity comes from the highly non-linear models required to mimic system performance. Consequently, reusing existing non-linear models for diagnosis in those systems is not straightforward. We propose to use the structural information in each Possible Conflict to design different kinds of executable models. In this work, where a precise analytical model1 is difficult to handle, we propose to build grey-box models based on a state space neural network architecture derived from that structural information.
Preliminary results in an evaporation unit of a beet sugar factory in Spain using real data show the feasibility of the approach. The system has slow dynamics and, due to the high cost of the start-up mode, it should work uninterrupted for weeks. The main difficulty with the existing models comes from the number of unknown parameters to be identified in the model and from the presence of non-linearities that require expert manipulation in order to derive diagnosis-oriented models. An additional problem when testing any approach is that there is little information about faults that actually happened. Hence, any feedback from the model-based diagnosis system will be very helpful for the system operators.
The organization of this paper is as follows. First, we present the real system to be studied. Second, we introduce the Possible Conflicts technique used to find minimal models. Third, we introduce the state space neural network approach to obtain grey-box models for the Possible Conflicts. Next, we test the first-principles and the neural network models on the case study, drawing some conclusions.
2. DESCRIPTION OF THE CASE STUDY: AN EVAPORATION UNIT IN A BEET SUGAR FACTORY
We will test our proposal in an evaporation station of a beet sugar factory. Such processes have four main stages: diffusion, purification, evaporation, and crystallization. Evaporation is the stage in which the water contained in a juice with low sugar concentration is evaporated in order to obtain a higher sugar concentration. Afterwards, the resulting syrup is used to obtain sugar crystals in a set of vacuum pans. Figure 1 shows the main elements in an evaporation plant: the evaporation units.
1Based on first principles of Physics, usually a collection of ODEs.
Figure 1. Five evaporation units for the evaporation section in a beet sugar factory in Spain.
Each evaporator has two chambers. The heating chamber surrounds a set of vertical tubes that contain boiling juice. A flow of steam enters this chamber and transfers heat to the juice, providing the energy needed for boiling. The steam condenses around the tubes and leaves the evaporator as condensate. The interior of the tubes, plus the evaporator upper and bottom spaces, is named the juice chamber. A sugar solution of low concentration (juice) flows continuously into the base of the evaporator and starts boiling. Consequently, we get a solution of higher concentration at the output. The steam produced from the water evaporation reaches the upper space and leaves the juice chamber through a pipe at the top.
2.1. The simulation models
The simulated plant consists of a set of five effects interconnected through pipelines and valves. Each effect is formed by one or several evaporation units. The steam generated in one effect is used to provide energy to the heating chambers of the evaporators of the next effect, while the juice flows from one effect to the next, increasing its sugar concentration. In this multiple-effect arrangement, only the first effect is fed with boiler steam and purified juice. In the last effect, the evaporated steam escapes from the juice chamber to the condensers and then to the atmosphere.
The use of dynamic modelling and simulation techniques in the process industry is an activity mainly oriented towards the design of installations and the training of the working staff, but it can also be used to test new control or diagnosis strategies. For this factory there is a training simulator developed at the University of Valladolid, Spain (Merino, Alves, & Acebes, 2005). The main console of the training simulator for the evaporation section is shown in Figure 2.
Figure 2. Schematic of the available simulator for training operators.
The simulator is articulated in two big systems: a simulation program and a distributed control system, where one of the control units works as an instructor console. The objective of the simulation program is to reproduce in the most reliable way the global dynamic performance of the sugar production process. The simulation is made using the EcosimPro (EcosimPro, 2012) simulation language, and the model is developed using libraries of elemental units built with an object-oriented modelling approach (Acebes, Merino, Alves, & Prada, 2009). Additionally, the simulation code must work in real time and use an OPC (OLE for Process Control) interface (Alves, Normey-Rico, A., Acebes, & Prada, 2008) to communicate with the distributed control system. OPC is a de facto standard for communications of Windows applications in industrial processes and is included in almost every modern SCADA. The OPC simulation program performs two tasks in parallel: it solves the dynamic mathematical models of the process in real time and attends requests from OPC clients. The SCADA system, which can be configured as operator or instructor console, acts as an OPC client: it receives data from the simulation, changes the boundary conditions, and activates faults in the simulation program.
When building a model, the degree of complexity varies depending on the use of the model. In the case of evaporation units, it is possible to use different approximations to the model (Luyben, 1990; Merino, 2008). In the training simulator, a detailed model is used, including dynamics in the liquid and vapour phases and complex phenomena such as the accumulation of incondensable gases or the absence of juice in the evaporator, which allows steam flow via the juice pipes. These features are necessary in order to provide training capabilities to the simulator. As an example of the type of equations used in the simulator, the energy balance of the juice chamber is shown:
\frac{dT_{jo}}{dt} = \frac{W_{jo}\,(H_{ji}-H_{jo}) - \frac{\partial H_{jo}}{\partial C_{so}}\, W_{ji}\,(C_{si}-C_{so})}{m_t\,\frac{\partial H_{jo}}{\partial T_{jo}}} - \frac{E\left(H_E - H_{jo} + \frac{\partial H_{jo}}{\partial C_{so}}\, C_{so}\right)}{m_t\,\frac{\partial H_{jo}}{\partial T_{jo}}}

Where:

\frac{\partial H_{jo}}{\partial C_{so}} = -0.025104\, T_{jo} + 3.6939\cdot 10^{-5}\, T_{jo}^{2}

\frac{\partial H_{jo}}{\partial T_{jo}} = 4.06 - 0.025104\, C_{so} + \left(5.418936\cdot 10^{-4}\, C_{so}^{2}\right) T_{jo}
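The two enthalpy partial derivatives quoted above are plain polynomials in the juice temperature and concentration, so they can be evaluated directly; the function names are ours, and we read the temperature variable as T_jo throughout:

```python
def dHjo_dCso(T_jo: float) -> float:
    """Partial derivative of output-juice enthalpy w.r.t. output
    concentration (polynomial fit with the coefficients quoted in the text)."""
    return -0.025104 * T_jo + 3.6939e-5 * T_jo ** 2

def dHjo_dTjo(C_so: float, T_jo: float) -> float:
    """Partial derivative of output-juice enthalpy w.r.t. output temperature."""
    return 4.06 - 0.025104 * C_so + (5.418936e-4 * C_so ** 2) * T_jo
```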
Together with these equations, mass balances, heat transmission equations, state equations, etc., must be added for the liquid and vapour phases, resulting in a very complex non-linear model. Furthermore, in the physical model there are several parameters that must be adjusted dynamically. In the relatively simple case studied in this article, only four parameters were necessary. These were the set of equations that we needed to analyse and modify in order to perform model-based
diagnosis, to obtain from the model the value of a variable that is being measured, so that both values can be compared. For this to occur, the number of measured variables in the process must be sufficiently large to allow calculating one measured variable from the others. On the other hand, there is no need for a match between the physical causality of the modelled system and the causality imposed by the availability of the measurements. This involves the symbolic manipulation of the mathematical model, which is usually complex, even when using object-oriented modelling languages that allow non-causal modelling. For example, in the case of evaporation, the juice level is a measured variable. From the point of view of physical modelling, this variable is a state variable calculated by numerical integration. In the case of fault diagnosis, it is a measured variable that cannot be calculated by the model through integration without a high-index problem appearing. This makes it necessary to manipulate the model so that this binding disappears.
3. POSSIBLE CONFLICTS FOR STRUCTURAL MODEL DECOMPOSITION
3.1. Possible Conflicts
The computation of the set of Possible Conflicts (PCs) (Pulido et al., 2001; Pulido & Alonso-Gonzalez, 2004) is a system model decomposition method from the DX community, which searches for the whole set of submodels of a given model with minimal redundancy (the number of equations in the submodel equals the number of unknown variables plus one). PCs provide the minimal analytical redundancy necessary to perform fault diagnosis. PCs are computed off-line, and they can be used on-line to perform consistency-based diagnosis of dynamic systems. PCs also provide the computational structure of the constraints that generate redundancy. This structure can be used to build a simulation model or, as we will show later, to obtain the structure of a state space neural network.
Off-line PC computation requires three steps:
1. To generate an abstract representation of the system as a hypergraph. The nodes of the hypergraph are system variables, and the hyperarcs represent constraints between these variables. These constraints are abstracted from the equations that relate system variables.
2. To derive Minimal Evaluation Chains (MECs), which are minimal connected over-constrained subsystems. The existence of a MEC is a necessary condition for analytical redundancy to exist. MECs have the potential to be solved using local propagation (solving one equation in one unknown) from the measurements.
3. To generate Minimal Evaluation Models (MEMs) by assigning causality2 to the constraints of the MEC. MEMs are directed hypergraphs that specify the order in which equations should be locally solved, starting from measurements, to generate the subsystem output.
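The minimal-redundancy condition behind step 2 (one more equation than unknowns) can be checked by brute force on a toy structural model; the equations below and the naive enumeration are illustrative only, not the actual PC algorithm (connectivity of the subsystem, for instance, is not checked):

```python
from itertools import combinations

# Toy structural model: each equation lists the UNKNOWN variables it relates
# (measured variables are assumed already substituted out).
equations = {
    "e1": {"x1", "x2"},
    "e2": {"x2"},
    "e3": {"x1"},
    "e4": {"x1", "x2"},
}

def minimal_redundant_subsets(eqs):
    """Return equation subsets with exactly one more equation than unknowns
    (candidate over-constrained subsystems), keeping only minimal ones."""
    found = []
    names = sorted(eqs)
    # Enumerate by increasing size so earlier hits are minimal by construction.
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            unknowns = set().union(*(eqs[e] for e in combo))
            if len(combo) == len(unknowns) + 1:
                if not any(set(s) < set(combo) for s in found):
                    found.append(combo)
    return found
```

On this toy model every 3-equation subset is minimally over-constrained (3 equations over the 2 unknowns x1, x2), so four candidates are returned.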
In consistency-based diagnosis (Reiter, 1987; Kleer & Williams, 1987), a conflict arises given a discrepancy between observed and predicted values for a variable. Hence, conflicts are the result of the fault detection stage, but they also contain the necessary structural information for fault isolation. Possible Conflicts were designed to compute off-line those subsystems capable of becoming minimal conflicts on-line. Under fault conditions, conflicts are observed when the model described by a MEM is evaluated with the available observations, because the model constraints and the input/measured values are inconsistent (Reiter, 1987; Kleer & Williams, 1987). This notion leads to the definition of a Possible Conflict:
Definition 1 (Possible Conflict) The set of constraints in a MEC that give rise to at least one MEM.
Recent works have demonstrated the equivalence between MECs, Analytical Redundancy Relations (ARRs), and other structural model decomposition methods (Armengol et al., 2009).
3.2. Inclusion of temporal information in the models
There are two kinds of constraints in the model: differential constraints, those used to model dynamic behaviour, and instantaneous constraints, those used to model static or instantaneous relations between system variables.
Differential constraints represent a relation between a state variable and its first derivative (x, dx/dt). These constraints can be used in the MEMs in two ways, depending on the selected causality assignment. In integral causality, the constraint is solved as x(t) = x(t−1) + ∫_{t−1}^{t} (dx/dt) dt. In derivative causality, dx/dt is used directly, assuming that the derivative can be computed from present and past samples of x. Integral causality usually implies using simulation, and it is the preferred approach in the DX field. Derivative causality is the preferred approach in FDI. Both have been demonstrated to be equivalent for numerical models, assuming adequate sampling rates and precise approximations for the derivative computation are available, and assuming the initial conditions for simulation are known (Chantler, Daus, Vikatos, & Coghill, 1996). PCs can easily handle both types of causality, since they only represent a different causal assignment while building MEMs (Pulido et al., 2010).
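The difference between the two causality assignments can be seen on a toy first-order constraint dx/dt = −a·x + u; the system, its parameter values, and the Euler/backward-difference choices below are our illustration, not from the paper:

```python
import numpy as np

a, dt = 0.5, 0.1
n_steps = 50
t = np.arange(n_steps) * dt
u = np.ones(n_steps)                          # known input
x_meas = (1.0 / a) * (1 - np.exp(-a * t))     # "measured" state (analytic solution)

# Integral causality: start from a known initial condition and simulate
# forward, x(t) = x(t-1) + integral of dx/dt over the step (Euler here).
x_sim = np.zeros(n_steps)
for k in range(1, n_steps):
    x_sim[k] = x_sim[k - 1] + dt * (-a * x_sim[k - 1] + u[k - 1])

# Derivative causality: approximate dx/dt from present and past samples of
# the MEASURED x, then evaluate the constraint residual directly.
dxdt = np.diff(x_meas) / dt
residual = dxdt - (-a * x_meas[:-1] + u[:-1])
```

Both routes agree up to discretization error: the simulated state tracks the measurement (known initial condition), and the derivative-causality residual stays near zero in the fault-free case.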
Special attention must be paid to loops in the MEM (sets of equations that must be solved simultaneously). Loops containing differential constraints in integral causality are allowed, because under integral causality the time indices are different on both sides of the differential constraint (Dressler, 1994, 1996). It is generally accepted that loops containing differential constraints in derivative causality cannot be solved (Blanke et al., 2006).
2In this context, by causality assignment we mean every possible way one variable in one equation can be solved assuming the remaining variables are known.
Summarizing, each MEM for a PC represents how to build an executable model to monitor the behaviour of the subsystem defined by the PC. Such an executable model can be implemented as a simulation model or as a state-observer (Pulido et al., 2010). However, building such a model for complex non-linear systems is not a trivial task. In Section 5 we will show the set of PCs obtained for our case study, and we will derive a simulation model for one of the PCs. In subsection 5.3 we will show how a grey-box model using neural networks can be obtained for the same PC. The next section shows the fundamentals of the type of neural network model used in this work.
4. STATE SPACE NEURAL NETWORKS FOR BEHAVIOUR ESTIMATION
State Space Neural Networks (ssNN) (Zamarreno & Vega, 1998) are a great tool for modelling non-linear processes, as shown in several cases (Gonzalez Lanza & Zamarreno, 2002; Zamarreno, Vega, García, & Francisco, 2000), even in the sugar industry (Zamarreno & Vega, 1997). The main advantages of this modelling approach are its ability to represent any non-linear dynamics and the fact that it is what is called a parallel model. This model represents the cause-effect process dynamics without considering past inputs and/or past outputs. The dynamic relation is modelled by the state layer, which calculates the internal state of the network using just the current inputs of the model and the internal state values from the previous time step.
The architecture of the ssNN (see Figure 3) consists of five blocks, and each block represents a neural network layer. From left to right, the number of neurons at each layer is n, h, s, h2 and m. The third layer represents the state of the system (the dynamics). As can be seen in the figure, there is a feedback from the state layer to the previous layer, which means that the current state depends (in a non-linear way) on the state at the previous time step. The second and fourth layers model the non-linear behaviour: from the inputs to the states and from the states to the outputs, respectively. The first and fifth layers provide linear transformations from the inputs and to the outputs, respectively. The ssNN is implemented by the following mathematical representation:
x(t+1) = W^h · f1(W^r · x(t) + W^i · u(t) + B^h)

y(t) = W^o · f2(W^h2 · x(t) + B^h2)
where the parameters are weight matrices, W, and bias vectors, B:
Figure 3. Generic state space neural network architecture. (LIN: Linear Processing Elements (Neurons); NL: Non-Linear Processing Elements.)

• W^i, W^h, W^r, W^h2, W^o are matrices with dimensions h × n, s × h, h × s, h2 × s and m × h2, respectively.
• B^h and B^h2 are bias vectors with h and h2 elements, respectively.
• f1 and f2 are two functions (non-linear, in general) which are applied elementwise to a vector or matrix. They are usually of sigmoid type.
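The two equations above can be sketched directly in NumPy; the layer sizes and random weights below are placeholders (a real model would obtain them from training):

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, s, h2, m = 3, 6, 4, 6, 1         # layer sizes, as named in the text

# Weight matrices with the dimensions listed above (small random values).
W_i  = rng.normal(size=(h, n)) * 0.1   # h x n
W_h  = rng.normal(size=(s, h)) * 0.1   # s x h
W_r  = rng.normal(size=(h, s)) * 0.1   # h x s
W_h2 = rng.normal(size=(h2, s)) * 0.1  # h2 x s
W_o  = rng.normal(size=(m, h2)) * 0.1  # m x h2
B_h, B_h2 = np.zeros(h), np.zeros(h2)

f1 = f2 = np.tanh                      # sigmoid-type nonlinearities

def ssnn_step(x, u):
    """One time step of the ssNN:
    x(t+1) = W^h f1(W^r x(t) + W^i u(t) + B^h)
    y(t)   = W^o f2(W^h2 x(t) + B^h2)"""
    x_next = W_h @ f1(W_r @ x + W_i @ u + B_h)
    y = W_o @ f2(W_h2 @ x + B_h2)
    return x_next, y
```

Note the parallel-model property: stepping `ssnn_step` only needs the current input and the previous internal state, never past measured outputs.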
For some processes, where some a priori knowledge about the first-principles equations is available, a black-box model could be too generic to obtain good results. This knowledge can instead be used to restrict the architecture of the model, so we end up with a grey-box model that can be better adjusted to mimic the process. The next section illustrates the training process to obtain a specific grey-box model for a PC related to an evaporation unit.
5. RESULTS ON THE CASE STUDY
5.1. PCs for the evaporation unit
As mentioned in Section 2.1, the evaporation section of the sugar factory is made up of five effects working sequentially to increase the sugar concentration in the syrup. All the evaporation units in the same effect share the same steam output conduit and provide the steam for the next effect, thus partially coupling the behaviour of all the units. For our tests we have focused on the first evaporation unit in the first effect.
Several assumptions must be made in order to simplify the original model used in the training simulator and to use those first-principles equations for diagnosis. In our case, we simplified the dynamic processes actually happening inside the evaporation chamber, and we assumed the system was in only one operation mode. The dynamic processes considered in the evaporation unit were: the conservation law for the amount of sugar and non-sugar products, the global balance of matter in the evaporation chamber, the sugar balance, the level in the chamber, energy balances, the steam volume balance, the interchanged heat, and the pressures in the chamber. As a result of this simplification process, our model was made up of 40 equations based on first principles of physics, 44 unknown variables, and 12 measured variables. Only 5 of these equations were used to model the evolution of 5 state variables: C, T, M, juice_out.T, Mvh.
The algorithms used to compute the set of PCs provided 1058 MECs and 775 MEMs. The total number of PCs in this system was 237, but most of them shared the same fault isolation capabilities, since only 8 of the original 40 equations model relevant faulty behavior.
In the original model there are several equations containing partial derivatives and several non-linear functions. As a consequence, most of the generated MEMs can hardly be implemented, although it is analytically possible. The problem we faced at that point was implementing the relevant MEMs, because each simulation model would need to be written by hand. The process needs to be supervised by the modelling expert, thus producing a bottleneck in the development of the diagnosis system. Consequently, to test the approach, we have modelled only one of the PCs, PC195, whose MEM is graphically described in Figure 4. The MEM is a directed hypergraph that represents how the equations must be used to compute the output, steam_out.P, using just measurements as inputs, and how the inputs are used to compute the intermediate unknown variables. Each solid arc represents an instantaneous constraint. Each dashed arc represents a differential constraint. In this system we use integral causality; hence each dashed arc means that we must perform integration to obtain the value of the state variable.
We selected this subsystem because it contains 16 equations, several input measurements (juice_in.W, juice_in.Brix, juice_out.T, and level_juice.signal), and several state variables (M, C, and Mvh). The observed output variable is steam_out.P. Hence, it has enough complexity to be a good test for the state space neural model.
5.2. The experimental data-set
PC195 was implemented in EcosimPro (EcosimPro, 2012). We ran a set of five experiments using real data from the factory for an intermediate month in the five-month campaign. Experiment 1 consisted of 900 data points taken from 9 measurements in the system every 30 seconds. Experiments 2, 3, 4 and 5 consisted of 2800 data points taken from the same 9 measurements every 30 seconds3. Data sets 1, 2, and 4 represent nominal behaviour. Data set 3 represents a fault in the output sensor.
In order to monitor the nominal behavior and perform fault detection, we empirically determined a threshold. Figure 7 shows the performance of the model on the four scenarios. It can be seen that the simulation model is able to monitor the nominal behaviour and also to detect the fault, but, as shown in experiments 2 and 3, the estimations obtained are not very accurate. This is mainly due to the assumptions made regarding unknown parameters and boundary conditions.
3Since the first experiment is shorter than the other four, we do not show its results.
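The detection logic reduces to a simple residual test; the helper names and the safety margin below are ours, with the threshold set empirically from the worst residual seen on fault-free data:

```python
import numpy as np

def residuals(measured, estimated):
    """Absolute difference between measured and model-estimated signals."""
    return np.abs(np.asarray(measured, float) - np.asarray(estimated, float))

def empirical_threshold(nominal_residuals, margin=1.2):
    """Threshold from fault-free runs, inflated by a safety margin
    (the margin value is illustrative, not from the paper)."""
    return margin * float(np.max(nominal_residuals))

def detect_fault(measured, estimated, threshold):
    """Flag a conflict at each sample where the residual exceeds the threshold."""
    return residuals(measured, estimated) > threshold
```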
5.3. State space neural network models for PCs
Figure 4 graphically specifies for PC195 the relations between the inputs (level_juice.signal, juice_in.W, juice_out.T, and juice_in.Brix) and the states (M, C, Mvh and steam_out.P). Moreover, the last input (juice_in.Brix) can be considered constant along time, so it can be removed from the model. The output of the model is steam_out.P, so there is a direct relation between the output and one of the states. Taking this into account, the ssNN architecture can be customized to represent the process characteristics in a better way, as described in Figure 5. The non-linear (hidden) layer is split into four parts, and each part (represented by NL inside a square) has a number of neurons (h1, h2, h3, h4) that must be adjusted by trial and error to represent the non-linear dynamics of each state.
This simplified ssNN architecture can be viewed as removing some of the weights between layers, or as setting zeros in specific elements of the weight matrices (the matrices can be seen in Figure 6). The dimensions of matrices W^i, W^r, W^h, and W^o are (h1+h2+h3+h4) × 3, (h1+h2+h3+h4) × 4, 4 × (h1+h2+h3+h4), and 1 × 4, respectively.
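One way to realize this "zeroed weights" customization is with binary masks applied to the weight matrices; the sketch below mimics the sparsity pattern of Figure 6, but the exact input-to-part wiring is our illustration:

```python
import numpy as np

h1 = h2 = h3 = h4 = 5                  # neurons per part
H = h1 + h2 + h3 + h4                  # 20 hidden neurons in total

# Mask for W^i (H x 3): which of the 3 inputs feeds each part of the
# hidden layer (illustrative assignment following Figure 6's block pattern).
mask_Wi = np.zeros((H, 3))
mask_Wi[0:h1, :] = 1                          # part 1 sees all three inputs
mask_Wi[h1:h1 + h2, 1] = 1                    # part 2 sees input 2 only
mask_Wi[h1 + h2:h1 + h2 + h3, 2] = 1          # part 3 sees input 3 only
mask_Wi[h1 + h2 + h3:, [0, 2]] = 1            # part 4 sees inputs 1 and 3

# Multiplying by the mask keeps the forbidden entries at exactly zero;
# a training loop would reapply the mask after every weight update.
rng = np.random.default_rng(0)
W_i = rng.normal(size=(H, 3)) * mask_Wi
```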
5.3.1. Training
Training is the process of modifying the parameters (weights and biases) of the neural network to adjust its output to the process output. The error between the neural network output and the process output has to be minimized, so the training procedure is an optimization task where some index, the Sum of Squared Errors (SSE) in our case, has to be minimized.
A feedforward network is quite easy to train using the backpropagation method or one of its variants. But a recurrent neural network (such as the ssNN) is more difficult to train due to the recurrent connections. Stochastic methods are an alternative for this kind of neural network, resulting in training algorithms that are easier to implement. The Modified Random Optimization Method (Solis & Wets, 1981) has been selected in this work, with some modifications to improve convergence, as shown in (Gonzalez-Lanza & Zamarreno, 2002).
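The flavour of such a stochastic training loop can be sketched as follows; the reversal trick, the step-size adaptation rule, and the toy least-squares problem are our own illustration, not the exact algorithm of Solis & Wets or its published modification:

```python
import numpy as np

def random_optimize(loss, w0, sigma=0.5, iters=800, seed=0):
    """Random-search minimization: perturb the weights, try w + d and w - d,
    keep whichever improves, and adapt the perturbation scale."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    best = loss(w)
    for _ in range(iters):
        d = rng.normal(scale=sigma, size=w.shape)
        improved = False
        for cand in (w + d, w - d):          # "reversal" step
            l = loss(cand)
            if l < best:
                w, best, improved = cand, l, True
                break
        sigma *= 1.2 if improved else 0.98   # crude step-size adaptation (ours)
    return w, best

# Toy use: fit y = w0*u + w1 by minimizing the SSE index.
u = np.linspace(0.0, 1.0, 20)
target = 2.0 * u + 0.5
sse = lambda w: float(np.sum((w[0] * u + w[1] - target) ** 2))
w_fit, final_sse = random_optimize(sse, [0.0, 0.0])
```

Because no gradients are needed, the same loop applies unchanged to a recurrent model such as the ssNN, where the loss would be the SSE of a full simulated trajectory.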
For training this ssNN architecture, we used the experiments explained in Section 5.2. Experiments 1, 3 and 5 were the training set, and experiments 2 and 4 were used for validation. The only parameter to tune in this ad hoc ssNN is the number of hidden neurons at the second layer. Five neurons in each part are enough to represent the data, for a total of 20 sigmoid neurons.
Figure 8 shows the evolution of the estimated and measured variable for PC195 for the four selected experiments. To use the ssNN model for fault detection, a new threshold was empirically calculated. Again, the ssNN model is able to track the nominal behaviour (experiments 1, 2, and 4 in Figure 8) and to correctly detect the fault in experiment 3 when the residual
Figure 4. Minimal Evaluable Model schematics for Possible Conflict PC195. The estimated variable is Pvh. The corresponding measured variable is steam_out.P.
Figure 5. Simplified ssNN to better represent the model for the PC.
W^i = \begin{pmatrix}
i_{1,1} & i_{1,2} & i_{1,3}\\
\vdots & \vdots & \vdots\\
i_{h1,1} & i_{h1,2} & i_{h1,3}\\
0 & i_{h1+1,2} & 0\\
\vdots & \vdots & \vdots\\
0 & i_{h2,2} & 0\\
0 & 0 & i_{h2+1,3}\\
\vdots & \vdots & \vdots\\
0 & 0 & i_{h3,3}\\
i_{h3+1,1} & 0 & i_{h3+1,3}\\
\vdots & \vdots & \vdots\\
i_{h4,1} & 0 & i_{h4,3}
\end{pmatrix}
\qquad
W^r = \begin{pmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & 0\\
\vdots & \vdots & \vdots & \vdots\\
r_{h1,1} & r_{h1,2} & r_{h1,3} & 0\\
r_{h1+1,1} & r_{h1+1,2} & 0 & 0\\
\vdots & \vdots & \vdots & \vdots\\
r_{h2,1} & r_{h2,2} & 0 & 0\\
0 & r_{h2+1,2} & r_{h2+1,3} & r_{h2+1,4}\\
\vdots & \vdots & \vdots & \vdots\\
0 & r_{h3,2} & r_{h3,3} & r_{h3,4}\\
0 & 0 & r_{h3+1,3} & 0\\
\vdots & \vdots & \vdots & \vdots\\
0 & 0 & r_{h4,3} & 0
\end{pmatrix}

W^h = \begin{pmatrix}
d_{1,1} \cdots d_{1,h1} & 0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0\\
0 \cdots 0 & d_{2,h1+1} \cdots d_{2,h2} & 0 \cdots 0 & 0 \cdots 0\\
0 \cdots 0 & 0 \cdots 0 & d_{3,h2+1} \cdots d_{3,h3} & 0 \cdots 0\\
0 \cdots 0 & 0 \cdots 0 & 0 \cdots 0 & d_{4,h3+1} \cdots d_{4,h4}
\end{pmatrix}
\qquad
W^o = \begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix}

Figure 6. Simplified weight matrices W^i, W^r, W^h, and W^o for the ssNN implementing PC195.
exceeds the threshold.
Looking at the results in Figures 7 and 8, we can see that both models can be used to monitor the evolution of the variable Steam_out.P. The main difference comes from the bias introduced by the parameters in the first-principles model in Fig. 7, which leads to a higher threshold for fault detection. Nevertheless, the model can still be used for monitoring and fault detection.
The ssNN model used only 3 experiments for training, yet it was able to track the nominal behaviour more accurately and was capable of detecting the fault in the sensor. However, further training with data from different months is necessary.
6. CONCLUSIONS
In this work we have proposed using Possible Conflicts to decompose a large system model into smaller models with minimal redundancy for fault detection and isolation. Possible Conflicts provide the structural models (equations, inputs, outputs, and state variables) required for model-based fault detection and isolation, and these models can be implemented as simulations or state observers. Since deriving such models for complex non-linear systems is not straightforward and requires the participation of modelling experts, we have proposed using the structural information in the model to design a neural network grey-box model with a state space architecture.
Figure 7. Results for the PC tracking the system using the first-principles model. The figure represents 4 experiments with real data. On the left we represent the estimated and the real value of the magnitude; on the right, the evolution of the residual and the threshold.
Figure 8. Results for the PC tracking the system using the ssNN model. The figure represents 4 experiments with real data. On the left we represent the estimated and the real value of the magnitude; on the right, the evolution of the residual and the threshold.
The main conclusion is that the structure of the Minimal Evaluable Model for a Possible Conflict can guide the design of the state space model of the neural network, reducing its complexity and avoiding the estimation of multiple unknown parameters required by the first-principles model. Comparing results of this approach on an evaporation unit of a beet sugar factory, we have observed that the ssNN is able to obtain similar or even better results than a simulation model manually derived by an expert. Both types of models were used to successfully monitor the process and to detect faults.
As further work, we plan to derive additional ssNNs and to test them on a larger experimental data set. Additionally, we need to test the approach at different times of the season, because this is a very slowly evolving process whose parameters vary over time. Moreover, we can test more abstract models that will produce fewer PCs but still contain the same structural information. Finally, once we introduce larger data sets, we will use statistical tests to perform fault detection and to determine the threshold that guarantees a maximum percentage of false positives and false negatives.
ACKNOWLEDGMENT
This work has been supported by Spanish MCI grants TIN2009-11326 and DPI2009-14410-C02-02. We would also like to thank the personnel from "Azucarera Espanola" for the data provided for these experiments, and three anonymous reviewers for their comments, which have helped us to improve this paper.
BIOGRAPHIES
Belarmino Pulido received his Licentiate, M.Sc., and Ph.D. degrees in Computer Science from the University of Valladolid, Valladolid, Spain, in 1992, 1995, and 2001, respectively. In 1994 he joined the Department of Computer Science at the University of Valladolid, where he has been an Associate Professor since 2002. His main research interests are model-based and knowledge-based reasoning for supervision and diagnosis. He has worked on several national and European funded projects related to supervision and diagnosis, and has been the coordinator of the Spanish Network on Supervision and Diagnosis of Complex Systems since 2005.
Jesus Maria Zamarreno holds a degree in Physics and a PhD in Physics from the University of Valladolid, Spain, where he is a Lecturer (Associate Professor) in the Department of Systems Engineering and Automatic Control. He is a member of CEA, the IFAC Spanish section, and is also the advisor of the ISA student section at Valladolid. His research interests are artificial neural networks, agent-based modelling, model-based predictive control, and OPC applications.
Alejandro Merino holds a degree in Chemical Engineering and M.Sc. and PhD degrees in Process and Systems Engineering from the University of Valladolid, Spain. He is currently an Assistant Professor at the University of Burgos, Spain, and also a senior researcher at the Centre of Sugar Technology. He has worked on different projects related to the modelling and optimization of complex industrial processes. His research interest is the modelling and optimization of dynamic processes.
Anibal Bregon received his B.Sc., M.Sc., and Ph.D. degrees in Computer Science from the University of Valladolid, Valladolid, Spain, in 2005, 2007, and 2010, respectively. Currently he is an Assistant Professor and Research Assistant at the Department of Computer Science of the University of Valladolid. From September 2005 to June 2010, he was a Graduate Research Assistant with the Intelligent Systems Group at the University of Valladolid, Spain. During that time he was a visiting scholar at the Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN, USA; the Department of Electrical Engineering, Linkoeping University, Linkoeping, Sweden; and the Diagnostics and Prognostics Group, NASA Ames Research Center, Mountain View, CA, USA. His current research interests include model-based reasoning for diagnosis, prognostics, health management, and distributed diagnosis of complex physical systems.
Virtual Framework for Validation and Verification of System Design Requirements to enable Condition Based Maintenance
Dipl.-Ing. Heiko Mikat1, Dr. Dipl.-Ing. Antonino Marco Siddiolo2, Dipl.-Ing. Matthias Buderath3
1,2,3Cassidian, Manching, 85077 Germany
[email protected] [email protected]
ABSTRACT
During the last decade, Condition Based Maintenance [CBM] became an important area of interest for reducing down times related to maintenance and logistic delays and for improving system effectiveness. Reliable diagnostic and prognostic capabilities that can identify and predict incipient failures are required to enable such a maintenance concept. For a successful integration of CBM into a system, the challenge beyond the development of suitable algorithms and monitoring concepts is also to validate and verify the appropriate design requirements. To justify additional investments into such a design approach, it is also important to understand the benefits of the CBM solution. Throughout this paper we will define a framework that can be used to support the Validation & Verification [V&V] process for a CBM system in a virtual environment. The proposed framework can be tailored to any type of system design. It will be shown that an implementation of failure prediction capabilities can significantly improve the desired system performance outcomes and reduce the risk for resource management; on the other hand, an enhanced online monitoring system without prognostics has only a limited potential to ensure the return on investment for developing and integrating such technologies. A case study for a hydraulic pump module will be carried out to illustrate the concept.
1. INTRODUCTION
A maintenance strategy cannot change the reliability figures of a system design, but an optimized concept can improve availability and reduce operation and support costs (Reimann, Kacprzynski, Cabral, and Marini, 2009). Three maintenance strategies, with corresponding measures to overcome the issues associated with operating a system of non-infinite reliability, can be distinguished.
Corrective Maintenance [CM]
- Run To Failure Maintenance [RTFM]: general concept for RTFM.
- On Condition Maintenance [OCM]: failures which can cause neither a safety-critical nor an economically critical event.
- Condition Based Maintenance [CBM]: failures which can cause neither a safety-critical nor an economically critical event; requires online monitoring for fault isolation.

Preventive Maintenance [PvM]
- RTFM: not included.
- OCM: failures which are safety-critical or economically critical; fixed intervals to decide if a PvM is required.
- CBM: failures which are safety-critical or economically critical, without prognostics; requires online monitoring to enable dynamic intervals for PvM.

Predictive Maintenance [PdM]
- RTFM: not included.
- OCM: not included.
- CBM: failures which are safety-critical or economically critical, with monitoring and prognostics; enables dynamic intervals to plan and perform PdM when required.

Table 1. Maintenance strategies and measures
A definition for the different concepts that will be used in the proposed framework is given in Table 1.
Standardized methods like Failure Mode Effects and Criticality Analysis (FMECA) or Common Mode Analysis are used to allocate probabilities and criticalities to each single failure mode in a system. The results are used to decide which failures are acceptable during operation and which ones have to be avoided through the introduction of a PvM, or, in the case of a CBM concept, for which components it is expedient to develop capabilities that enable PdM. Monitoring or prediction methods supporting the decision whether a PvM or PdM is required will always be imperfect. This causes erroneous replacements of healthy components (known as No Fault Found [NFF]) and a waste of useful life through too-early replacements of degrading components.
_____________________ H. Mikat et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 1. Enhanced Health Monitoring concept
Especially in the case of PdM, where a potential failure or degradation should be announced while the component still operates within the specified performance limits, avoiding NFFs while simultaneously realizing a high sensitivity to incipient failures is a challenge. For the realization of dynamic scheduling of maintenance intervals, it is necessary to implement online condition monitoring to receive and process all information needed to decide when a PvM or PdM action is required. If the different components and the system itself are not designed to provide and process all required information, it is not possible to realize an optimized CBM concept (Dunsdon & Harrington, 2009). For this reason it is mandatory to establish all relevant requirements from the beginning of the system design phase. These requirements cannot be treated like general design requirements related to Maintainability or Testability aspects. Whereas a Built-In Test [BIT] can be specified through a fault isolation and NFF rate, a CBM system would also need the specification and verification of detecting failures before they occur and of predicting future trends with a verifiable accuracy. The difference between BIT and an Enhanced Health Monitoring [EnHM] concept is illustrated in Figure 1.
Especially if the CBM system shall not only support the optimization of spares and personnel management but also be designed to shift scheduled intervals, which are important to ensure system safety, into dynamic condition-based intervals, it is highly relevant to ensure traceability of how the CBM capabilities need to be incorporated into the system design. Selected Key Performance Indicators [KPIs] can be defined to represent customer requirements or industrial interests. An understanding of how CBM affects these KPIs is needed to justify increased development and procurement costs plus a more complex system design.
Figure 2. Hierarchical structure of the framework
The general hierarchical structure of how a Service Capability Rate [SCR] can be derived from the design and support elements of a system is shown in Figure 2. This architecture is used for the definition of the framework that will be described throughout this paper.
An SCR can range from a success rate for performing reconnaissance missions in military aviation, over transporting passengers or material in the civil sector, to producing any type of goods in the industrial sector. The baseline parameters are Reliability, Maintainability and Testability [RMT], specifying how many failure events are expected and when, how countermeasures can be realized, and which fault isolation capabilities are provided. The logistic concept [LOG] provides information on how resources like personnel, spares and consumables are supplied. The maintenance strategy [MNT] specifies how the scheduled and unscheduled events are managed. The concept for Enhanced Health Management [EHM] has been introduced to specify the potential for the realization of CBM through EnHM and prognostics.
These baseline elements are considered as design and support elements of the system. The next level, as an outcome of the design and support level, is considered as Life Cycle Costs [LCC] related. The Mean Waiting Time [MWT] denotes how much time is lost due to waiting for missing resources; therefore it is related to periods during which the system cannot generate profit. The Maintenance Index [MID] indicates how much maintenance effort is required in Maintenance Man Hours [MMH] per Operational Hour [OH]. The Inverse Logistics Maintenance Ratio [ILMR] is used to quantify the amount of unscheduled events per OH, hence indicating the required capacity for spares to ensure the operational availability of the system. Based on these parameters and the system specific operational scenario, various KPIs can be derived. Important parameters are the operational availability of the material required to support the system for fulfilling its service aims [A0MAT] and the operational availability of the system itself [A0SYS]; these two parameters can be used to trace customer requirements and derive the SCR parameter. The required material can again be anything that is needed to support the system specific service task, like payload equipment for aircraft missions or industrial goods for production purposes.
The following sections will give an overview of a generic framework, addressing all above mentioned aspects by describing the conceptual design and purpose of the framework as well as basic assumptions and definitions.
The framework described on the following pages can be understood as a multifunctional environment, providing the capability to validate design and conceptual requirements, as well as a tool for the integrated simulation of various modules composed into a complex system architecture for verification purposes. The general idea is shown in Figure 3.
Figure 3. V-Model for framework applications
The "Virtual Validation Environment" mode enables the derivation and validation of dedicated requirements for a system layout and EHM integration. Furthermore, the "Integrated Simulation" mode supports model-based verification of KPIs and EHM requirements through the integration of validated simulation modules for diagnostics and prognostics on component or subsystem level.
To demonstrate the concept we will describe the simulation framework and conduct a case study. The case study will be carried out by showing how a simulation module for monitoring the status of a hydraulic pump could be integrated into the simulation environment and support the verification of RMT and EHM requirements.
2. DESCRIPTION OF THE SIMULATION CONCEPT
The main aim of the work presented in this paper is to develop a simulation environment that can be used to perform trade-off studies on system design and maintenance concept aspects, with emphasis on evaluating the potential of CBM. As described in the introduction, we distinguish between three different maintenance strategies and measures. As the framework was originally developed to support aircraft design decisions, where RTFM shall be avoided for safety and economic reasons, the RTFM strategy has been excluded. This assumption is also valid for other complex or cost-intensive applications like passenger transportation or industrial facilities. The decision tree defined as the basis for the framework is shown below.
Figure 4. RMT, MNT and EHM Flowchart
2.1. Maintenance Parameters
According to the online monitoring capabilities, subsets of the primary failures specified by RMT will belong to either the OCM or the CBM branch. A further partitioning into the different measures depends on the monitoring capabilities and on the definition of fixed maintenance intervals for inspection and overhaul. The probability that a failure belongs to one class is defined by the probability allocation parameter:
$$P_j = \frac{\sum_j \lambda_j}{\sum_i \lambda_i} \qquad (1)$$
In the case of P_PREDC (Predictive - CBM), the index j denotes all failure modes belonging to the class "Predictive Measures", while the index i runs over all failure modes belonging to the class "CBM Measures". It has been assumed that, in addition to the primary failures classified as CM, PvM or PdM, each system also generates a number of false alarms (FA). As PvM and PdM avoid the occurrence of a failure during service, the "Corrective Measures" are the only classes which generate additional secondary faults (SFLT), with probability P_SFLT. For the overall simulation it should be considered that each maintenance action can also cause a secondary maintenance-induced (SMNT) failure (defined by the probability P_SMNT). These maintenance-induced failures can be mishandling, wrong installation or other secondary damage during overhaul, replacement or repair activities on the system (Byer, Hess, and Fila, 2001). As each PvM and PdM should avoid the occurrence of a failure, it has to be performed before the failure happens. That means the introduction of such a measure reduces the useful life of the system or component. This aspect has been introduced as an additional probability for erroneously early replacements of the respective part. Due to the online monitoring of the CBM concept, this error will be lower for the PvM measures in the CBM branch than for those in the OCM branch. Also, it can be assumed that the evaluation of the information for PdM enables a much higher accuracy and confidence in estimating the optimum time to replace the monitored component than monitoring without prognostics. Hence the waste of useful life for PdM can be considered lower than for PvM measures (Spare, 2001).
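As a minimal illustration of Eq. (1), the allocation parameter is just a ratio of summed failure rates; the class labels and rates below are made up for the example:

```python
# Hypothetical failure-mode table: (class label, failure rate lambda in 1/h).
failure_modes = [
    ("CBM-PdM", 2e-5), ("CBM-PdM", 1e-5),
    ("CBM-CM", 3e-5), ("CBM-PvM", 4e-5),
]

def allocation(modes, subset, superset):
    """Eq. (1): summed failure rates of a sub-class over those of its class."""
    num = sum(lam for cls, lam in modes if cls in subset)
    den = sum(lam for cls, lam in modes if cls in superset)
    return num / den

# P_PREDC: share of predictive measures within the CBM branch.
p_predc = allocation(failure_modes,
                     subset={"CBM-PdM"},
                     superset={"CBM-PdM", "CBM-CM", "CBM-PvM"})
# -> 3e-5 / 1e-4 = 0.3
```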
2.2. Reliability, Maintainability and Testability
The top-level failure rate distribution is given by the RMT requirements as the composition of all individual primary failure modes of the system. The probability for additional false alarms has been introduced as a percentage false alarm rate for the respective class of events. It should be noted that, for maintainability aspects, each failure mode has been treated as an individual event requiring a maintenance action. The maintainability aspect is described by the Mean Time To Repair for each individual failure mode, MTTR_i.
Knowing the individual failure rates, a joint value on system level can be derived:
$$\mathrm{MTTR}_{SYS} = \frac{\sum_i \lambda_i\cdot \mathrm{MTTR}_i}{\sum_i \lambda_i} \qquad (2)$$
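Eq. (2) is a failure-rate-weighted mean of the individual repair times; a small sketch with hypothetical rates and repair times:

```python
def mttr_sys(lambdas, mttrs):
    """Eq. (2): failure-rate-weighted mean repair time on system level."""
    assert len(lambdas) == len(mttrs)
    return sum(l * m for l, m in zip(lambdas, mttrs)) / sum(lambdas)

# Hypothetical failure modes: rates (1/h) and repair times (h).
lams = [1e-4, 5e-5, 5e-5]
mttrs = [2.0, 4.0, 8.0]
# mttr_sys(lams, mttrs) -> (2e-4 + 2e-4 + 4e-4) / 2e-4 = 4.0 h
```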
A common approach for complex applications like aircraft is to define a BIT failure isolation rate, specified through the capability to isolate single point failures to one or multiple root causes. It is assumed that CBM-monitored components will have an ideal fault isolation capability, reducing the number of potential candidates for a single point failure to one single source. Considering this assumption, and the fact that fault isolation for BIT-monitored equipment has to be performed only once while the subsequent troubleshooting process for identifying the correct failure source only adds multiples of the replacement and checkout time for individual components, a formula for the resulting MTTR considering imperfect fault isolation can be derived (fdi: fault detection and isolation):
$$\Delta_{MTTR} = P_{O\text{-}CORR}\cdot\hat{p}_{fdi} + \left(P_{O\text{-}PREV}\cdot P_{OCM} + P_{CBM}\right)\cdot(1-\delta_{fdi})$$
$$\mathrm{MTTR}_{RES} = \Delta_{MTTR}\cdot\mathrm{MTTR}_{SYS} \qquad (3)$$

with:

$$\hat{p}_{fdi} := p_{fdi} + \sum_{k=2}^{n} p_{fdi_k}\cdot\left(1+(k-1)\cdot(1-\delta_{fdi})\right)$$
where p_fdi_k indicates the probability to isolate a single point failure to k = 2, ..., n sources as a testability requirement, and δ_fdi is the fraction of the replacement time required to perform the fault isolation. The imperfect BIT fault isolation will not only affect the repair time but also the resulting maintenance effort. Hence, the calculation of the increased probability for maintenance-induced failures in the corrective class of the OCM branch is implemented accordingly (δ_fdi = 0):
$$P_{SMNT}(OCM_{CORR}) = P_{SMNT}\cdot\hat{p} \qquad (4)$$
2.3. Logistic Parameters
The main parameter within the scope of a logistic concept for the estimation of system availability is the mean delay time for unscheduled events. This value is composed of an administrative and a logistic delay [Mean Logistics Delay Time: MLDT] fraction, giving an average parameter for the MWT. The MLDT parameter can be derived from the probability density estimate for the resulting failure rate of unscheduled events. Using these assumptions, an estimate of the MLDT can be derived:
$$\mathrm{MLDT} = \frac{\displaystyle\sum_{\lambda_i=\lambda_s}^{\lambda_{max}} pdf(\lambda_{us_i})\cdot(\lambda_{us_i}-\lambda_s)\cdot T_{Lead}\cdot 0.5}{\displaystyle\sum_{\lambda_i=\lambda_s}^{\lambda_{max}} pdf(\lambda_{us_i})\cdot\lambda_{us_i}} + T_0 \qquad (5)$$
with (excluding secondary effects, which are added to receive the resulting unscheduled failure rate):
$$\lambda_{us} = \lambda_{Sys}\cdot\left[P_{OCM}\cdot(1+P_{FA_O}) + P_{CBM}\cdot\left(P_{FA_C}+P_{C\text{-}CORR}+P_{C\text{-}PREV}\right)\right]$$
$$cdf(\lambda_{us}=\lambda_{max})=1,\qquad \lambda_s = pfr\cdot\lambda_{max}$$
and λ_Sys as the overall system failure rate, λ_us as the resulting failure rate for all unscheduled events, pdf(λ_us) / cdf(λ_us) as the probability density / cumulative distribution function of λ_us, pfr as the fill-rate factor of spares in the operational scenario with pfr = 1 for n_Spares(λ_max), T_Lead as the maintenance-related lead time (time between two spares deliveries or mean waiting time for maintenance specialists), and T_0 as the administrative delay time. Each element belonging to a class other than PdM is treated as an unscheduled event, while it is assumed that the capability to predict the occurrence of an event shifts it from unscheduled to scheduled maintenance. An arbitrary MLDT variation as a function of the spares fill rate is shown in Figure 5.
Figure 5. Mean Logistic Delay Time variation
The resulting MWT is the weighted average for scheduled and unscheduled events:
$$\mathrm{MWT} = \frac{(\lambda_{Sys}-\lambda_{us})\cdot T_0 + \lambda_{us}\cdot \mathrm{MLDT}}{\lambda_{Sys}} \qquad (6)$$
If PdM enables an accurate prediction of the time to failure, it can be assumed that the uncertainties for this class are reduced. This reflects system operation without the need for a conservative assumption about the number of spares required to keep the system operational.
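Under the reading that Eq. (6) is the plain weighted average stated above, the MWT computation can be sketched with hypothetical values:

```python
def mean_waiting_time(lambda_sys, lambda_us, t0, mldt):
    """Eq. (6): MWT as weighted average of scheduled (T0) and
    unscheduled (MLDT) delay contributions."""
    return ((lambda_sys - lambda_us) * t0 + lambda_us * mldt) / lambda_sys

# Hypothetical values: 1e-3 failures/h overall, 40% of them unscheduled.
mwt = mean_waiting_time(lambda_sys=1e-3, lambda_us=4e-4, t0=2.0, mldt=24.0)
# -> (6e-4 * 2.0 + 4e-4 * 24.0) / 1e-3 = 10.8 h
```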
2.4. Enhanced Health Management Parameters
The EHM parameter set can be described through the values of P_CBM, P_PREDC and P_FAC. It should be noted that the framework implies that only an EHM-monitored failure can also be predicted. It is also assumed that false alarms raised by other means of monitoring are ignored if the EHM algorithm for the respective failure mode does not confirm the failure. As EHM requires a deeper knowledge of the system, it cannot be assumed that this approach also works in the opposite direction, i.e., ignoring a false alarm of an EHM-monitored component if other monitoring features do not confirm the failure.
The accuracy of prediction has been identified as a key design parameter for the development of prognostic algorithms and concepts (Saxena, Roychoudhury, Celaya, Saha, Saha, and Goebel, 2010). The following assumptions have been made for the derivation of accuracy and precision; they result in a probability for too-early or missed replacements and can be used as requirements for the development of suitable algorithms:
- The prediction horizon has to ensure failures do not appear during the lead time. The lead time can be a time of continuous operation, the time interval between two spare deliveries or until maintenance specialists will be available.
- The prediction error ε is always a function of the prediction accuracy θ and the expected lead time TLead:

ε = ((1 − θ²) / (2·θ)) · TLead    (7)
- The minimum required prediction horizon Ph is defined accordingly:

Ph = (1 / θ²) · TLead    (8)
Assuming a fixed accuracy θ, it can be concluded that replacing the degrading component at tRep = θ·tPred avoids the failure with the probability specified by θ. Considering the mean and minimum/maximum prediction regimes with an accuracy θ, the following relations for the respective waste of useful life EWULi can be derived:
Conservative:   EWUL,Max = ε
Optimal:        EWUL,Mean = ε·θ·(1 − (1 − θ)/2)    (9)
Opportunistic:  EWUL,Min = ε·θ·(1 − (1 − θ)·(2 − θ)/2)
Figure 6 depicts these regimes for θ = 90%. Assuming the conservative situation that all regimes can occur with the same probability, it can be concluded that the average waste of useful life is equal to ΕWUL = ΕWULMean.
- The resulting waste of useful life due to predictive maintenance is a function of the respective failure rate:
Δλi = EWULi · λi    (10)
Figure 6. Prediction error regimes
2.5. Derivation of Performance Parameters
The system performance parameters can be derived according to Eqs. (11) and (12) (excluding scheduled overhauls):

A0i = 1 / (1 + λi · (MTTRi + MWTi))    (11)

SCR = A0,MAT · A0,SYS    (12)
with λi as the overall failure rate of item i.
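Eqs. (11) and (12) can be sketched directly; the function names are assumptions.

```python
def operational_availability(lam, mttr, mwt):
    """Eq. (11): steady-state availability of an item from its overall
    failure rate and its mean repair plus waiting times."""
    return 1.0 / (1.0 + lam * (mttr + mwt))

def service_capability_rate(a0_mat, a0_sys):
    """Eq. (12): SCR as the product of material and system availability."""
    return a0_mat * a0_sys
```

A perfectly reliable item (λ = 0) yields A0 = 1; any increase in MTTR or MWT reduces availability monotonically.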
2.6. Uncertainty Representation
As the aim of this work was to develop a framework that does not have to rely on pseudo-empirical simulation results, closed-form solutions were required for all stochastic processes used in the model. Therefore all distribution parameters, such as means and variances, have been propagated through the model by assuming stochastic independence for all single failure modes and a stochastic correlation for all failure modes that are interdependent.
Assuming Weibull-distributed times to failure with unitary shape parameter and therefore a constant failure rate (design and manufacturing processes should ensure constant failure rates, but due to varying conditions and tolerances the results are usually distributed), we can derive the expression for the propagation of the uncorrelated parameters PUC from class j belonging to branch i:
PUCj = (Σj λj²) / (Σi λi²)    (13)
The equivalent parameter for correlated events PC can be derived as:
PCj = PUCi · Pj²    (14)
with Pj as the probability allocation parameter of event j caused by event i.
All primary failure rates can be treated as independent events with a covariance of cov(zi, zj) ≈ 0. Only when merging the resulting primary failures with the secondary and maintenance-induced failures do the respective covariances have to be taken into account. Secondary failures only occur due to a primary failure belonging to the class "Corrective Measures"; a maintenance-induced failure only occurs due to a previous event belonging to any class of the OCM or CBM branch. Moreover, a relative increase in the failure rate of primary events causes the same relative increase in the rate of secondary events. These relations motivated the assumption of a perfect linear correlation for these two scenarios to derive the respective covariance:
cov(zi, zj) = Pj · Var(zi)    (15)
Well known laws for the calculation with stochastic variables have been used to propagate all mean and variance parameters through the system model (Elandt-Johnson & Johnson, 1980; Stuart & Ord, 1998; Blumenfeld, 2001).
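The propagation rules from Eqs. (13) and (15) amount to standard moment arithmetic for sums of random variables. The sketch below illustrates this (function names are assumptions): the covariance term from Eq. (15) is added when merging a primary event with a perfectly linearly correlated secondary event, and the independent case is recovered with Pj = 0.

```python
def p_uncorrelated(lams_j, lams_i):
    """Eq. (13): uncorrelated propagation parameter for class j in branch i."""
    return sum(l ** 2 for l in lams_j) / sum(l ** 2 for l in lams_i)

def var_of_merged(var_i, var_j, p_alloc=0.0):
    """Variance of z_i + z_j using Eq. (15): cov(z_i, z_j) = P_j * Var(z_i)
    under the perfect-linear-correlation assumption; p_alloc = 0 gives the
    independent case Var(z_i) + Var(z_j)."""
    cov = p_alloc * var_i
    return var_i + var_j + 2.0 * cov
```

This is how the variance of interdependent failure modes grows beyond the sum of the individual variances.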
By applying these rules, we obtain the resulting distribution functions that are used to estimate the distributions of the parameters MWT, ILMR and MID. As the maintenance effort is independent of logistic delays, they are again treated as independent variables, providing the basis for calculating the resulting distributions of A0i.
The specific distributions used for the various parameters in the framework are listed in Table 2. Near real-time capable maximum likelihood estimators have been implemented in the simulation to estimate the distribution parameters, using the propagated expectation and variance of each stochastic variable as input.
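As an illustration of fitting a distribution to a propagated expectation and variance, the moment-matching (method of moments) recovery of lognormal parameters is shown below; this is a simple stand-in for the estimators mentioned in the text, not their actual implementation.

```python
import math

def lognormal_from_moments(mean, var):
    """Recover lognormal (mu, sigma) from a propagated mean and variance
    by moment matching: sigma^2 = ln(1 + var/mean^2), mu = ln(mean) - sigma^2/2."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)
```

Feeding back the analytic moments of a lognormal variable reproduces its parameters exactly, which makes the mapping convenient inside a closed-form propagation chain.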
To validate the concept, arbitrary simulations with random number distributions instead of the closed-form solution have been carried out for an OCM and a CBM concept. The results are sufficiently accurate to assume that the environment can be used to simulate processes with stochastic variables in a closed-form solution (see Figure 7).
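The idea of such a Monte-Carlo cross-check can be sketched in a few lines: sample a sum of independent lognormal variables and compare the empirical moments against the analytically propagated ones. The parameter values are arbitrary.

```python
import math
import random
import statistics

def lognormal_moments(mu, s):
    """Closed-form mean and variance of a lognormal variable."""
    m = math.exp(mu + s * s / 2.0)
    v = (math.exp(s * s) - 1.0) * math.exp(2.0 * mu + s * s)
    return m, v

def mc_sum_moments(mu1, s1, mu2, s2, n=200_000, seed=42):
    """Monte-Carlo estimate of the moments of a sum of two independent
    lognormal variables, cross-checking the closed-form propagation."""
    rng = random.Random(seed)
    xs = [rng.lognormvariate(mu1, s1) + rng.lognormvariate(mu2, s2)
          for _ in range(n)]
    return statistics.fmean(xs), statistics.pvariance(xs)
```

For independent variables the analytic moments simply add, and the sampled estimates agree with them to within the Monte-Carlo error.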
Failure rates: two-parameter Weibull distribution with constant failure rate
False alarms: lognormal distribution
Prediction error: lognormal distribution
MWT: lognormal distribution
MTTR: lognormal distribution
ILMR: two-parameter Weibull distribution
MID: lognormal distribution
A0: two-parameter Weibull distribution

Table 2. Parameter distribution types
Figure 7. Monte-Carlo validation
3. APPLICATION AS VIRTUAL VALIDATION ENVIRONMENT
The validation process is mainly based on a bottom-up and top-down justification and traceability analysis of all system design requirements. The idea for supporting this concept by utilizing the proposed framework is shown in Figure 8. The validation is performed by tracing all failure mode specific EHM requirements to the top-level system requirements. The parameter CBMR comprises all EHM features. It is composed of the diagnostic [HMC] and the prognostic [FPC] part. Prognostic accuracy [PA] and prognostic coverage [PC] are used to describe the resulting FPC. The HMC is defined by the detection rate [DR] and false alarm rate [FAR]. The traceability to component-level design requirements for hardware and software development is realized according to Eq. (2) by using the respective failure rates as weighting factors.
The following sections give an overview of what a trade-off study could look like. A simplified cost-benefit approach will be discussed; more complex applications to find the optimum solution involving multiple cost functions will be the scope of future activities. Two arbitrary simulation runs have been conducted to illustrate and discuss the application as a virtual validation environment. The first scenario simulates different design solutions for CBM without any PdM, only improving the fault isolation capabilities and conditional awareness of the system. The second scenario uses the same system design as baseline and evaluates a CBM concept with an integrated PdM capability, enabling the full potential of CBM.
This comparison should help in understanding the impact of diagnostic and prognostic approaches on the three selected parameters SCR, MID and ILMR, and whether any savings potential can be identified. It has to be noted that the results will vary if the logistic or maintainability parameters are modified; nevertheless, the cases shown provide sufficient information to discuss the main aspects. In the following discussion, the variance of each parameter can be understood as a factor describing the individual risk, while the expectation value represents the potential to fulfil operational objectives.
Figure 8. EHM validation
3.1. EHM without Prediction Capabilities
For this study, the parameter "CBM Capability" quantifies the online monitoring features without predicting any future trends. From the results presented in Figure 9 it can be seen that implementing EnHM without simultaneously developing prediction capabilities mainly improves the MID, hence reducing the maintenance effort per OH. This observation can be explained by the improved fault isolation and optimized preventive maintenance enabled by the online monitoring capabilities of EHM. The reduction of MMH/OH also ensures an improvement in the resulting SCR of the system; however, since all failure events are still unscheduled, this improvement is smaller than for a fully integrated CBM system with PdM. This effect can also be seen in the almost unaffected trend of the ILMR. The minor improvement in ILMR is due to the reduced number of false alarms for a redundant monitoring concept using a fusion of BIT and EHM for status assessments, and to the optimized preventive maintenance methods.
As a result, it can be concluded that enhanced diagnostics without prognostics mainly reduces the expectation and variance of the maintenance effort. While the reduced expectation value corresponds to fewer maintenance activities per OH, the reduced variance indicates a potential for better scheduling of resources and manpower. The increase in the SCR expectation is a side effect of the improvement seen in the MID.
Figure 9. Sensitivity study EHM without PdM
3.2. EHM with Prediction Capabilities
By performing the same simulation as before with a CBM system that includes prediction capabilities for all monitored failure modes (now "CBM Capability" represents the quantity of failures that are monitored and can be predicted), the PdM concept reveals its full potential. The implementation of prognostics has a significant impact on all three parameters, optimizing the expectation value and reducing the respective variance (see Figure 10).
Figure 10. Sensitivity study EHM with PdM
The potential to move unscheduled events into a scheduled scenario, without the need to incorporate all uncertainties associated with a system that enters service, reduces the risk for all parameters.
The improved SCR expectation trend is mainly related to the avoidance of secondary failures, the reduced waste of useful life for PdM in comparison to PvM, the improvement for fault isolation of the predicted failures and the planning for a PdM measure before the failure occurs. The prediction of all events belonging to the class PdM has reduced the MWT to the fraction of the administrative delay time that is not allocated to the provision of spare parts and consumables. Simultaneously, the number of unscheduled events per OH is reduced, providing the potential to save costs for producing and storing spare parts before they are needed. The further improvement in the characteristics of the MID compared to the previous simulation without PdM can be explained with the reduction of the overall variance in the primary failure events and the avoidance of secondary failures by replacing the monitored item before a failure occurs.
3.3. Discussion of Results
By comparing the results for EHM with and without PdM it can be concluded that enhanced health monitoring without prognosis may not compensate for the investment needed for the development, production and operation of the health monitoring system. The minor improvement in the SCR due to the optimized troubleshooting process through online monitoring, without reducing the risk, does not provide sufficient potential to reduce operational costs (e.g. less spares provisioning) without compromising customer requirements. The reduced MID also cannot be seen as a savings potential, as the total number of people needed per operational site is defined by the number of people per maintenance action and the number of specialists per operating system. These people have to be paid even if they have less work to do. The reduced variance is only an indicator that the risk of incorrect planning of maintenance
resources is reduced. The more accurate PvM measures are expected to enable further improvement potential.
In contrast to the results for EHM without PdM, it can be seen that the implementation of prognostics can help to reduce the overall risk of failing to fulfil service objectives. Simultaneously, a reduction of the unscheduled events enables operation with fewer spares and offers the potential for a further simplification of the logistic concept, with a reduced risk of compromising customer requirements. Therefore it can be concluded that the integration of an EHM system should aim for enhanced health monitoring and predictive capabilities; otherwise the return on investment for the integration of EHM cannot be guaranteed.
However, even for EHM without prognosis it is possible to show the improvement potential and to use the proposed framework to derive requirements for the development of EHM functions. All resulting EHM requirements for diagnosis and prognosis are mainly quantified through the failure modes that can be monitored or predicted, plus the accuracy and robustness of the respective algorithms.
3.4. Cost Benefit Analysis
This section gives an introduction to how a Cost-Benefit Analysis can be carried out by utilizing the proposed framework. We focus on a Performance-Based Contract [PBC] scenario, where the system provider has to pay penalties if the operator cannot achieve the service aims (e.g. availability). A full-blown Cost-Benefit Analysis would seek the global minimum of a function that takes the following cost elements into account:
i) CBM design and procurement costs; ii) PBC penalties and rewards; iii) Logistic cost elements; iv) Spares and resources management cost elements.
By utilizing the framework, a distribution function for each performance indicator can be derived. The parameter of interest for availability contracting would be A0. By assuming reasonable cost functions for contractual penalties and for operation and support cost (OSC) savings due to reduced spares provisioning when varying the fill rate, a minimum of the resulting cost function can be found.
An example plot for this scenario, assuming a contracted availability of 80% and deriving the delta costs by means of cost indexing, is shown in Figure 11. The location of the minimum resulting cost is determined by all design and support parameters. The risk of achieving this cost value can be quantified through the variance of each single parameter. By adding more cost functions to estimate the resulting operation and support costs, it is possible to find the optimum solution for an EHM design concept. The LCC simulation can be used either to identify an optimal EHM concept or to derive acceptable design cost values to satisfy a business case for a given operational scenario.
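The shape of such a trade-off can be sketched with a toy cost index. Every coefficient below (penalty rate, spares cost, failure rate, delay times) is hypothetical; the point is only that a contractual penalty falling with the fill rate plus a convex provisioning cost rising with it produces an interior minimum, as in Figure 11.

```python
def total_cost(pfr, contracted_a0=0.80, penalty_rate=100.0, spares_rate=40.0,
               lam=0.002, mttr=4.0, t0=2.0, t_lead=400.0):
    """Illustrative PBC cost index (all coefficients are assumptions):
    penalty proportional to any shortfall below the contracted A0, plus a
    convex spares-provisioning cost that grows with the fill rate."""
    mwt = t0 + (1.0 - pfr) * t_lead            # logistic delay shrinks as pfr grows
    a0 = 1.0 / (1.0 + lam * (mttr + mwt))      # availability as in Eq. (11)
    penalty = penalty_rate * max(0.0, contracted_a0 - a0)
    spares = spares_rate * pfr ** 2
    return penalty + spares

candidates = [p / 100.0 for p in range(101)]   # scan fill rates 0.00 .. 1.00
best_pfr = min(candidates, key=total_cost)     # interior cost minimum
```

Neither extreme (no spares, full coverage) is optimal under these assumed coefficients; the minimum sits at an intermediate fill rate.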
Figure 11. Cost functions for availability contracting
4. USE CASE FOR INTEGRATED SIMULATION CONCEPT
In this section a case study related to a generic hydraulic pump module is presented: the aim is to further illustrate the concepts explained so far and to quantitatively show the improvements in the design phase that can result from utilizing the approach illustrated here.
After a brief introduction to the pump system and its main sub-components, the focus will be on the bearings as a sub-component of the pump system. Care has been taken to properly simulate meaningful bearing conditions, namely the behaviour of a bearing in the presence of a defect and the degradation of the bearing behaviour as the defect severity grows. Both nominal and faulty behaviours have been validated by means of experimental tests. The model has therefore been used to test new diagnostic and prognostic algorithms; faults can be implemented under different operating conditions rather than waiting for them to occur. A generic approach has been followed to verify and validate the model creation and to properly assess the effectiveness and efficiency of algorithms for diagnosis and prognosis: this approach is illustrated as a flow chart in Figure 12.
Figure 12. Flow chart of the EHM designing phases
The bearing dynamic model has thereafter been integrated into a general pump simulation framework designed for this purpose. The framework allows one to simulate the behaviour of a generic pump, together with its sub-components, under different monitoring capabilities on the various components. In this way, the use of the framework as a valuable tool for requirements verification will be demonstrated, as well as its capability to assess variations in system performance when the monitoring concepts are varied.
4.1. Hydraulic Pump System
The hydraulic pump of interest is a variable displacement, axial piston pump. The most important groups are the Drive Group, the Displacement Group and the Control Valve Group. The Drive Group is the functional heart of the system, since it contains the axial pistons in the cylinder block and the control plate. The basis of the pump is an assembly of precision-machined, high-strength steel parts for the rotating functional parts, mounted in an alloy case. The main shaft is supported by rolling-element bearings. Pump sealing is achieved using either O-rings or a mechanical seal. Figure 13 shows a scheme displaying the main actors of the system under investigation: in particular, one can recognize the metrological solutions that characterize the enhanced monitoring capabilities of the system, namely a system of bi-axial accelerometers (to measure two orthogonal accelerations in the plane of each roller bearing) and an electric chip detector to evaluate the level of contaminant in the hydraulic circuit.
There is a large number of items within the pump whose failure results in a system failure. Some of the pump failures are a direct consequence of part failures (for example, shear of the shaft); others are indirect, e.g. debris in the hydraulic circuit. In the final simulation, the failure of four pump sub-components will be considered, namely: bearings, sealing, shaft and pistons.
Figure 13. Hydraulic Pump scheme – The sub-components that will be the actors of the simulation are highlighted
The dynamic model of the first sub-component (the roller bearings) will be briefly presented in the next section.
4.2. Dynamic Model of Roller Bearings
In a bearing system, the time-variant characteristics are the result of the orbital motion of the rolling elements, whilst the non-linearity arises from effects due to the Hertzian force-deformation relationship. The model presented and utilized here is based on the work carried out by Sawalhi and Randall (2008). The fundamental components of a rolling bearing are the inner race, the outer race, the cage and the rolling elements. Important geometrical parameters are the number of rolling elements nb, the element diameter Db, the pitch diameter Dp and the contact angle α (see Figure 14). The non-linear forces between the different elements, the time-varying stiffness and the clearance between rolling elements and races have been implemented in the model. The bearing has been modeled as a five Degrees of Freedom (DoF) system: two orthogonal DoF belong to the inner race/rotor component (xi and yi), two DoF are related to the pedestal/outer race (xo and yo), and the last one (yr) has been added to match the typically high-frequency bearing response (16 kHz with 5% damping). Mass and stiffness of the outer race/pedestal, on the other hand, have been adjusted to match a low natural frequency of the system. Finally, mass and inertia of the rolling elements are ignored.
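From the geometry of Figure 14, the standard kinematic defect frequencies of a rolling bearing follow directly (these textbook formulas assume no slippage; the numeric values in the test are illustrative, not the actual pump bearing):

```python
import math

def bearing_fault_frequencies(n_b, d_b, d_p, alpha_deg, f_rot):
    """Characteristic defect frequencies from the bearing geometry:
    n_b rolling elements, element diameter d_b, pitch diameter d_p,
    contact angle alpha, shaft rotation frequency f_rot [Hz]."""
    r = (d_b / d_p) * math.cos(math.radians(alpha_deg))
    return {
        "BPFO": 0.5 * n_b * f_rot * (1.0 - r),              # outer-race ball pass
        "BPFI": 0.5 * n_b * f_rot * (1.0 + r),              # inner-race ball pass
        "FTF":  0.5 * f_rot * (1.0 - r),                    # cage (fundamental train)
        "BSF":  0.5 * (d_p / d_b) * f_rot * (1.0 - r * r),  # ball spin
    }
```

A useful sanity check is that BPFO and BPFI always sum to nb·frot.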
The non-linear and time-variant model has been further detailed regarding its capabilities in reproducing health and faulty behaviours. These refinements are related to: a) random fluctuation of inner and outer race profiles; b) forces generated as a consequence of the roller element impact with the resulting profiles roughness; c) Elasto-hydrodynamic lubrication; d) slippage; e) mass unbalances and f) presence of spalling in the outer and inner race-way.
Figure 14. Roller bearing geometry and physics modeling scheme
As illustrated in the flow chart of Figure 12, the verification and validation approach follows a circular and continuous path among the conceptual model validation, the computerized model verification and the operational validation. The conceptual model validation refers to the problem of determining that the concepts, theories and assumptions underlying the conceptual model are correct, whilst the model verification is defined as assuring that the computer programming and implementation of the conceptual model are correct. The operational validation, on the other hand, is defined as determining that the model's output behaviour has sufficient accuracy for the model's intended purpose. In the case under investigation, the domain of the model's intended applicability is wide, since both nominal and faulty behaviours have to be properly simulated. Moreover, the same approach has been followed to verify and validate algorithms for diagnostics and prognostics. In the end, once suitable diagnostic and prognostic concepts have been defined and successfully tested, it is possible to integrate the validated simulation modules into a general simulation framework in order to assess, evaluate and validate the performance of the system resulting from the integration of modules with EHM capabilities.
Figure 15. Envelope of the two signals used to detect the frequency-value of encoded impulsive transients
Several experimental tests have been conducted in order to validate the system. The iterative analysis of the experimental findings related to both nominal and faulty behaviours has allowed a continuously better matching of the computerized model to reality (model validation). A challenge was the correct simulation of a defective bearing, the development of tools to diagnose a defective behaviour and the implementation of concepts for Remaining Useful Life [RUL] prediction.
Various kinds of defects have been simulated in real bearings, for example spalls of different length and depth in both the inner and outer race. Common tools in the frequency domain can be used to validate the baseline behaviour; this is not generally true for faulty conditions. Therefore, together with a simple monitoring of the quadratic mean of the acceleration, a data-driven diagnostic approach has been implemented for the present study; experimental data have been used to train a neural network for defect detection and classification. The diagnostic approach has moreover been made more robust by the integration of a mathematical tool named Spectral Kurtosis (Antoni, 2004): this instrument gives the possibility of estimating the band to be demodulated without the need for historical data. Figure 15 shows a comparison between the signal processing of the vertical acceleration measured on the pedestal of a faulty bearing and the analogous results gained by running a simulation of its computerized model: the Fourier transform magnitude of the squared filtered signals clearly shows the typical fault frequencies of the bearing (given the bearing characteristics, a theoretical Ball Pass Frequency Outer race of 382.3 Hz was calculated) as the spacing between harmonics, both in the real (upper trend) and simulated (lower trend) results. At the end of the design phase, a verified and validated dynamic model has been released. It has therefore been widely used to test new diagnostic and prognostic algorithms, since the required diagnostic features can be derived directly from the simulated signal patterns.
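The squared-envelope analysis behind a plot like Figure 15 can be sketched as follows. Everything here is a simplification: a synthetic burst train repeating at the BPFO stands in for the measured acceleration, the sampling rate and resonance frequency are assumptions, and a fixed-band analytic-signal demodulation replaces the Spectral-Kurtosis-guided band selection used in the paper.

```python
import numpy as np

FS = 20_000.0     # sampling rate [Hz] (assumption)
BPFO = 382.3      # theoretical outer-race ball pass frequency from the text

def simulated_fault_signal(dur=1.0, f_res=6_000.0, zeta=0.05):
    """Train of decaying resonance bursts repeating at BPFO, standing in
    for the impacts produced by a spalled outer race (illustrative only)."""
    t = np.arange(int(dur * FS)) / FS
    x = np.zeros_like(t)
    for k in range(int(dur * BPFO)):
        t0 = k / BPFO
        m = t >= t0
        tau = t[m] - t0
        x[m] += np.exp(-2 * np.pi * f_res * zeta * tau) * np.sin(2 * np.pi * f_res * tau)
    return x + 0.01 * np.random.default_rng(0).standard_normal(t.size)

def envelope_spectrum(x):
    """Squared-envelope spectrum via the analytic signal: demodulates the
    resonance band so the fault repetition rate appears at low frequency."""
    X = np.fft.fft(x)
    h = np.zeros(x.size)
    h[0] = 1.0
    h[1:(x.size + 1) // 2] = 2.0          # one-sided analytic weighting
    env = np.abs(np.fft.ifft(X * h)) ** 2
    env -= env.mean()
    spec = np.abs(np.fft.rfft(env))
    freqs = np.arange(spec.size) * FS / x.size
    return freqs, spec

freqs, spec = envelope_spectrum(simulated_fault_signal())
band = (freqs > 100.0) & (freqs < 1000.0)
peak_hz = freqs[band][np.argmax(spec[band])]   # dominant envelope harmonic
```

The dominant peak of the envelope spectrum lands at the fault repetition rate, which is exactly the harmonic spacing the figure caption describes.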
However, the development of suitable prognostic algorithms also needs to focus on the evaluation and prediction of trends or degradation paths. Hence it is necessary to further develop degradation models that can be used to simulate growing faults. The derivation of such models is not always straightforward, as the process of degradation is stochastic and does not always follow known parametric laws (Bechhoefer, 2008). Several model-based approaches have been adopted so far for failure prognosis (Orchard, 2007); among the various methodologies implemented, the most promising mathematical framework is the one based on Particle Filtering. This approach can handle nonlinear, non-Gaussian systems; it assumes: a) the definition of a set of fault indicators for monitoring purposes, b) the availability of real-time process measurements and c) the existence of empirical knowledge to characterize both nominal and abnormal operating conditions.
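A minimal, self-contained sketch of the particle-filtering idea follows. The power-law degradation model, noise levels, threshold and parameter ranges are all hypothetical stand-ins (Paris'-law-like growth of a fault indicator), not the models used in the paper; the point is the predict/update/resample cycle and the propagation of surviving particles to a threshold to obtain an RUL distribution.

```python
import math
import random
import statistics

random.seed(1)

THRESHOLD = 5.0   # fault-indicator level taken as end of life (assumption)
TRUE_C = 0.02     # hypothetical true degradation-rate parameter

def step(a, c, dt=1.0):
    """Hypothetical power-law degradation increment, standing in for a
    physics-based law such as Paris' law for crack growth."""
    return a + c * a ** 1.3 * dt

# Ground-truth degradation path and noisy fault-indicator measurements.
truth, a = [], 1.0
while a < THRESHOLD:
    truth.append(a)
    a = step(a, TRUE_C)
K = 40                                               # observe the first 40 cycles
meas = [x + random.gauss(0.0, 0.05) for x in truth[:K]]

# SIR particle filter over the joint state (indicator a, model parameter c).
N = 2000
parts = [(random.uniform(0.5, 1.5), random.uniform(0.005, 0.05)) for _ in range(N)]
for z in meas[1:]:
    parts = [(step(a, c) + random.gauss(0.0, 0.02), c) for a, c in parts]  # predict
    w = [math.exp(-0.5 * ((z - a) / 0.05) ** 2) for a, c in parts]         # update
    parts = random.choices(parts, weights=w, k=N)                          # resample

def remaining_life(a, c):
    """Propagate one particle up to the threshold to get its RUL in cycles."""
    n = 0
    while a < THRESHOLD and n < 10_000:
        a = step(a, c)
        n += 1
    return n

rul_samples = [remaining_life(a, c) for a, c in parts]  # approximates the RUL pdf
rul_mean = statistics.fmean(rul_samples)
true_rul = len(truth) - K
```

The spread of `rul_samples` is the RUL pdf the text refers to; its mean tracks the true remaining life once the filter has absorbed enough measurements.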
Figure 16. Model-based development of prognostic algorithms
By means of this approach, the current state estimates are updated in real time and the algorithm predicts the evolution of the fault indicators over time, providing the pdf of the RUL. Following the same verification and validation approach, the prediction algorithm has been designed. In Figure 16, the upper graph shows the process of validating the algorithm by running different simulations assuming representative degradation paths; the lower graph displays an example plot of a model-based RUL estimation.
The verified and validated model (regarding both its physical behaviour and the diagnostic and prognostic algorithms) has therefore been integrated into a simulation module that mimics the behaviour of a complex system. The model is presented in the next section.
4.3. Hydraulic Pump Simulation
The simulation concerns four sub-components of the hydraulic pump, namely the sealing system (SEAL), the shaft (SHAF), the roller bearings (BEAR) and the piston group (PIST). FMECA documents have been consulted in order to set realistic ratios between the failure rate values. The aim of the current simulation is to show and demonstrate how the developed framework can be usefully and effectively utilized to verify the fulfilment of the top-level requirements.
Bearing models characterized by the enhanced diagnostic and prognostic capabilities just discussed have been integrated into the simulation framework; the system has been virtually equipped with accelerometers (see Figure 13) so that the health state of the bearing system can be continuously checked. As soon as a deviation from the baseline state is detected by the diagnostic algorithms, prognostic tools process the acquired data and communicate the estimated RULs and confidence levels to the central processing and control unit. This affects the performance of the overall system, and the framework discussed so far is therefore utilized to quantitatively assess the performance variations by using the indexes already discussed in the previous sections. In other words, the primary results of the current simulation are the failure rate distributions of the system; these are fed to the virtual framework to derive the performance indexes and hence values directly related to customer satisfaction.
To handle a more realistic and complex scenario, the hydraulic system has been further virtually instrumented with an Electric Chip Detector (ECD; see Figure 13). This sensor measures in real time the amount of debris and contamination in the hydraulic liquid; in this way, a preventive maintenance approach can be implemented for the piston group and the bearing system in the CBM branch. The bearing diagnostic algorithm can in fact also be used as fault confirmation for preventive actions on the piston group.
Finally, the other components are considered to be classically monitored by means of an "On Condition Maintenance" approach, which results in corrective and preventive maintenance.
Hence, according to the online monitoring capabilities just introduced, the simplified simulation scheme in Figure 17 can be drawn: it defines the primary failures that belong to the OCM branch and those that belong to the CBM branch.
Figure 18 displays a diagram explaining the flow of information in the verification procedure just presented. At the bottom of the graph lies the hydraulic pump model with its integrated enhanced monitoring concepts related to the bearing system. By assuming failure rate distributions for the different components, the simulation randomly generates events; these are treated according to the system specifications, and so the probability classes already shown in Figure 17 are populated.
Figure 17. Maintenance approach hydraulic pump system
Figure 18. Verification process of a hydraulic pump module
Therefore, the statistical parameters (mean and variance) of each failure mode can be calculated and, by using the virtual framework, easily propagated in order to obtain the distributions of the performance indexes: availability, maintenance index and inverse logistics maintenance ratio.
Verification of the EHM design requirements can be carried out by comparing the results of the validation phase with the distributions from the verification phase. The resulting error in the system performance parameters can be used to assess whether the design goals are met or not. Based on this assessment it can be decided whether the EHM concept needs to be revised or can be implemented. The results for the selected use case are shown in Figure 19.
The use case shown simplifies the system architecture to a single component. The same approach can be applied if the integration covers multiple components and subsystems with individual failure modes.
Figure 19. Performance indexes simulation case study
5. CONCLUSIONS
The proposed framework can support the development of a CBM system by validating diagnostic and prognostic design requirements w.r.t. selected KPIs or customer requirements. Sensitivity studies revealed that a CBM system should aim for the integration of predictive capabilities, as the improvement potential for an online monitoring system without prognostics is limited to a reduced maintenance effort and minor improvements in availability or other performance parameters of the system.
The concept provides a simple but robust approach for trade-off studies during an early design stage. Further improvements of the framework will focus on the evaluation and integration of a generalized Weibull correlation coefficient (Yacoub et al., 2005) to replace the linearity assumption between primary and secondary effects. The next step of maturation will be to validate the concept against established simulation tools (e.g. Simlox) for spares and resource management.
The idea for an integration of cost estimations and optimizations has been discussed. Follow-up studies to derive cost functions with established LCC estimation tools (e.g. PRICE) will be carried out. The integration of authoritative cost functions to obtain a framework for a multidimensional optimization of costs related to EHM design parameters, PBC aspects as well as resources and logistics management will be the main scope for future activities.
The concept for model-based verification of top-level system requirements has been illustrated. This approach shall enable the evaluation and assessment of diagnostic and prognostic capabilities before the system enters service. The authors are convinced that the cost-efficient validation and verification of multiple monitoring and prediction functions composed into a complex system design can only be realized in a virtual environment. The proposed framework provides such an environment and will be further matured to support the V&V process for the development of a CBM system.
First European Conference of the Prognostics and Health Management Society, 2012
236
European Conference of Prognostics and Health Management Society 2012
13
NOMENCLATURE
Symbols
ε Prediction Error
θ Prediction Accuracy
λ Failure Rate
σ Standard Deviation
Abbreviations
A0 Availability
BIT Built-In Test
cdf Cumulative Distribution Function
CBM Condition Based Maintenance
CM Corrective Maintenance
DoF Degrees of Freedom
DR Detection Rate
ECD Electric Chip Detector
EHM Enhanced Health Management
EnHM Enhanced Health Monitoring
FA False Alarm
FAR False Alarm Rate
FPC Failure Prognosis Capability
FMECA Failure Mode Effects and Criticality Analysis
HMC Health Monitoring Capability
ILMR Inverse Logistics Maintenance Ratio
KPI Key Performance Indicator
LCC Life Cycle Costs
LOG Logistics
MMH Maintenance Man Hours
MNT Maintenance
MTTR Mean Time To Repair
MID Maintenance Index
MWT Mean Waiting Time
MLDT Mean Logistics Delay Time
NFF No Fault Found
OCM On Condition Maintenance
OH Operational Hours
OSC Operation and Support Cost
PA Prognostic Accuracy
PBC Performance Based Contract
PC Prognostic Coverage
pdf Probability Density Function
PdM Predictive Maintenance
pfr Spares Fill Rate
PvM Preventive Maintenance
RTFM Run To Failure Maintenance
RMT Reliability, Maintainability and Testability
RUL Remaining Useful Life
SCR Service Capability Rate
SFLT Secondary Faults
SMNT Secondary Maintenance
V&V Validation & Verification
REFERENCES
Antoni, J. (2004). The spectral kurtosis of nonstationary signals: Formalisation, some properties, and application. Proceedings of XII European Signal Processing Conference, EUSIPCO, pp. 1167-1170, September 6-10, Vienna, Austria
Bechhoefer, E., (2008). A method for generalized prognostics of a component using Paris law. Proceedings of American Helicopter Society 64th Annual Forum, April 29 - May 1, Montreal, CA
Blumenfeld, D. (2001). Operations Research Calculations Handbook. CRC Press, p. 7
Byer, B., Hess, A. & Fila, L. (2001). Writing a convincing cost benefit analysis to substantiate autonomic logistics. Aerospace Conference
Dunsdon, J., Harrington, M. (2009). The application of open system architecture for condition based maintenance to complete IVHM. Aerospace Conference
Elandt-Johnson, R.C. & Johnson, N.L. (1980). Survival Models and Data Analysis. John Wiley and Sons, New York, p.69
Endo, H. (2005). A study of gear faults by simulation, and the development of differential diagnostic techniques. Ph.D. Dissertation, UNSW, Sydney
Orchard, M. E., (2007). A particle filtering-based framework for on-line fault diagnosis and failure prognosis. Doctoral dissertation. Georgia Institute of Technology, Atlanta, GA, USA.
Reimann, J., Kacprzynski, G., Cabral, D. & Marini, R. (2009). Using condition based maintenance to improve the profitability of performance based logistic contracts. Annual Conference of the Prognostics and Health Management Society
Sawalhi, N. & Randall, R.B. (2008). Simulating gear and bearing interactions in the presence of faults. Part I: The combined gear bearing dynamic model and the simulation of localised faults. Mechanical Systems and Signal Processing, vol. 22, pp. 1924-1951
Saxena, A., Roychoudhury, I., Celaya, J.R., Saha, S., Saha, B. & Goebel, K. (2010). Requirements Specifications for Prognostics: An Overview. American Institute of Aeronautics and Astronautics
Spare, J.H. (2001). Building the business case for condition-based maintenance. Transmission and Distribution Conference and Exposition
Stuart, A & Ord, K. (1998). Kendall’s Advanced Theory of Statistics. Arnold. London, 6th edition, p.351
Yacoub, M.D., Benevides da Costa, D., Dias, U.S. & Fraidenraich, G. (2005). Joint Statistics for Two Correlated Weibull Variates. IEEE Antennas and Wireless Propagation Letters, vol. 4
BIOGRAPHIES
Heiko Mikat was born in Berlin, Germany, in 1979. He received his M.S. degree in aeronautical engineering from the Technical University of Berlin, Germany, in 2008. From 2006 he worked as a trainee and later as a Systems Engineer at Rolls-Royce Deutschland, Berlin, Germany, designing and testing engine fuel system concepts and control laws. Since 2009 he has worked as a Systems Engineer in the CASSIDIAN Supply Systems Department, where he is responsible for the development of new health management technologies for aircraft systems. His current research activities mainly focus on the maturation of failure detection and prediction capabilities for electrical, mechanical and hydraulic aircraft equipment.
Antonino M. Siddiolo was born in Agrigento, Italy, in 1976. He received his M.S. and Ph.D. degrees in mechanical engineering from the University of Palermo, Italy, in 2000 and 2006, respectively. From 2004 to 2005 he was a Visiting Scholar at the Centre for Imaging Research and Advanced Materials Characterization, Department of Physics, University of Windsor, Ontario (Canada). He then worked as a researcher and Professor at the University of Palermo and as a Mechatronic Engineer for Sintesi SpA, Modugno (Bari), Italy. Currently, he works as a Systems Engineer in the CASSIDIAN Supply Systems Department, supporting the Integrated System Health Monitoring (ISHM) project. His research activities and publications mainly concern non-contact optical three-dimensional measurements of objects and non-destructive ultrasonic evaluation of artworks. His main contributions are in the field of signal processing to decode fringe patterns and enhance the contrast of air-coupled ultrasonic images.
Matthias Buderath - Aeronautical Engineer with more than 25 years of experience in structural design, system engineering and product- and service support. Main expertise and competence is related to system integrity management, service solution architecture and integrated system health monitoring and management. Today he is head of technology development in CASSIDIAN. He is member of international Working Groups covering Through Life Cycle Management, Integrated System Health Management and Structural Health Management. He has published more than 50 papers in the field of Structural Health Management, Integrated Health Monitoring and Management, Structural Integrity Programme Management and Maintenance and Fleet Information Management Systems.
Poster Papers
Analyzing Imbalance in a 24 MW Steam Turbine
Afshin DaghighiAsli1, Vahid Rezaie2, and Leila Hayati2
1 Morvarid Petrochemical Complex
2 Mopasco Consulting Company
ABSTRACT
Imbalance in critical rotary equipment is one of the most important faults to control in order to prevent severe damage. This case study discusses a 24 MW steam turbine that drives a propane compressor. The radial vibration on the DE side of the turbine grew gradually to a level close to the alarm value. FFTs, time signals, orbit diagrams, and phase measurements led us to believe that the rotor had become imbalanced. After tripping and disassembling the turbine, we found that some blades of the impulse stage of the HP section were broken. Replacing the rotor with the spare and repairing the damaged rotor solved the problem. It was concluded that vibration analysis is an effective method for finding faults in critical rotating equipment at the earliest stage and for triggering the corrective tasks that prevent secondary damage and, especially, loss of production.
1. INTRODUCTION
Vibration analysis is an effective technique for finding faults in critical rotating equipment. Imbalance is one of the most common machinery defects and can be very destructive. Trending online values and gathering FFTs, phase measurements, time waveforms and orbit diagrams can help us determine faults at the earliest stage, even on very large or sophisticated machines, and thereby prevent secondary damage and, especially, a decrease in production.
2. IMBALANCE
Imbalance is the condition that exists in a rotor when vibratory force or motion is imparted to its bearings as a result of centrifugal forces (1). Vibration due to unbalance of a rotor is probably the most common machinery defect; fortunately, it is also relatively easy to detect and rectify. Imbalance may also be defined as the uneven distribution of mass about a rotor's rotating centerline. Two terms are used here: the rotating centerline and the geometric centerline. The rotating centerline is defined as the axis about which the rotor would rotate if not constrained by its bearings (also called the principal inertia axis, PIA). The geometric centerline (GCL) is the physical centerline of the rotor. When the two centerlines coincide, the rotor is in a state of balance; when they are apart, the rotor is unbalanced. Three types of unbalance can be encountered on machines:
1. Static unbalance (PIA and GCL are parallel)
2. Couple unbalance (PIA and GCL intersect in the center)
3. Dynamic unbalance (PIA and GCL neither touch nor coincide) (2)
3. DESCRIPTION OF THE PROBLEM
The turbine in question drives the refrigerant (propane) compressor of the Morvarid petrochemical complex, the 5th olefin plant in Iran, which feeds the MehrPC (HDPE) plant. This turbine effectively plays the role of the heart of the plant.
3.1. Technical Information (3)
Model: Siemens SST-600
Power: 24 MW
Min speed: 3530 rpm
Rated speed: 4633 rpm
Trip speed: 5096 rpm
First critical speed: 2890 rpm
Second critical speed: 7566 rpm
Shaft diameter at DE bearing: 250 mm
Shaft diameter at NDE bearing: 200 mm
Inlet steam pressure: 40 bar
Inlet steam temperature: 392 °C
Outlet steam pressure: -0.8 bar
Admission pressure: 5 bar
Admission temperature: 180 °C
Bearing DE: RKS05-5* 50-BETA=.5L.B.P.
Bearing NDE: RKS-08-4* 60-BETA=.5L.B.P.
Vibration alarm value: 150 μm
Vibration trip value: 194 μm
_____________________
Afshin DaghighiAsli et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Figure 1. Turbo Compressor
On 14 Jan 2011, the radial vibration values on the turbine DE bearing increased slightly. Data acquisition from the online monitoring system then began, gathering FFTs for precise analysis. An inspection of the bearing in early August 2011 assured us that the bearing itself could not be the source of the high vibration level.
Figure 2. Not severe pitting (permissible clearance)
Gathering FFTs, time waveforms, orbits and phase values led us to believe that the rotor might have an imbalance or a bent-shaft defect. Meanwhile, we discovered that X507 was the most important and most variable value.
Table 1. Vibration level, DE bearing [μm]

Date         X507    Y507
02-Apr-11    8       8
03-Apr-11    15      16
02-May-11    7.5     8
03-May-11    22      13
12-Aug-11    28      18
13-Aug-11    45      53
26-Sep-11    43      54
27-Sep-11    61      54
12-Oct-11    71.5    52
30-Oct-11    91      76
21-Nov-11    105     82
Figure 3. Vibration trend, DE bearing
Figure 4. Vibration X507
Figure 5. X507 FFT (Obviously 1*rpm excited)
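To illustrate how such a growing trend can be used prognostically, the sketch below fits a straight line to a few of the X507 readings from Table 1 and extrapolates to the 150 μm alarm level. The linear model, the sample selection, and all variable names are our assumptions for illustration, not from the paper.

```python
from datetime import date

# A few X507 readings from Table 1 (date, vibration in um); selection is ours.
samples = [(date(2011, 8, 12), 28.0), (date(2011, 9, 26), 43.0),
           (date(2011, 10, 12), 71.5), (date(2011, 11, 21), 105.0)]
t0 = samples[0][0]
xs = [(d - t0).days for d, _ in samples]
ys = [v for _, v in samples]

# Ordinary least-squares straight line, fitted by hand.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

ALARM = 150.0  # alarm value from section 3.1 [um]
days_to_alarm = (ALARM - intercept) / slope  # days after 12-Aug-11, a few months out
```

Such a crude extrapolation only supports the paper's point that the trend diagram, not the alarm value alone, is what gives early warning.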
Compressor vibration & phase values at rated speed

          X506    Y506    X505    Y505
VIB       3.9     6.1     17      15.5
PHASE     317     68      280     11
Figure 6. X507 Time Waveform (Absolute Sinusoidal)
Figure 7. X507 Orbit diagram (Absolute Elliptical)
Table 2. Turbine vibration & phase values at rated speed

          X507    Y507    X508    Y508
VIB       81.8    58.5    43.5    10.5
PHASE     357     113     302     24
3.2. Axial Phase Measurement
In order to identify the fault accurately, we had to measure the phase difference between the axial directions of the DE and NDE bearings. However, the online monitoring system provides only DC process values in the axial direction, with no raw vibration signal. We therefore decided to measure synchronized time waveforms on the two bearing housings, as follows:
Figure 8. Turbine NDE Synchronous Time waveforms
Figure 9. Turbine DE Synchronous Time waveforms
(180° Phase difference)
Accounting for the 180-degree difference between the axial measurement directions, the axial phase difference between the turbine DE and NDE bearings is zero. This rejects the bent-shaft theory; accordingly, imbalance was the final diagnosis.
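The phase comparison behind this diagnosis can be sketched numerically: extract the phase of the dominant (1× rpm) spectral line from each bearing's waveform and compare. The sample rate, record length and synthetic waveforms below are invented for illustration; only the rated speed (4633 rpm) is taken from the paper.

```python
import numpy as np

fs = 2048.0                     # sample rate [Hz], assumed
f_run = 4633.0 / 60.0           # 1x rpm at rated speed (~77 Hz)
t = np.arange(0.0, 1.0, 1.0 / fs)
de = np.sin(2 * np.pi * f_run * t + 0.3)    # synthetic DE axial waveform
nde = np.sin(2 * np.pi * f_run * t + 0.3)   # NDE in phase with DE -> imbalance case

def phase_deg(signal):
    """Phase of the dominant (1x rpm) spectral line, in degrees."""
    spectrum = np.fft.rfft(signal)
    k = np.argmax(np.abs(spectrum[1:])) + 1  # dominant bin, skipping DC
    return np.degrees(np.angle(spectrum[k]))

diff = (phase_deg(de) - phase_deg(nde)) % 360.0
# diff near 0 deg suggests imbalance; near 180 deg would suggest a bent shaft
```

In practice the waveforms would come from synchronized measurements on the two bearing housings, as described above.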
Eventually, when the vibration level on X507 reached about 122 μm, the turbine was tripped and disassembled, and we saw that some blades of the impulse stage of the HP section were broken.
Figure 10. Turbine Disassembly
Figure 11. HP broken blades (full shot)
Figure 12. HP broken blades (detail shot)
By changing the rotor, the vibration levels returned to near their initial values, and the damaged rotor was sent to the shop for repair.
4. CONCLUSION
Although imbalance is a common fault, we should keep in mind that it can happen to any machine of any brand and type, at any time. We should also be aware that the trend diagram is a key to recognizing and solving problems, and we should not confine ourselves to the alarm values given in standards and manufacturers' recommendations. Finally, it is worth mentioning that axial phase measurement is a very important tool for distinguishing between imbalance and a bent shaft.
Acknowledgements
I want to thank my wife Leila, my parents, and my brother. I also thank Mr. Sohrab Yazdani, and I am grateful to Mr. Vahid Rezaie and Mr. Mohsen Rezaie (Parspc).
Bibliography
International Standard ISO 1940-1, Second Edition, Switzerland, 2003-08-15, p. 2.
Scheffer, C. & Girdhar, P. (2004), Practical Machinery Vibration Analysis and Predictive Maintenance, Newnes Publications, pp. 90-92.
Siemens AG Power Generation (2006), Siemens Vendor Data Book, Siemens Publications, Duisburg.
Economic reasoning for Asset Health Management Systems in volatile markets
Katja Gutsche1
1Hochschule Ruhr West, Mülheim a.d.R., 45473, Germany [email protected]
ABSTRACT
With respect to the growing demands on asset reliability, availability, maintainability, safety and productivity (RAMS-LCC), diagnostic and prognostic asset health management (PHM) systems provide more detailed asset health information, which allows improved maintenance decision-making. This gives the opportunity for a more efficient, safer system operation (e.g. aircraft, production facilities) and therefore a more competitive enterprise. Of course, the implementation and use of PHM causes recurring and non-recurring costs, which have to be at least covered by the savings achieved through cost avoidance due to better asset health knowledge. The economic justification is essential for a positive decision on the installation of PHM. This becomes more complex as the benefits depend on the operation circumstances, which in turn are strongly influenced by the market situation. The market situation is largely determined by the market demand, the number of competitors and the speed of technological change. As these parameters are especially relevant in the producing industry, this is the system of choice in this paper. The question to be raised is how much the economic attractiveness of PHM systems correlates with the increase in market impermanence seen globally in most market segments.
1. INTRODUCTION
Asset health plays a tremendous role in production efficiency as well as in system safety, and therefore in the competitiveness of asset-intensive enterprises in particular. Asset-intensive enterprises are characterized by a high number of industrial facilities needed for the production process, which in addition are generally cost-intensive investments. This becomes even more important in a global economy where profit margins decrease and customer satisfaction has to be kept constantly at a high level. In addition, there are technical changes, such as:
an increase in automation,
an increase in system and asset chaining,
an increase in asset complexity,
an increase in availability requests.
As a consequence, the relevance of health management systems is further increasing.
Their economic benefits have been outlined in several publications as e.g. (Banks & Reichard & Crow & Nickell, 2005), (Banks & Merenich, 2007), (Feldmann & Sandborn & Taoufik, 2008), (Al-Najjar, 2010). (MacConnell, 2007) lists the following as the major benefits:
1. Maintenance time savings,
2. Failure reduction,
3. False alarm avoidance,
4. Availability improvement – increase mean time between maintenance actions,
5. Spare and supply savings.

There is no doubt that, in sum, prognostics and health management (PHM) decreases the efficiency loss caused by maintenance management that is driven by time or organizational restrictions rather than by detailed asset health knowledge, mostly expressed using the wear-out stock. The wear-out stock (compare DIN 13306) defines the health of an asset: it indicates the progress of degradation towards the point where the asset can no longer operate in a safe and proper way. The wear-out stock (WS) is assumed to be high at the beginning (time t = 0) of system use (WS0) and decreases with use. Unless maintenance actions are undertaken, the WS decreases to a critical value (WSmin) at which the asset can no longer be maintained and has to be replaced in order to work properly again. If maintenance management is done purely on a time base, with no regard to the current system degradation status, the value created by the productive system is diminished. Figure 1 shows the reason for the efficiency loss in traditional time-based maintenance management: maintenance actions are undertaken far before the limit wear-out stock (WSmin) is reached because no clear asset health data are available, and the asset lifetime is thereby reduced. In sum, the premature undertaking of maintenance reduces the potential time of use (T) and the potential output (e.g. production units), and it increases the number of maintenance activities.

_____________________
This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 1. Maintenance efficiency loss
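The wear-out stock mechanics described above can be sketched numerically. The linear degradation rate and all parameter values below are illustrative assumptions of ours, not figures from the paper.

```python
# Illustrative wear-out stock (WS) model: WS starts at WS0 and is assumed to
# decline linearly with operating time until it reaches the critical value
# WSmin, at which point the asset must be maintained or replaced.
def remaining_use_time(ws0, ws_min, degradation_per_hour):
    """Operating hours until the wear-out stock reaches its critical value."""
    return (ws0 - ws_min) / degradation_per_hour

# Condition-based maintenance can exploit (almost) the full interval, whereas
# a conservative fixed time-based interval wastes the difference.
full_interval = remaining_use_time(ws0=100.0, ws_min=20.0, degradation_per_hour=0.01)
time_based_interval = 6000.0  # assumed conservative fixed interval [h]
wasted_lifetime = full_interval - time_based_interval  # lost use time per cycle [h]
```

The gap between the two intervals is exactly the efficiency loss that Figure 1 attributes to time-based maintenance.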
Besides the optimization of preventive maintenance tasks, the use of PHM also improves the failure time line (figure 2), because either a failure is prevented outright or the downtime is reduced thanks to detailed asset health information. Firstly, with the asset health information the time until maintenance work starts is reduced due to faster fault identification. Secondly, the information from PHM systems decreases the time needed for the actual refit.
Figure 2. Failure time line
Apart from these numerous positive effects of PHM, there are also challenges that have to be managed, e.g.:
A large amount of specialized data is generated by the PHM system.
The selection and interpretation of the most relevant data is mostly not done by the system itself.
Decision making becomes more complex for the maintenance person in charge.
These challenges are listed at this point but will not be analyzed further in this paper.
2. MOTIVATION
The economic attractiveness of PHM systems depends on the result of a cost-benefit analysis, i.e. on the difference between the cost savings and the additional costs due to their implementation and use (section 1).
PHM has a notable effect on asset availability, which can be measured through (Wheeler & Kurtoglu & Poll, 2010; Al-Najjar, 2010) (figure 3):
Reduction in (unplanned) stoppages,
Increase in mean time between maintenance actions,
Reduction of labor mean-time-to-detect,
Reduction of repair times,
Reduction of maintenance induced failures
and has therefore a positive effect on the direct and indirect maintenance costs which are mainly dependent on the maintenance time parameters as well as needed number of spare parts, cases of secondary damage and work accidents.
Figure 3 demonstrates the potential effect of the implementation of PHM systems (scenario 1) compared to their non-use (scenario 0) on the asset availability level (increasing) and the maintenance costs (decreasing).
Figure 3. Effect of PHM implementation
Seen from a life-cycle perspective, PHM causes development, implementation, operation and maintenance expenses. Moreover, prognostics may also cause false alarms, but these shall not be considered in this paper.
Table 1 lists the major potential costs and benefits of a PHM system application. Especially in the beginning, investments
have to be made before actually using the system for asset health monitoring. The investment expenses are determined by the software and hardware components, the installation and testing complexity as well as the needed staff training. During the period of PHM system use there are cost positions due to the data management and its maintenance. The potential benefits have been outlined in detail in the sections before and shall only be listed at this point.
Table 1. Costs and Benefits of PHM (*value dependent on operation circumstances)
Their actual value is variable due to probabilistic behavior of assets and their failure regime, the technical characteristics of the PHM (self-learning etc.) and their usability. Apart from these uncertainties which have to be taken into account when deciding on PHM, the overall result of the implementation of PHM depends on the operation intensity:
How tight is the operation schedule for the asset to be monitored with regard to the customer needs?
The relevant operation circumstances in producing industries can be expressed in
Available realization time (e.g. time until product delivery),
Number of waiting jobs,
Number of shifts/ operation intensity.
These parameters change more often as market volatility increases. Market volatility is defined as the magnitude of short-term fluctuation in a time series compared to its mean value or a defined trend curve. Figure 4 shows the development of the German Gross Domestic Product (GDP), adjusted for prices, between 1951 and 2008. It illustrates that the economic cycles have shortened; hence the markets have become more volatile. This has major effects on the manufacturing industry and, in consequence, on the operation circumstances and finally on the cost-benefit result of the use of PHM systems.
Figure 4. Market volatility (Statistisches Bundesamt, 2009)
3. ECONOMIC REASONING IN VOLATILE MARKETS
Whereas the costs listed in Table 1 stay relatively stable no matter how the operation circumstances change (only the data management expenses grow with the data volume), the value of the potential benefits increases as the available realization time decreases and as the number of waiting jobs and the operation intensity increase.
3.1. Value of availability
The value of a gain in availability changes depending on the operation circumstances. This value correlates with the failure costs. Failure costs are
Costs of decreased output before and after downtime,
Costs due to the downtime period (downtime costs) (see figure 2),
Opportunity cost,
Loss in asset value.
(Biedermann, 2008) outlines that the failure costs correlate with the percentage of downtime over the overall asset lifetime and with the level of use of the producing asset's capacity (figure 5). For a constant percentage of downtime, the failure costs decrease when the use of the asset capacity decreases. Illustrated with an example: a manufacturing plant works either a) 24 hours/day (100% use of asset capacity) or b) 18 hours/day (75% use of asset capacity). The output per hour is 1 unit worth 500 €. In case of a failure lasting one working day (downtime), the loss in production (failure costs) is a) 24 × 500 € = 12,000 € and b) 18 × 500 € = 9,000 €.
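The worked example above reduces to a single multiplication; the sketch below (function name ours) reproduces both cases.

```python
# Lost production value for a one-working-day downtime, per the example above.
def failure_cost(hours_per_day, units_per_hour, price_per_unit):
    return hours_per_day * units_per_hour * price_per_unit

cost_full = failure_cost(24, 1, 500)     # 100% use of asset capacity
cost_partial = failure_cost(18, 1, 500)  # 75% use of asset capacity
```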
Costs                                Benefits
Software                             Reduction in failure rate*
Hardware                             Reduction in downtime
Training                             Decrease in quality rejections*
Installation & testing               Reduction in spare parts*
Data management*                     Reduction in accident compensations*
PHM system maintenance & updates     Decrease in lifetime loss

The level of use of the asset capacity is one parameter describing the operation circumstances. As the level of capacity use depends on the operation intensity, which in turn depends on the market demand (high demand, high use level), the cause-effect chain can be summed up in the following way:
market situation ↓ → use of asset capacity ↓ → failure costs ↓ → value of availability ↓
Figure 5. Value of non-availability in producing industries (failure costs) (Biedermann, 2008)
3.2. Volatility gap in availability savings
With a change in the market there is a positive or negative effect on the manufacturing industry, and the change in product demand directly influences the manufacturing asset: the asset work load increases with higher product demand and decreases when market demand declines. These scenarios are outlined in figure 6, upper part. During an economic upturn the asset is used to its maximum; the asset work load is adjusted when there is less demand for the product or service. Corresponding to the development of the asset work load, there is a change in availability savings (SA) (figure 6, lower part). If the asset is always used at the assumed high level and there is no change in market demand, the value of the savings from the availability increase due to the use of PHM systems (SAnv) is higher than when the market parameters change, i.e. under higher volatility (SAv). Comparing these two scenarios, a so-called volatility gap in the savings through the use of PHM systems evolves.
As the saving in availability is directly linked to the benefits of PHM systems, the cause-effect chain of section 3.1 can be extended in the following manner:

market situation ↓ → use of asset capacity ↓ → failure costs ↓ → value of availability ↓ → benefit of PHM systems ↓
Figure 6. Effect of volatility on savings through availability increase SA
3.3. Numerical example
To highlight the importance of market effects on the economic attractiveness of PHM systems, a numerical example is outlined.
The following assumptions shall be made:

Table 2. Numerical example – assumptions

Use period [years]: 10
Fault time per year [% of operating hours]: 1
Value of downtime [€/hour]: 150
Fault prevention rate through PHM system [%]: 20

The volatility gap shall be shown by comparing the following two scenarios:

Scenario A: constant operating hours of two shifts of 8 hours on 365 days per year = 5840 h/year = maximum use of asset capacity.

Scenario B: changing operating hours (see Table 4, column 2).
Table 3 and table 4 show the potential availability savings through the use of PHM systems. As in scenario B the asset is not used to its full extent, the sum of the availability savings is lower than in scenario A (13,578 € < 17,520 €). The difference of 3,942 € represents the volatility gap indicated in figure 6.
Table 3. Scenario A – maximum use of capacity, no volatility
Table 4. Scenario B - changing use of capacity and market volatility
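Under the Table 2 assumptions, the yearly saving in Tables 3 and 4 follows from a single formula: operating hours × fault rate × prevention rate × downtime value. The sketch below (variable names ours) reproduces both scenario totals, with scenario B's operating hours declining linearly as in the table.

```python
# Assumptions from Table 2.
FAULT_RATE = 0.01       # fault time per year, as a fraction of operating hours
PREVENTION = 0.20       # fault prevention rate through the PHM system
DOWNTIME_VALUE = 150    # value of downtime [EUR/hour]

def availability_saving(operating_hours):
    """Yearly cost avoidance from faults prevented by the PHM system [EUR]."""
    return operating_hours * FAULT_RATE * PREVENTION * DOWNTIME_VALUE

scenario_a = [5840] * 10                                # constant maximum use
scenario_b = [5840 - 292 * year for year in range(10)]  # 5840, 5548, ..., 3212

total_a = sum(availability_saving(h) for h in scenario_a)  # Table 3 total
total_b = sum(availability_saving(h) for h in scenario_b)  # Table 4 total
volatility_gap = total_a - total_b
```

The gap shrinks or grows directly with the assumed decline in operating hours, which is the paper's central point about volatile markets.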
4. SUMMARY
The integration of a health management system is primarily based on economic reasoning. PHM provides failure predictions, reduces the downtime, expands the maintenance intervals and therefore decreases the efficiency loss in maintenance and increases the system availability. However, PHM causes investment expenses and recurring costs for the PHM system sustainment. Whereas the latter are mostly independent of the market situation in which the operator uses the asset to fulfill customer demands, the potential benefits strongly depend on the operation circumstances (e.g. working shifts, time buffers within the production line, stock of semi-finished products).
As there is not only a higher level of competition within the markets but also more volatility (e.g. steel production) which strongly influences the operation circumstances, these dynamic effects have to be taken into account when deciding on the introduction of a PHM system.
This paper outlines the effect of market volatility on the economic reasoning of the use of PHM systems. Depending on the market situation the volatility gap describes the cost avoidance due to higher system availability. The value of cost avoidance then depends on the level of use of asset capacity.
In volatile markets, modular PHM systems may be an option, as these systems allow downsizing. Instead of installing an all-embracing PHM system, modular systems offer the big advantage of being scalable according to the actual operation constraints (e.g. number of sensors and interpretation algorithms). This allows a downsizing of the recurring costs for the health management system and makes it more flexible with respect to the increase in market volatility.
REFERENCES
Al-Najjar, B. (2010), Strategies for Maintenance Cost-effectiveness. In Holmberg et al. eMaintenance, pp.297-344
Banks, J. & Merenich, J. (2007). Cost Benefit Analysis for Asset Health Management Technology, IEEE Annual Reliability and Maintainability Symposium, pp. 95-100
Banks, J., Reichard, E., Crow, E., Nickell, E. (2005), How Engineers Can Conduct Cost-Benefit Analysis for PHM Systems, IEEE Aerospace Conference, pp. 3958-3967
Biedermann, H. (2008), Anlagenmanagement – Managementinstrumente zur Wertsteigerung, TÜV-Verlag
DIN EN 13306 (2010), Maintenance – Maintenance terminology
Feldmann, K., Sandborn, P., Taoufik, J. (2008), The Analysis of Return on Investment for PHM Applied to Electronic Systems, Proceedings of the International Conference on Prognostics and Health Management, October, Denver, CO
MacConnell, J.H. (2007), ISHM & Design: A review of the benefits of the ideal ISHM system, IEEE Aerospace Conference
Statistisches Bundesamt (2009), https://www-genesis.destatis.de/
Wheeler, K., Kurtoglu, T., Poll, S. (2010), A Survey of Health Management User Objectives in Aerospace Systems Related to Diagnostic and Prognostic Metrics, International Journal of Prognostics and Health Management
Table 3 data (Scenario A – maximum use of capacity, no volatility):

Year    Operating hours [h]    Fault hours per year [h]    Fault hours prevented by PHM [h]    Availability savings [€]
1       5840                   58.4                        11.68                               1752
2       5840                   58.4                        11.68                               1752
3       5840                   58.4                        11.68                               1752
4       5840                   58.4                        11.68                               1752
5       5840                   58.4                        11.68                               1752
6       5840                   58.4                        11.68                               1752
7       5840                   58.4                        11.68                               1752
8       5840                   58.4                        11.68                               1752
9       5840                   58.4                        11.68                               1752
10      5840                   58.4                        11.68                               1752
Total                                                                                          17520

Table 4 data (Scenario B – changing use of capacity, market volatility):

Year    Operating hours [h]    Fault hours per year [h]    Fault hours prevented by PHM [h]    Availability savings [€]
1       5840                   58.4                        11.68                               1752.0
2       5548                   55.48                       11.096                              1664.4
3       5256                   52.56                       10.512                              1576.8
4       4964                   49.64                       9.928                               1489.2
5       4672                   46.72                       9.344                               1401.6
6       4380                   43.8                        8.76                                1314.0
7       4088                   40.88                       8.176                               1226.4
8       3796                   37.96                       7.592                               1138.8
9       3504                   35.04                       7.008                               1051.2
10      3212                   32.12                       6.424                               963.6
Total                                                                                          13578.0
First European Conference of the Prognostics and Health Management Society, 2012
249
System PHM Algorithm Maturation
Jean-Remi Massé1, Ouadie Hmad3, and Xavier Boulet2

1,2Safran Snecma, Moissy Cramayel, 77550, France
3Safran Engineering Services, Moissy Cramayel, 77550, France
ABSTRACT
The maturation of PHM functions is focused on two Key
Performance Indicators (KPI): the No Fault Found (NFF)
ratio, P(No degradation | Detection), and the Probability Of
Detection (POD), P(Detection | Degradation). The second
KPI can be estimated by counting global abnormality
threshold trespassings when each different kind of
degradation is simulated. The first KPI can be estimated
through the following formula, using the Bayes rule:

P(No degradation | Detection) = P(Detection | No degradation) × P(No degradation) / P(Detection)

P(Degradation) may be known through FMEA or field
experience. Typically, for a probability of 10^-7, a specified
NFF ratio of 1%, and an expected POD of 90%, the order of
magnitude of P(Detection | No degradation) should be 10^-9.
The estimation of such an extreme level of probability needs
some parametric adjustment of the distribution of the global
abnormality score with no degradation. Two PHM functions
are considered as case studies: Turbofan engine start
capability (ESC) and turbofan engine lubrication oil
consumption (EOC). In ESC the global abnormality score is
a norm of a vector of specific abnormality scores. The
specific scores are centered and reduced residues between
expected values and observed values. Some specific scores
are devoted to starter air supply. Examples are duration of
phase 1 from starter air valve open command to ignition
speed. Other scores are devoted to fuel metering. Examples
are duration of phase 2 from ignition to cut off speed. The
expected values are estimations through regression relations
using as inputs the other specific scores and context
parameters such as lubrication oil temperature at start. The
regression relations are learnt on start records with no
degradations. Impact simulations of degradations on specific
scores are learnt on a phase 1 simulator based on torques
balance and on start test records including fuel metering
biases. In EOC, the global abnormality score is the daily,
weekly, or monthly consumption estimate computed on a daily
basis. Consumption estimates use linear regressions of oil
level measurements versus time at an invariable ground idle
speed corrected according to oil fill detections and oil
temperature. The over consumptions are simulated by drifts
in mean of the consumption estimations.
To reach acceptable POD at the specified NFF ratio, three
improvements are needed for ESC:
- Adjust the abnormality decision threshold according to each candidate degradation, using extreme value quantiles on the global abnormality score distribution
- Average the global abnormality score over five consecutive starts
- Learn the regression relations specifically on each engine.
The first improvement is a novelty. It is successfully applied
to both ESC and EOC functions. It is generic to all airborne
system PHM functions based on abnormality scores.
1. INTRODUCTION
For years, airborne system PHM maturity has been scaled in
reference to the popular “Technology readiness levels”
(Wikipedia, 2012). This is appropriate for controlling the
maturity of the function implementation, but it does not
address the intrinsic maturity of the function independently
of its implementation.
Therefore, a maturation process of PHM functions is
followed. It uses six sigma concepts (Deming, 1966;
Forrest, 2008). First, generic sub functions of system
PHM are considered. This is illustrated on two use cases.
Then, two Key Performance Indicators (KPI) are chosen
according to the considered sub functions and to airline
business models. The estimation of these KPI is defined.
_____________________
Jean-Remi Massé et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License,
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
First European Conference of the Prognostics and Health Management Society, 2012
250
European Conference of Prognostics and Health Management Society 2012
2
To reach acceptable levels of KPI on the use cases, some
improvements of the functions are proposed.
2. CASE STUDIES
2.1. General
The first phase of the six sigma approach is the definition
phase. The function considered is the first item to define.
PHM functions are usually represented in the OSA-CBM
architecture (MIMOSA, 1998). Table 1 shows such a typical
architecture applied to a system PHM (Lacaille, 2010).
Table 1. Typical system PHM OSA-CBM summary

#1 DATA ACQUISITION
   Acquire sensor and system data
#2 DATA MANIPULATION
   Extract the indicators
   Acquire the context parameters
#3 STATE DETECTION
   Build the prediction model**
   Score the prediction errors
#4 HEALTH ASSESSMENT
   Learn reference patterns (syndromes)**
   Cluster according to references
   Isolate the potentially degraded LRU(s) or module(s) through Bayesian calculation
   Isolate the potentially degraded LRU(s) or module(s) through the fault isolation manual on failure condition precursors
   Score global abnormality*
   Adjust the abnormality decision thresholds**
   Detect abnormality
#5 PROGNOSTIC ASSESSMENT
   Predict the probability of maximal degradation before failure within a given operational time
#6 ADVISORY GENERATION
   Establish a global diagnosis and prognosis merging other health monitoring means

*Has a learning mode; **Is a learning mode
The specificities of a given PHM function are restricted to
level #1 Data acquisition and level #2 Data manipulation
through indicators and context parameters. The next levels
are, in general, common and may have à learning mode in
addition to the basic PHM mode. Such learning modes are
tagged in table 1 with an asterisk or two*.
The PHM function section considered for maturation is part
of level #4 Health Assessment. It is tagged in Table 1 in
bold:
- Score global abnormality
- Adjust the abnormality decision thresholds*
- Detect abnormality.
The abnormality detection function considered here is based
on global abnormality score threshold trespassing.
On this general basis, two specific use cases are considered:
- Engine start capability, ESC (Ausloos, Grall, Beauseroy & Massé, 2009; Mouton, Ausloos, Massé, Aurousseau & Flandrois, 2010)
- Engine oil consumption, EOC (Demaison, Massot, Massé, Flandrois, Hmad & Ricordeau, 2010).
2.2. Engine Start Capability function
The engine start capability function, ESC, relies on a set of
indicators (Figure 1):
- extracted during the start sequence
- sensitive to no-start precursors.
Figure 1. Engine start capability, ESC, indicators
Some indicators are devoted to air supply degradations.
Examples are the duration of phase 1 of the start, from
starter air valve open command to ignition HP rotor speed,
or, the average acceleration of HP rotor during phase 1.
These indicators are sensitive to slow opening of the air
starter valve. Such degradation is a precursor of the valve
sticking closed, which is a typical origin of no start.
Some indicators are devoted to fuel metering degradations.
Examples are phase 2 duration, from ignition to starter cut
speed, or, Exhaust Gas Temperature slope during phase 2.
Prediction error scores are centered and reduced residues
between expected values of indicators and observed values
of indicators.
The expected values of indicators are estimations, through
regression relations, using as inputs the other indicators and
context parameters such as lubrication oil temperature at
start. Referring to table 1, this is the basic PHM mode of
“#3 – State detection - Score the prediction errors”
The regression relations are learnt on start records with no
degradations. The means and standard deviations of the
residues needed for centering and reduction are learnt on the
same records. Referring to table 1, this is the learning mode
of “#3 – State detection - Build the prediction model”.
First European Conference of the Prognostics and Health Management Society, 2012
251
European Conference of Prognostics and Health Management Society 2012
3
The global abnormality score, s, is the squared
Mahalanobis norm of the vector, z, of prediction error
scores:

s = z' R^-1 z    (1)
Referring to table 1, this is the basic PHM mode of “#4
Health assessment – Score global abnormality”
The correlation matrix, R, is also learnt on the same records
with no degradations. This is the learning mode of "#4
Health assessment - Score global abnormality".
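The score of Eq. (1) can be sketched in a few lines of Python; the two indicators, their correlation value, and the residue values below are invented for illustration only:

```python
# Sketch of the global abnormality score: the squared Mahalanobis
# norm of the vector z of centred, reduced prediction error scores.
# The correlation value 0.6 and the residues are hypothetical.

def mahalanobis_sq(z, r_inv):
    """Squared Mahalanobis norm s = z' R^-1 z, with z a list and
    r_inv the inverse correlation matrix as a list of rows."""
    n = len(z)
    w = [sum(r_inv[i][j] * z[j] for j in range(n)) for i in range(n)]
    return sum(z[i] * w[i] for i in range(n))

def inv_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two prediction error scores, e.g. phase-1 and phase-2 duration
# residues, assumed correlated at 0.6.
R = [[1.0, 0.6], [0.6, 1.0]]
z = [2.0, 1.5]
s = mahalanobis_sq(z, inv_2x2(R))
```

Because the prediction error scores are centred and reduced, their covariance matrix is the correlation matrix, so the Mahalanobis norm accounts for the correlation between indicators that a plain sum of squares would ignore.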
2.3. Engine Oil Consumption function
Engine oil consumption function, EOC, relies on oil level
extractions at taxi phase. The oil levels are captured at
constant ground idle speed when the switch based level
indication changes. A small correction of level is done
according to temperature.
Figure 2. Engine oil consumption, EOC, oil level captures
The global abnormality score is the daily, weekly, or
monthly consumption estimate computed on a daily increment. This
relies on regressions on the oil levels versus flight time
taking into account the oil fills. Referring to table 1, this is
the basic PHM mode of “#4 Health assessment – Score
global abnormality”. Unlike ESC, for EOC, this item has no
learning mode.
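This consumption estimate can be sketched as follows, assuming hypothetical level captures and a simple cumulative correction for detected oil fills (the temperature correction is omitted):

```python
# Illustrative sketch of the EOC estimate: a least-squares regression
# of oil level against flight time, after removing the level jumps
# caused by detected oil fills. All data values are hypothetical.

def linear_slope(ts, ys):
    """Least-squares slope of ys versus ts."""
    n = len(ts)
    tm, ym = sum(ts) / n, sum(ys) / n
    num = sum((t - tm) * (y - ym) for t, y in zip(ts, ys))
    den = sum((t - tm) ** 2 for t in ts)
    return num / den

def consumption_rate(times, levels, fill_amounts):
    """Oil consumption rate (level units per flight hour).
    fill_amounts[i] is the oil added just before capture i (0 if
    none); the cumulative fills are subtracted from the levels."""
    corrected, added = [], 0.0
    for level, fill in zip(levels, fill_amounts):
        added += fill
        corrected.append(level - added)
    return -linear_slope(times, corrected)

times = [0, 10, 20, 30, 40]              # cumulative flight hours
levels = [20.0, 19.0, 18.0, 21.0, 20.0]  # levels captured at ground idle
fills = [0, 0, 0, 4.0, 0]                # 4 units added before capture 3
rate = consumption_rate(times, levels, fills)
```

Here the fill before the fourth capture raises the raw level from 18 to 21; after the correction the series decreases steadily and the regression recovers a consumption rate of 0.1 level units per flight hour.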
3. P(NO DEGRADATION|DETECTION)
3.1. Definition
As seen previously (§ 2.1), the PHM function section
considered for maturation comprises:
- Score global abnormality*
- Adjust the abnormality decision thresholds**
- Decide on abnormality detection.
According to six sigma methodology, this needs to be
assessed and quantified. This is addressed through two Key
Performance Indicators that are Critical To Quality and
Critical To Business (KPI CTQ CTB).
In commercial aeronautics, the major KPI CTQ CTB for
abnormality detection is an extension of the so called “No
Fault Found” ratio, NFF. The original NFF ratio refers to
failure detections which are false. The extended NFF ratio,
considered in PHM, refers to degradation detections which
are false. The degradations considered in PHM are failure
precursors. The NFF ratio is defined as P(No degradation|
Detection).
Line maintenance wishes to avoid "no fault founds". For
instance, a false detection of a fuel metering degradation
may lead to a hydro-mechanical unit replacement, which costs
eight hours of manpower. Therefore, the NFF ratio should not
exceed 5% at the line maintenance stage. High NFF ratios
would kill PHM.
3.2. Counterpart
A second KPI CTQ CTB is the well known Probability Of
Detection, POD. The POD is defined as P(Detection
|Degradation).
For line maintenance the POD should be as high as possible
under the constraint of low NFF ratio. For operations
management, the abnormality detection should occur as
soon as possible. For operations, NFF ratio is not as critical
as for line maintenance.
The popular Probability of False Alarm, PFA, P(Detection
| No degradation), is linked to the two KPI CTQ CTB by the
following relation:

P(Detection | No degradation) =
P(No degradation | Detection) × P(Detection | Degradation) × P(Degradation)
/ [(1 - P(No degradation | Detection)) × P(No degradation)]    (2)
With the type of decision considered, based on threshold
trespassing, P(Detection| No degradation) is the probability
of the global abnormality score with no degradation being
higher than the abnormality decision threshold (Figure 3).
Figure 3. Diagram of PFA and POD for a decision based on
threshold trespassing
For a typical P(Degradation) of 10^-6 or 10^-7 per decision, an
expected NFF ratio, P(No degradation | Detection), of 5%,
and a POD, P(Detection | Degradation), of 90%, the PFA,
P(Detection | No degradation), should be 5×10^-8 or 5×10^-9
(Formula 2).
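These orders of magnitude can be checked numerically; the helper below simply inverts the Bayes relation, using the 5% NFF and 90% POD figures from the text:

```python
# Numerical check of the orders of magnitude implied by the relation
# between PFA, NFF ratio and POD (Formula 2).

def required_pfa(nff, pod, p_deg):
    """P(Detection | No degradation) implied by Bayes' rule for a
    target NFF ratio, a given POD, and a degradation probability."""
    p_no_deg = 1.0 - p_deg
    return nff * pod * p_deg / ((1.0 - nff) * p_no_deg)

pfa_6 = required_pfa(nff=0.05, pod=0.90, p_deg=1e-6)  # ~4.7e-8
pfa_7 = required_pfa(nff=0.05, pod=0.90, p_deg=1e-7)  # ~4.7e-9
# i.e. the 5e-8 and 5e-9 orders of magnitude quoted in the text
```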
3.3. Estimation
The estimation of POD, P(Detection |Degradation), can be
done by counting the global abnormality threshold
trespassing when each different kind of degradation is
simulated.
The degradations are simulated rather than observed. These
degradations typically occur with a probability of 10^-6 or
10^-7 per engine flight. It would be necessary to cumulate
more than 2.7 or 27 million flights to observe this event at
least thirty times with a probability of 90%.
The simulations are based on transformations of the
degradation indicator values with no degradation. Such
transformations are characterized by:
- the degradation considered
- the degradation intensity.
Strong intensity corresponds to the ultimate degradation level
just before failure. This concerns line maintenance. At this
level, P(No degradation | Detection) should be less than
5%. Weak or mean intensities correspond to the initiation of
the degradation. This concerns operations. At this level,
P(Detection | Degradation) should be favored even though
P(No degradation | Detection) may reach up to 50%.
In ESC, simulations of degradations related to starter air
supply were learnt with a phase 1 simulator based on
torques balance. Simulations of degradations related to fuel
metering were learnt on start tests records including fuel
metering biases.
In EOC, the over consumptions are simulated by drifts in
the mean of the consumption estimates.
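This counting estimate of POD can be sketched as follows, using a standard normal stand-in for the no-degradation score and invented drift and threshold values:

```python
import random

# Sketch of the POD estimation: degradations are simulated by
# shifting the mean of the no-degradation score, and POD is the
# fraction of shifted scores that trespass the decision threshold.
# The distribution, drifts and threshold are illustrative only.

random.seed(0)
healthy = [random.gauss(0.0, 1.0) for _ in range(10_000)]
threshold = 3.0                      # abnormality decision threshold

def pod_for_drift(drift):
    """Fraction of drifted scores above the threshold."""
    trespass = sum(1 for s in healthy if s + drift > threshold)
    return trespass / len(healthy)

pod_weak = pod_for_drift(1.0)    # weak degradation: low POD
pod_strong = pod_for_drift(5.0)  # strong degradation: POD near 1
```

The same simulated records serve both intensity levels, which mirrors why simulation is preferred over waiting for the millions of flights needed to observe real degradations.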
The estimation of the NFF ratio may be done through the
following formula:

P(No degradation | Detection) = P(Detection | No degradation) × P(No degradation) / P(Detection)    (3)

where

P(Detection) = P(Detection | No degradation) × P(No degradation) + P(Detection | Degradation) × P(Degradation)    (4)

P(Degradation) may be known through FMEA or field
experience. P(No degradation) = 1 - P(Degradation) is close
to 1.
As seen previously, the order of magnitude of P(Detection |
No degradation) should typically be 5×10^-8 or 5×10^-9.
As seen previously, P(Detection| No degradation) is the
probability of the global abnormality score with no
degradation being higher than the decision threshold (Figure
3). The estimation of such an extreme level of probability
needs some parametric adjustment of the distribution of the
global abnormality score with no degradation; the tail of
that distribution must be modeled correctly. It appears that
the adjusted Gamma and Normal distributions do not fit the
observed distribution of the global abnormality score well.
Conversely, as Figure 4 shows, the multi-parametric
adjustment obtained with the Parzen estimator fits the
observed distribution well (Hmad, Massé, Grall, Beauseroy &
Mathevet, 2011; Silverman, 1986).
Figure 4. Observed and adjusted cumulative distribution
function of ESC global abnormality score with no
degradation
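Such a kernel-based tail fit can be sketched as follows; the Gaussian-kernel Parzen CDF and the bisection search below are generic, while the sample and bandwidth are illustrative stand-ins, not the estimator settings of the paper:

```python
import math

# Minimal sketch of a Parzen (kernel) estimate of the cumulative
# distribution of the no-degradation score, used here to place the
# decision threshold at a target false-alarm probability.

def parzen_cdf(x, sample, h):
    """Gaussian-kernel CDF estimate at x with bandwidth h."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in sample) / len(sample)

def threshold_for_pfa(sample, h, pfa, lo, hi, iters=60):
    """Bisection solve of parzen_cdf(t) = 1 - pfa on [lo, hi];
    valid because the estimated CDF is monotone increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if parzen_cdf(mid, sample, h) < 1.0 - pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [0.1 * i for i in range(100)]   # stand-in score sample
t = threshold_for_pfa(sample, h=0.5, pfa=0.01, lo=0.0, hi=30.0)
```

Unlike a Gamma or Normal fit, the kernel estimate follows the empirical tail shape, at the price of choosing a bandwidth; extrapolation to probabilities far below 1/n still requires care.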
4. ABNORMALITY DECISION THRESHOLDS ADJUSTMENT
4.1. Methodology
The first improvement proposed to reach an acceptable level
of P( No degradation |Detection) is to adjust the
abnormality decision threshold on the global abnormality
score with no degradation. As seen previously, P(Detection|
No degradation) is the probability of the global abnormality
score with no degradation being higher than the decision
threshold. Conversely, if the expected value of P(Detection|
No degradation) is known, the adjustment of decision
threshold may take advantage of the accurate Parzen fit. As
a first guess of P(Detection| No degradation), formula 2
may be used with a prior assumption of P(Detection|
Degradation) being close to 100%. In a second iteration
with the prior threshold, a more realistic estimation may be
done for P(Detection| Degradation) (Hmad O., Massé J -R.,
Grall-Maes E., Beauseroy P., Boulet X., 2012).
4.2. Application to ESC
This methodology is applied to ESC. A global abnormality
score distribution is observed on starts with no degradations.
Figure 5. Impact of the fit quality on decision threshold
Figure 5 shows the need to check the distribution fits.
Figure 6 shows the initial performances of ESC with the
Parzen threshold adjustment.
Figure 6. Prior abnormality decision threshold and global
abnormality score distributions with three starter air valve
degradation intensities
Only 20% of the strong degradations are detected. This is
not acceptable for line maintenance. None of the weak or
mean degradations are detected. This is not acceptable for
operations. The performances are improved with a moving
average on the global abnormality score (Figure 7).
Figure 7. Improvement of the performances with global
abnormality score moving average on five consecutive
flights
The performances become acceptable for line maintenance
but still not for operations. The performances are further
improved with regression relations learnt specifically on
each engine (Figure 8). This improves the accuracy of the
indicator predictions.
Figure 8. Improvement of the performances with regression
relations learnt on each specific engine and moving average
of the global abnormality score
The performances now become acceptable for both line
maintenance and operations.
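The benefit of the five-start moving average can be sketched numerically: averaging n roughly independent scores shrinks the no-degradation spread by about the square root of n, so a given degradation-induced mean shift stands out more clearly. The score sample below is an illustrative stand-in:

```python
import random

# Sketch of the moving-average improvement: averaging the global
# abnormality score over five consecutive starts narrows the
# no-degradation distribution. The sample values are illustrative.

random.seed(1)
scores = [random.gauss(0.0, 1.0) for _ in range(5_000)]

def moving_average(xs, window=5):
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

smoothed = moving_average(scores)

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Averaging 5 i.i.d. scores divides the spread by about sqrt(5)
ratio = std(smoothed) / std(scores)   # expected near 0.45
```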
4.3. Application to EOC
The methodology of threshold adjustment is now applied to
Engine Oil Consumption.
Figure 9. Prior abnormality decision threshold and daily
consumption distributions with two over consumption levels
For this PHM function, almost all mean and strong over
consumptions are detected.
5. CONCLUSION
The PHM sub-function considered is abnormality detection
based on threshold trespassing by a global abnormality
score. For such a function, the No Fault Found ratio, P(No
degradation | Detection), is relevant for line maintenance.
Estimating this performance indicator requires an accurate
fit of the distribution of the global abnormality score with
no degradation.
To reach acceptable probabilities of detection at the
specified NFF ratio, three improvements are needed for the
Engine Start Capability PHM function:
- Abnormality decision threshold adjusted using extreme value quantiles on the global abnormality score distribution
- Moving average of the global abnormality score
- Regression relations learnt specifically on each engine.
The first improvement is a novelty. It is successfully applied
to both use cases considered. It is generic to all airborne
system PHM functions based on abnormality scores. It is
now being extended to other abnormality decision functions
such as "k trespassings among n" and the Wald likelihood
ratio.
REFERENCES
Ausloos, A., Grall, E., Beauseroy, P., & Massé, J.-R. (2009).
Estimation of monitoring indicators using regression
methods - Application to turbofan starting phase. ESREL
conference.
Demaison, F., Massot, G., Massé, J.-R., Flandrois, X., Hmad,
O., & Ricordeau, J. (2010). Méthode de suivi de la
consommation d'huile dans un système de lubrification
de turbomachine [Method for monitoring oil consumption
in a turbomachine lubrication system]. Patent # 1H105790
1093FR.
Deming, W. E. (1966). Some Theory of Sampling. Dover
Publications.
Forrest, W. B. (2008). Implementing Six Sigma: Smarter
Solutions Using Statistical Methods. Wiley-Interscience.
Hmad, O., Massé, J.-R., Grall-Maes, E., Beauseroy, P., &
Boulet, X. (2012). Procédé de réglage de seuil de décision
[Decision threshold adjustment method]. Patent.
Hmad, O., Massé, J.-R., Grall, E., Beauseroy, P., & Mathevet,
A. (2011). A comparison of distribution estimators used
to determine a degradation decision threshold for very
low first order error. ESREL conference, September
18-22, Troyes.
Lacaille, J. (2010). Identification de défaillances dans un
moteur d'aéronef [Identification of failures in an aircraft
engine]. Patent # FR2939924 A1.
Lacaille, J. (2010). Standardisation des données pour la
surveillance d'un moteur d'aéronef [Data standardization
for the monitoring of an aircraft engine]. Patent #
FR2939928 A1.
MIMOSA (1998). Open Systems Architecture for
Condition-Based Maintenance, OSA-CBM v3.1 standard.
Mouton, P., Ausloos, A., Massé, J.-R., Aurousseau, C. A., &
Flandrois, X. (2010). Method for monitoring the health
status of devices that affect the starting capability of a
jet engine. Patent # WO 2010/092080 A1.
Silverman, B. W. (1986). Density Estimation for Statistics and
Data Analysis. Chapman and Hall, London.
Wikipedia (2012). Technology Readiness Level.
http://fr.wikipedia.org/wiki/Technology_Readiness_Level
BIOGRAPHIES
Jean-Remi Massé (Paris, 1952) holds a PhD in statistics
(1977) from Rennes University and has practiced and taught
statistics in several industries and universities. He is
presently a senior expert in systems dependability
engineering and PHM for the Safran Group.

Ouadie Hmad (Montereau-Fault-Yonne, 1986) is presently
a PhD student at Safran Engineering Services with the
Troyes University of Technology (UTT), working on
performance assessment of PHM algorithms.

Xavier Boulet (Paris, 1981) holds a five-year degree in
Systems Engineering (2005) from Evry University and has
led several test bench development projects for Safran. He
is presently the project manager of system PHM algorithm
development for Safran Snecma.
Design for Availability – Flexible System Evaluation with a Model Library of Generic RAMST Blocks
Dipl.-Ing. Dieter Fasol1 and Dr.-Ing. Burkhard Münker2
1LFK-Lenkflugkörpersysteme GmbH, Hagenauer Forst 27, 86529 Schrobenhausen, Germany [email protected]
2icomod – münker consulting, Olper Straße 53, 57258 Freudenberg, Germany [email protected]
ABSTRACT
Tailoring a complex system to meet given availability requirements is a challenging task in the design process. Besides compiling the applicable MTBF and MTTR figures of the required components, the specific set of algebraic rules has to be identified and applied to compute the overall predicted availability of the designated system functions for the individual architecture and for individual usage profiles.
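The availability algebra alluded to above can be sketched as follows; the MTBF/MTTR figures and the series/parallel structure are invented for illustration, not taken from the launcher application:

```python
# Sketch of steady-state availability computed from MTBF/MTTR
# figures and combined for series and parallel (redundant)
# structures. All component figures below are hypothetical.

def availability(mtbf, mttr):
    """Steady-state availability of a single repairable component."""
    return mtbf / (mtbf + mttr)

def series(*avails):
    """All components needed: availabilities multiply."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel(*avails):
    """Any one component suffices: unavailabilities multiply."""
    u = 1.0
    for x in avails:
        u *= 1.0 - x
    return 1.0 - u

pump = availability(mtbf=2_000.0, mttr=8.0)
controller = availability(mtbf=10_000.0, mttr=24.0)
# two redundant pumps feeding a single controller
firing = series(parallel(pump, pump), controller)
```

Redundancy raises the combined availability above that of a single pump, while adding components in series can only lower it; it is exactly this kind of architecture-dependent algebra that the model library automates.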
While model-based methods have meanwhile become established in the system development process, such time- and effort-saving simulation-based approaches are far less common in the field of RAMST analyses.
This contribution reports on an approach of amending a reusable library of functional component models - originally designed to explore by simulation the effect of assumed failures in complex networks - and applying it to compute the availability of a generic launcher system. Here the design engineer is faced with the complex task of finding an architecture that guarantees a specified availability of the firing function with the given resource items on board.
Developed within the tool environment RODON, the prototypical library enables a quick evaluation of structural design alternatives of the selected launcher application. On top of that, it supports the full range of RAMST analyses - like computation of cause-effect relationships for FMEA, automatic drawing of FTAs for hazards of losing designated system functions, or systematic evaluation of the diagnostic coverage and of potential monitoring or reconfiguration strategies - based on the same single model. Further generalization activities are ongoing.
1 Dieter Fasol, Burkhard Münker: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Since the model is composed of generic qualitative building blocks, it took much less time to develop than full quantitative or physical model descriptions usually need. Although this qualitative representation implies certain limitations, in the authors' experience these are compensated by easier understanding for the modeler and quick adaptation to new principal architectures, while still providing sufficient system insight and results.
Driven by requirements from the industrial application, ideas for further library extensions are being discussed to also support questions regarding repair procedures and times, resource allocation, or even economic aspects.
The paper sections focus on the description of the launcher design task, a short introduction to the applied tool, the library development and application and finally the results and outlook.
BIOGRAPHIES
Dieter Fasol, born in Vienna on May 3rd, 1960, studied Mechanical Engineering at the Ruhr University of Bochum. He started his professional career at Messerschmitt-Bölkow-Blohm GmbH in Ottobrunn near Munich. His fields of work were designing flight state control and guidance algorithms for various guided missile systems as well as developing simulation environments to serve as a design and test bed for algorithm development. Since 2007 he has been working in the field of safety engineering at LFK-Lenkflugkörpersysteme GmbH and has started to apply the tool RODON for model-based safety analysis.
Burkhard Münker, born Dec 21st, 1965, studied Mechanical Engineering at the University of Siegen, Germany, where he also started his scientific career with a focus on simulation and fault analyses. Continuing at the Technical University of Berlin, Germany, he developed a tool for automated generation of state space models and filters for early detection of hazardous situations in chemical reaction
systems and got his doctoral degree in engineering in 2001. For many years he has been working as a senior consultant and project manager for the vendor of the model-based reasoning tool RODON, developing and applying new advanced approaches to the full range of diagnostic and RAMS activities for industrial and academic customers. Since 2010, working as an independent technology consultant
and analyst, he is still interested in tasks to support classical RAMST analyses by advanced failure mode modeling techniques and especially in adopting model-based ideas for non-technical applications. He is also a lecturer for the topics Physical System Modeling and Model-based Safety Assessment at the University of Siegen. See his LinkedIn profile for details.
Knowledge-Based System to Support Plug Load Management
Jonny Carlos da Silva1 and Scott Poll2

1Mechanical Engineering Department, UFSC, Florianopolis, SC, 88040-900, Brazil
2NASA Ames Research Center, Moffett Field, CA, 94035, USA
ABSTRACT
Electrical plug loads comprise an increasingly larger share
of building energy consumption as improvements have been
made to Heating, Ventilation, and Air Conditioning
(HVAC) and lighting systems. It is anticipated that plug
loads will account for a significant portion of the energy
consumption of Sustainability Base, a recently constructed
high-performance office building at NASA Ames Research
Center. Consequently, monitoring plug loads will be critical
to achieve energy efficient operations. In this paper we
describe the development of a knowledge-based system to
analyze data collected from a plug load management system
that allows for metering and control of individual loads.
Since Sustainability Base was not yet occupied at the time
of this investigation, the study was conducted in another
building on the Ames campus to prototype the system. The
paper focuses on the knowledge engineering and
verification of a modular software system that promotes
efficient use of office building plug loads. The knowledge-
based system generates summary usage reports and alerts
building personnel of malfunctioning equipment and
unexpected plug load consumption. The system is planned
to be applied to Sustainability Base and is expected to
identify malfunctioning loads and reduce building energy
consumption.
1. INTRODUCTION
Lighting and HVAC loads have typically been the top
contributors to building energy consumption. However, as
technology advances have made these systems more
efficient, plug loads have become a relatively larger
contributor to energy usage. For example, in a typical
California office building lights consume around 40% of
total energy, HVAC 25% and plug loads 15% (Kaneda,
Jacobson & Rumsey, 2010; Moorefield, Frazer & Bendt,
2011). These proportions change in a high-performance
building, where unregulated plug loads can correspond to
more than 50% of total energy consumption (Lobato, Pless,
Sheppy & Torcellini, 2011). With the decreasing trend in
lighting and HVAC energy consumption and with more
dependence on computer and electronic equipment, plug
and process loads are taking up an increasingly larger slice
of the building energy use pie.
In terms of plug load energy consumption, it has been found
that motivated users are key to saving energy (Kaneda et al.,
2010). Employees who make use of built-in power saving
functionality and turn off devices when not in use can
significantly reduce energy waste, particularly during non-
business hours. In other words, many of the barriers to
reducing plug load energy use are behavioral, not technical.
As part of a NASA program to replace outdated and
inefficient buildings, NASA Ames Research Center recently
completed construction of Sustainability Base, a 50,000 sq.
ft. office building designed to exceed the Leadership in
Energy and Environmental Design (LEED) Platinum rating.
Beyond providing an inviting workspace for employees,
Sustainability Base has the following objectives:
1. To be a living, evolving research laboratory and
showcase facility for sustainable building research.
2. To provide a mechanism for the demonstration and
transfer of NASA aerospace technologies to the
building industry.
3. To be an experimental research facility relevant to
NASA’s interest in developing human habitats on Mars
and in space.
4. To facilitate collaboration by involving inter-
governmental, academic, nonprofit, and industry
partners in research on next generation sustainable
building technologies and concepts.
5. To reinforce NASA’s position on and support of the
Executive Order on Federal Leadership in
Environmental, Energy, and Economic Performance.
_____________________
Silva et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits
unrestricted use, distribution, and reproduction in any medium, provided
the original author and source are credited.
In addition to investigations of technologies such as
greywater recycling, data mining, prognostics,
computational fluid dynamics, fuel cells, and intelligent
control, NASA Ames will also examine the influence of
plug load management. Since Sustainability Base was not
yet occupied at the time of this study, a testbed was set up in
another building on campus to perform a preliminary plug
load management assessment (Poll & Teubert, 2012).
We wish to detect irregular plug load usage, malfunctioning
devices, and also whether the plug load management system
itself is performing as expected. In this paper, we
demonstrate the development of an expert system to analyze
data acquired from plug loads and to call attention to
potential issues. The main contribution is the development
of a modular, extensible knowledge-based system that can
be easily adapted to Sustainability Base or other buildings
that use plug load management.
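A minimal sketch of the kind of rule such a system encodes is shown below; the device names, the power threshold, and the after-hours rule are assumptions for illustration, not the deployed rule base:

```python
from datetime import datetime

# Illustrative rule: flag a metered plug load drawing non-trivial
# power outside business hours. Thresholds, hours and device names
# are hypothetical, not the actual knowledge base.

BUSINESS_HOURS = range(7, 19)   # 07:00-18:59
STANDBY_WATTS = 5.0             # above this we call the device "on"

def after_hours_alerts(readings):
    """readings: list of (device, timestamp, watts) tuples.
    Returns the readings showing power draw outside business hours."""
    alerts = []
    for device, ts, watts in readings:
        if ts.hour not in BUSINESS_HOURS and watts > STANDBY_WATTS:
            alerts.append((device, ts, watts))
    return alerts

readings = [
    ("monitor-127", datetime(2012, 3, 14, 22, 15), 38.0),
    ("laptop-042", datetime(2012, 3, 14, 12, 5), 45.0),
    ("printer-003", datetime(2012, 3, 14, 23, 40), 2.1),
]
alerts = after_hours_alerts(readings)  # only monitor-127 is flagged
```

In a rule-based system each such condition is one rule among many; keeping rules small and independent is what makes the knowledge base modular and extensible to other buildings.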
This paper has the following structure: Section 2 discusses
related work on intelligent systems applied to sustainable
buildings. Section 3 describes the pilot study testbed,
including test environment, plug load management devices
and data collection. Section 4 describes the expert system
developed to analyze the data generated by the monitoring
system, concentrating on knowledge representation
techniques. Section 5 presents the results of applying the
expert system to the plug load data. Section 6 concludes
with some lessons learned and next steps to be applied to
Sustainability Base.
2. INTELLIGENT SYSTEMS FOR SUSTAINABLE BUILDINGS
Knowledge-based systems have been applied to Building
Energy Management Systems (BEMS), which play an
important role in occupant comfort and energy
consumption. In this area, Doukas, Patlitzianas, Iatropoulos
and Psarras (2007) describe an intelligent decision support
system using rule sets based on a typical building energy
management system. The knowledge base addresses the
following categories: internal comfort conditions, building
energy efficiency, and decision support. The decision
support module has three functions: interaction with the
sensors for diagnosis of the building's state; incorporation
of expert and intelligent system techniques to select
appropriate interventions; and communication with the
building's controllers to apply the decision. The system
enables central management of energy
consumption in buildings by translating the building energy
knowledge into several rules and finally into electronic
commands to actuator devices. The paper describes the
adopted methodology to develop the system using expert
knowledge for building energy management, the system
architecture, a summary of its rules and an appraisal of its
pilot application to a building. One of the main project
conclusions was that expert knowledge has significant
potential for improving building energy management, since
the rules make it possible to modulate intelligent
interventions.
As presented above, heating and cooling requirements play
a vital role in building energy demands; defining such
requirements is therefore essential during the building
design process. In this context, Ekici and Aksoy
(2011) introduce an Adaptive Neuro-Fuzzy Inference
System (ANFIS) to predict heating and cooling energy
needs. The inputs to the inference include physical
environmental parameters such as outdoor temperatures,
solar radiation and wind speed and direction in addition to
design parameters such as building form factor,
transparency ratio, insulation thickness, and orientation. The
performance of ANFIS was benchmarked with the results of
conventional calculation methods of building energy
requirements; the ANFIS models yielded a successful
prediction rate of 96.5% for heating and 83.8% for cooling
energy requirements.
Kwok, Yuen and Lee (2011) present an intelligent approach
to assess the effect of building occupancy on cooling load.
Their neural network takes external factors (outdoor
temperature, relative humidity, rainfall, wind speed, bright
sunshine duration, and global solar radiation) and internal
factors (occupancy area and occupancy rate) as inputs; the
total cooling load is the model
output. The occupancy rate is derived from the total energy
of primary air units, whose output of fresh air depends on
the measured CO2 level. When the number of occupants
increases, the CO2 concentration level increases and leads
to an increase in fresh air supply rate. The study includes a
sensitivity analysis considering three variations on the input
to the neural network: only external factors, inclusion of
occupancy area, and addition of occupancy rate. The analysis
demonstrates the importance of occupancy data in the
building cooling load prediction. From these few examples,
it is clear that there is a need for applications of intelligent
systems for sustainable buildings.

Equipment   No.   Equipment         No.
Desktop      6    Calculator         1
Laptop       3    Storage drive      1
Printer      7    Battery charger    1
Phone        2    Vending machines   2
Speaker      3    Space heater       1
Scanner      3    External drive     1
Monitor      7    Coffee maker       1
Hub          2    Refrigerator       1
Copier       1    Bridge             1
Shredder     3    Microwave          1
Lamp         2    TOTAL             50

Table 1. List of equipment monitored
3. TESTBED FOR PLUG LOAD MANAGEMENT
In preparation for deploying a plug load management
system to Sustainability Base, a pilot study was conducted
in another office building on the NASA Ames campus (Poll
& Teubert, 2012). The plug load management system
included 15 power strips, with 4 channels (receptacles) per
strip. Each channel is metered and can also be commanded
on or off. Power strips wirelessly transmitted data to, and
received commands from, a cloud-based data service via
bridges connected to the building's local area network. Minimum,
mean, and maximum power draws for one minute intervals
were recorded to a database. To collect a representative set
of data, the power strips were placed in a variety of
locations, including offices, a copy room, and a
break room. Table 1 lists the types and number of
equipment monitored.
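The one-minute mean power records are what the reporting functions later aggregate into daily kilowatt-hour totals. As an illustrative sketch (the paper's implementation is in CLIPS; the function name and sample layout here are assumptions), the conversion is simply mean watts times interval length:

```python
def daily_kwh(samples_w, minutes_per_sample=1):
    """Total energy in kWh from a sequence of mean power draws in
    watts, each averaged over a fixed-length interval (one minute
    here, matching the recording rate described above)."""
    hours = minutes_per_sample / 60.0
    return sum(w * hours for w in samples_w) / 1000.0
```

A channel drawing a constant 100 W for a full day (1440 one-minute samples) yields 2.4 kWh, the kind of per-channel total shown later in the text reports.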
Channels had various power consumption profiles and
operating modes (e.g., standby, idle, active). Both the
number of channels and power consumption characteristics
will change in the future deployment to Sustainability Base,
requiring that the knowledge-based system be easily
adaptable.
Power consumption data were collected over a period of
several weeks to establish a usage baseline. Then, schedule-
based control was used to power off and on groups of
devices at different times according to occupants’ work
schedules. In addition to employing time-based rules,
changes were made to the energy saver settings of certain
devices (e.g., time to standby mode, screen saver behavior).
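The schedule-based control applied here is also what the expert system later audits. A minimal Python sketch of such a check, assuming samples are (time, mean watts) pairs and a fixed switch-off time (the paper's actual rules are written in CLIPS, and the 5 W "on" threshold is an illustrative assumption):

```python
from datetime import time

def schedule_violations(samples, off_after, on_threshold_w=5.0):
    """Return the samples where a load is still drawing power after
    its scheduled off time, i.e. the schedule-based rule failed.
    `samples` is a list of (datetime.time, mean_watts) pairs."""
    return [(t, w) for t, w in samples
            if t >= off_after and w > on_threshold_w]

# A load scheduled off at 10 pm that is still drawing 78 W at 22:05:
readings = [(time(21, 30), 80.0), (time(22, 5), 78.0)]
alerts = schedule_violations(readings, time(22, 0))
```

Here `alerts` contains only the 22:05 reading, which would trigger the kind of alert shown later for channel 14.1.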
4. EXPERT SYSTEM PROTOTYPE FOR PLUG LOAD
ANALYSIS
One of the main objectives of the plug load testbed was to
gain practical experience that could be transferred to
Sustainability Base, which was in the final stages of
commissioning at the time of this study. Consequently, one
of the first decisions made in developing the expert system
prototype was to create a knowledge-base that could be
easily adapted to the future setup, which will have a
different set of plug loads compared to the testbed.
The expert system prototype was developed in CLIPS
(Giarratano & Riley, 1994) using a combination of rules,
semantic network and object-oriented modeling.
Figure 1 presents the expert system prototype UML activity
diagram. The knowledge-base is composed of two parts: 1)
CLIPS Instances (setup dependent); 2) Rules and Methods
(setup independent), as discussed later. For graphical output,
a JavaScript library was used (Dygraphs, 2011).
The choice of CLIPS as developmental framework was
guided by the following factors:
- The CLIPS Object-Oriented Language (COOL) module makes
it possible to take full advantage of object-oriented modeling;
Figure 1. UML activity diagram of expert system prototype
- The representation paradigm was chosen based on
previous experiences in developing expert systems for
different engineering domains, including hydraulic system
(Silva & Back, 2000), cogeneration power plant design
(Matelli, Bazzo & Silva, 2009; Matelli, Bazzo & Silva,
2011), and natural gas transportation modeling (Starr &
Silva, 2005);
- The combination of object-oriented modeling, semantic
network, and rules in an incremental approach allows
modularity, expandability, and robustness;
- The framework allows for rapid prototyping by the
knowledge engineer, a benefit in this case given previous
experience and time limitations.
The prototype was designed to process raw data and
generate summary reports and graphs with useful
information for either building operators or occupants.
Table 2 lists the prototype elements (presented in Figure 1)
and their rationale. The primary functions can be
summarized as:
1. Alert loss of communication
2. Alert failure of schedule-based on/off rules
3. Alert abnormal power consumption
4. Alert possible channel change
5. Present power mode transitions
6. Present percentage of time in different power modes
7. Present overall energy consumption per day
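For example, function 1 (alert loss of communication) uses the 20-minute threshold given in Table 2. A Python sketch of the underlying check, assuming a per-channel list of report timestamps (the paper's version is a CLIPS rule; the function interface is illustrative):

```python
from datetime import datetime, timedelta

COMM_TIMEOUT = timedelta(minutes=20)  # threshold from Table 2

def comm_alerts(report_times, end_of_day):
    """Return the times at which a channel fell silent for more
    than 20 minutes, i.e. the start of each communication gap."""
    alerts = []
    times = sorted(report_times) + [end_of_day]
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev > COMM_TIMEOUT:
            alerts.append(prev)
    return alerts
```

A channel that reports at 12:00 and 12:10 but not again until 12:45 would be flagged as silent from 12:10 onward.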
Since we did not have much a priori information regarding
the different types of loads, and due to the fact that most
attributes are shared by all loads, we chose to generate all
load instances from the main class. However, as the system
evolves and more detailed information is obtained regarding
differences among loads, it is possible to modify the class
structure – adding sub-classes such as desktop, laptop,
printer, etc., and redefining the current instances according
to these new sub-classes. Such expansion would greatly
increase the ability to define specific methods without
requiring a considerable change in system code. Even with
the current structure, it is possible to treat different loads in
a specific manner. For example, the calculation of abnormal
consumption, i.e., out of a pre-defined power range, was
implemented for the photocopier. In order to do that, the
abnormal_range attribute value was defined for this
instance. Some loads required special treatment. Desktop
computers did not have schedule-based control applied,
because removing power without a proper shutdown sequence
could result in data loss or damage to the computer.

Define period of analysis: User-definable period of analysis
during which all instance attributes are kept the same.

Change default on/off times: Each load has time-based on/off
attributes pre-defined in the instance set. These attributes
are key to identifying inconsistencies, since a load can be
operating when it shouldn't, or vice versa.

Access database: Generate facts by extracting only relevant
attributes from the database, such as channel, initial time,
and average power.

CLIPS Instances: Comprising the core of the system, instances
define specific methods (e.g., calculate time spent in each
power mode) as well as attributes that describe load behavior,
such as status, power, abnormal range, etc.

Check inconsistencies: Set of rules and methods to accomplish
the following functions:
- Alert loss of communication: triggers a message if a load
  does not report a measurement for more than 20 minutes.
- Alert failure of schedule-based on/off rules: triggers a
  message if a load is on when it should be off or, conversely,
  off when it should be on.
- Alert abnormal power consumption: triggers a message if a
  load is consuming power outside any previously defined range,
  or if the load is consuming power in a range for longer than
  expected (e.g., transition to standby mode after 60 minutes).
- Alert possible channel change: triggers a message if the
  power consumption pattern indicates that a different load may
  have been plugged into a channel. In this case, the system
  writes a message at the end of the daily report, indicating
  the time when the change was detected and which channel(s)
  switched.

Generate reports:
- Present power mode transitions: records a message if a load
  changes modes (e.g., on to off, standby to idle, idle to
  active).
- Present percentage of time in different power modes: records
  duty-cycle information for each channel.
- Present overall energy consumption per day: records the
  total kilowatt-hour energy consumption for each channel.
- Calculate wasted energy: records energy consumption of
  phantom loads.
- Graphical reports for different fault modes.

Table 2. Knowledge base elements of expert system prototype
Additionally, turning off some printers during non-business
hours was found to use more energy than leaving the device
in standby mode because of high power consumption for the
warm-up and idle modes.
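In Python terms, the single-class instance design described above might look like the following sketch. The `abnormal_range` attribute name comes from the paper; the rest of the interface, including the example range bounds, is illustrative (the actual system uses CLIPS/COOL):

```python
class PlugLoad:
    """Generic plug load instance; subclasses (Desktop, Printer,
    ...) can be introduced later without changing the rule code."""
    def __init__(self, channel, on_time=None, off_time=None,
                 abnormal_range=None):
        self.channel = channel
        self.on_time = on_time
        self.off_time = off_time
        self.abnormal_range = abnormal_range  # (low_w, high_w) or None

    def is_abnormal(self, watts):
        # Only loads with a defined range (e.g., the photocopier)
        # participate in the abnormal-consumption check.
        if self.abnormal_range is None:
            return False
        low, high = self.abnormal_range
        return not (low <= watts <= high)

# Hypothetical instance mirroring the photocopier case:
copier = PlugLoad("5.0", abnormal_range=(0.0, 200.0))
```

With this structure, a rule asking `load.is_abnormal(watts)` works uniformly across all instances, and only those with a defined range ever fire the alert.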
5. RESULTS
The prototype system was implemented and tested using
data gathered from the plug load management system. All
functions listed in Table 2 were tested by comparing the
daily reports created by the prototype system to the recorded
data.
Figure 2 shows outputs from one of the functions that
checks for inconsistencies. Figure 2a plots the typical
behavior, which shows that channel 14.1 was turned off
with a schedule-based rule at 10 pm on July 18. However,
on the next day, the same channel remained on after this
time. Figure 2b shows that an alert was triggered at 22:05
(see the alert message in the upper right of the graph),
indicating that the schedule-based rule failed.
The system also generates a verbose report, a snippet of
which is presented in Figure 3. The first part of the report
records each mode transition, together with anomalies such
as a possible change in channels, consumption in an abnormal
range, and loss of communication. The next part presents a
table with percentages of time spent in different modes. The
final notes call attention to items that require inspection.
Although the loads were modeled as a single class, it is
possible to study distinct behaviors. For example, for
channel 5.0 there is a special rule that checks whether the
device transitions to standby mode after 60 minutes of
inactivity. As shown in Figure 4a, the copier transitions to
standby mode (~60W) as expected. Figure 4b shows that on
the next day at 14:35 it failed to transition to standby mode
and had excessive power draw for the remainder of the day.
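The channel 5.0 rule can be sketched as follows, assuming one-minute mean power samples starting from the last active use. The ~60 W standby level matches Figure 4, while the 70 W cutoff and the function itself are illustrative assumptions (the actual rule is written in CLIPS):

```python
def missed_standby(powers_w, standby_max_w=70.0, window=60):
    """Given one-minute mean power samples starting at the last
    active use, return True if the device has not dropped to
    standby level (~60 W for the copier) within `window` minutes,
    i.e. the expected transition to standby failed."""
    return all(w > standby_max_w for w in powers_w[:window])
```

A copier stuck at ~265 W for the whole hour would be flagged, while one that drops to 60 W within the hour would not.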
a) Channel switches off at 10 pm; b) Channel fails to switch off at 10 pm
Figure 2. Example of schedule-based rules

Report corresponding to date: 20110728
6.3 @12:30 mode change on to idle power= 4.61
11.3 @12:30 mode change standby to on power= 1.3
…
@12:45 9.3 consumed power out of all ranges (phantom,
standby, idle and active). Possible change in the channel has
occurred. Check the user. Power= 88.15
…
5.0 @13:30 consuming in abnormal range: Power: 265.05
…
4.0 @22:00 loss of communication
…
Channel  Total kWh  % on   % off  % other modes (*)
1.0      1.88       30.43   0.00  69.56
1.1      0.17        0.72  16.67  82.61
…
15.0     0.15       33.33  60.15   6.52
15.1     0.10       34.06  60.15   5.80
* includes phantom, standby and idle modes

Attention: it is possible channels were changed because
@12:45 channel 9.3 consumed power 88.15 out of its normal
ranges. Check other channels in the same node.
Attention: it is possible channels were changed because
@12:45 channel 9.1 consumed power 27.58 out of its normal
ranges. Check other channels in the same node.

Figure 3. Example text report

In terms of expansion and use in Sustainability Base, only
the template for database access and the instance set need
to be changed. Neither task requires a skilled programmer,
because the rules/methods do not refer to specific instances;
they are independent from the operational setup. The
knowledge base proved to be extensible as it incorporated
additional attributes as the specifications increased in
complexity. Future work will implement plug load
subclasses to include specific information, allowing
modular expansion.
6. CONCLUSION
The paper presents the development of a knowledge-based
system to analyze plug loads, which are becoming
increasingly important in high-performance buildings. The
system processes data acquired from a plug load monitoring
system, triggers alerts and generates reports. The alerts call
attention to malfunctioning equipment, failure of schedule-
based rules, or changes in use pattern. The reports
summarize plug load power consumption statistics.
Providing such feedback to occupants is expected to help
identify malfunctioning equipment and reduce the energy
consumption of Sustainability Base.
ACKNOWLEDGEMENT
This project was developed under Grant 4095/10-3, CAPES
Foundation, Brazil. The authors would also like to thank
UFSC- Federal Univ. of Santa Catarina (Brazil), and NASA
Ames Research Center for their support.
REFERENCES
Doukas, H., Patlitzianas, K. D., Iatropoulos, K., & Psarras,
J. (2007). Intelligent Building Energy Management
System Using Rule Sets. Building and Environment 42:
3562–3569.
Dygraphs JavaScript Visualization Library,
http://dygraphs.com/, accessed November 2011.
Ekici, B. B., & Aksoy, U. T. (2011). Prediction of Building
Energy Needs in Early Stage of Design by Using
ANFIS. Expert Systems with Applications 38: 5352–
5358.
Giarratano, J., & Riley, G. (1994). Expert Systems -
Principles and Programming, Second Edition, PWS
Publishing Company.
Kaneda, D., Jacobson, B., & Rumsey, P. (2010). Plug Load
Reduction: The Next Big Hurdle for Net Zero Energy
Building Design. ACEEE Summer Study on Energy
Efficiency in Buildings.
Kwok, S.S.K., Yuen, R.K.K., & Lee, E.W.M. (2011). An
Intelligent Approach to Assessing the Effect of
Building Occupancy on Building Cooling Load
Prediction. Building and Environment 46: 1681-1690.
Lobato, C., Pless, S., Sheppy, M., & Torcellini, P. (2011).
Reducing Plug and Process Loads for a Large Scale,
Low Energy Office Building: NREL Research Support
Facility. (Tech Rep. No. NREL/CP-5500-49002).
National Renewable Energy Laboratory.
Moorefield, L., Frazer, B., & Bendt, P. (2011). Office Plug
Load Field Monitoring Report. California Energy
Commission, PIER Energy-Related Environmental
Research Program, Tech. Rep. CEC-500-2011-010.
Matelli, J. A., Bazzo, E., & Silva, J. C. (2009). An Expert
System Prototype for Designing Natural Gas
Cogeneration Plants. Expert Systems with Applications,
36, 8375-8384.
Matelli, J. A., Bazzo, E., & Silva, J. C. (2011).
Development of a Case-Based Reasoning Prototype for
Cogeneration Plant Design. Applied Energy, 88,
3030-3041.
a) Normal transition; b) Failure to transition
Figure 4. Examples of transition to standby
Poll, S., & Teubert, C. (2012). Pilot Study of a Plug Load
Management System: Preparing for Sustainability Base.
In Proceedings of 2012 IEEE Green Technologies
Conference. Institute of Electrical and Electronics
Engineers, Inc.
Silva, J. C., & Back, N. (2000). Shaping the Process of
Fluid Power System Design Applying an Expert
System. Research in Engineering Design, 12.
Starr, R. R., & Silva, J. C. (2005). Leak Detection in Gas
Pipelines - A Knowledge-Based Approach. Proceedings of
the 18th International Brazilian Congress of Mechanical
Engineering, COBEM 2005.
Integrated Vehicle Health Management and Unmanned Aviation
Andrew Heaton1, Ip-Shing Fan2, Craig Lawson3, and Jim McFeat4
1,2IVHM Centre, Cranfield University, Conway House, Medway Court, University Way,
Cranfield Technology Park, MK43 0FQ, UK
[email protected], [email protected]
3 Aerospace Engineering Department, School of Engineering, Cranfield University, MK43 0AL, UK
4BAE Systems, Warton Aerodrome, W374B, Preston, Lancashire, PR4 1AX, [email protected]
ABSTRACT
Over the past decade the use of unmanned aerial systems (UAS) has increased in military, intelligence, and surveillance operations for dull, dirty and dangerous (DDD) missions. They have primarily been used in time of war, and been pushed into service by programmes designed to increase the capabilities of military organisations, with little thought of supportability or interaction with other air users. The increased use of UAS in wartime and the ensuing media coverage has naturally led to proposed uses of UAS in a civilian context, for a wide range of non-military DDD missions: from land usage and crop monitoring, to the monitoring of nuclear power plants, to inspection of power lines.
The use of UAS in this civilian context raises important issues, such as: How can UAS be integrated into civil unsegregated airspace, and how will they react to other air traffic (manned and unmanned)? How can UAS be shown to be safe to the general public, especially with an increased level of autonomy? What technologies are needed to ensure the safe use of UAS? Is using a UAS more economical than using the manned equivalent? The list goes on.
Steps have already been taken to address these issues. In the United Kingdom, the Civil Aviation Authority (CAA) has produced guidance (CAP 722) for manufacturers and operators of UAS, allowing them to build and operate UAS while developing a framework to fully integrate them into the civil airspace. In addition, the CAA is working closely with the industry-led consortium ASTRAEA (Autonomous Systems Technology Related Airborne Evaluation & Assessment) to help solve the issues of using UAS in civil airspace, and the United States Federal Aviation Administration (FAA) has been set a 2015 deadline for full integration, hastening the need for solutions to be found.
The poster will take a holistic systems engineering view of the current situation in unmanned aviation and where Integrated Vehicle Health Management (IVHM) might be used (in whole or in part) to solve some of the issues mentioned above.
It will present reasons why one may wish to include IVHM in a UAS (e.g., cost, size, safety), and the potential benefits (e.g., increased availability, reduced maintenance costs) and pitfalls (e.g., false positives) of implementing IVHM on a UAS. It will also map how the IVHM (system of interest) interacts with the rest of the UAS (wider system of interest), how it relates to the issues the aviation industry has with UAS (environment) and the general public (wider environment).
Andrew Heaton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Author Index
Adhikari, Partha Pratim, 172
Anger, Christoph
Schrader, Robert, 202
Archimède, Bernard, 1
Baraldi, Piero, 90
Biswas, Gautam, 156
Bolognese, Danilo, 127
Boulet, Xavier, 250
Bregon, Anibal, 214
Brown, Douglas, 183
Buderath, Matthias, 59, 172, 225
Camci, F., 98, 148
Celaya, José R., 69, 156
Charbonnaud, Philippe, 1
Compare, Michele, 90
Corbetta, M., 104
Cross, Joshua, 34
da Silva, Jonny Carlos, 258
DaghighiAsli, Afshin, 239
Darr, Duane, 183
Desforges, Xavier, 1
Diévart, Mickaël, 1
Eker, O.F., 148
Esperon-Miguez, Manuel, 192
Fan, Ip-Shing, 265
Fasana, A., 51
Fasol, Dieter, 256
Ferrara, Davide, 127
Feuillard, Vincent, 42
Finda, Jindrich, 165
Garibaldi, L., 51
Giglio, M., 104
Goebel, Kai, 69, 156
Gola, Giulio, 141
Gutsche, Katja, 244
Hédl, Radek, 165
Hafner, Michaël, 17
Haines, Conor, 59
Hayati, Leila, 239
Heaton, Andrew, 265
Hmad, Ouadie, 250
Hulsund, John Einar, 141
Jacazio, Giovanni, 127
Jennions, Ian K., 148, 192
John, Philip, 192
Klingauf, Uwe, 202
Kulkarni, Chetan S., 156
Kunze, Ulrich, 25
Kwapisz, David, 17
Löhr, Andreas, 59
Lamoureux, Benjamin, 10
Laskowski, Bernard, 183
Lauffer, Jim, 80
Lawson, Craig, 265
Lorton, Ariane, 42
Lucas, Andrew, 34
Münker, Burkhard, 256
Manes, A., 104
Marchesiello, S., 51
Massé, Jean-Rémi, 10, 250
McFeat, Jim, 265
Mechbal, Nazih, 10
Medjaher, K., 98
Merino, Alejandro, 214
Mikat, Heiko, 225
Morse, Jefferey, 183
Nystad, Bent Helge, 141
Ompusunggu, Agusmian Partogi, 114
Pirra, M., 51
Poll, Scott, 258
Pulido, Belarmino, 214
Raab, Stefan, 25
Rajamani, Ravi, 17
Rezaie, Vahid, 239
Roychoudhury, Indranil, 69
Saha, Bhaskar, 69
Saha, Sankalita, 69
Sas, Paul, 114
Sauco, Sergio, 90
Saxena, Abhinav, 69
Sbarufatti, C., 104
Sen Gupta, Jayant, 42
Siddiolo, Antonino Marco, 225
Sorli, Massimo, 127
Stecki, Chris, 34
Stecki, Jacek, 34
Trinquier, Christian, 42
Van Brusse, Hendrik, 114
Vandenplas, Steve, 114
Vechart, Andrew, 165
Zamarreño, Jesus Maria, 214
Zerhouni, N., 98
Zio, Enrico, 90