Applied Energy 96 (2012) 347–358
Contents lists available at SciVerse ScienceDirect
Applied Energy
journal homepage: www.elsevier.com/locate/apenergy

Application of machine learning in the fault diagnostics of air handling units

Massieh Najafi a,*, David M. Auslander a, Peter L. Bartlett b, Philip Haves c, Michael D. Sohn d

a Department of Mechanical Engineering, University of California, Berkeley, California 94720, United States
b Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720, United States
c Commercial Building Systems Group, Lawrence Berkeley National Laboratory, California 94720, United States
d Airflow and Pollutant Transport Group, Lawrence Berkeley National Laboratory, California 94720, United States

Article info

Article history:
Received 16 August 2011
Received in revised form 9 January 2012
Accepted 20 February 2012
Available online 27 March 2012

Keywords:
Bayesian network
HVAC systems
Air-handling unit
Energy management
Fault detection and diagnosis
Machine learning

Abstract

An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints and, consequently, are not even recognized to have occurred. In spite of recent progress in the research and development of diagnostic solutions for air handling units, there is still a lack of reliable, scalable, and affordable diagnostic solutions for such systems. Modeling limitations, measurement constraints, and the complexity of concurrent faults are the main challenges in air handling unit diagnostics. The focus of this paper is on developing diagnostic algorithms for air handling units that can address such constraints more effectively by systematically employing machine-learning techniques. The proposed algorithms are based on analyzing the observed behavior of the system and comparing it with a set of behavioral patterns generated based on various faulty conditions. We show how such a pattern-matching problem can be formulated as an estimation of the posterior distribution of a Bayesian probabilistic model. We demonstrate the effectiveness of the approach by detecting faults in commercial building air handling units.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Overview

Air handling units account for a significant portion of building energy consumption and have a major impact on comfort conditions and building maintenance cost. An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints, as the cascade structure of the control system would try to neutralize the fault effect through re-adjusting other parameters and/or changing the component loads. For instance, the effect of a damper-leakage fault may be covered by re-adjusting the position of the hot or cold water valves. The fault may not even be recognized to have occurred even though it may result in an increase in energy usage. As long as the control system satisfies the set-points, the building operators tend to assume that the system is working efficiently in a non-faulty condition.

The topic of fault detection and diagnosis in air handling units has been an active area of research and development for more than two decades. However, in spite of the progress and effort made, there is still a lack of reliable, affordable, and scalable solutions to locate and manage faults in these systems; modeling limitations, measurement constraints, and the complexity of concurrent faults are among the main challenges for scalable solutions for air handling unit diagnostics.

The fact is that the principles of HVAC systems, particularly for air handling units, are known well enough to create suitable model structures; however, the accuracy of such models can be improved only up to a certain level; beyond that, excessive effort is required to obtain high-quality a priori knowledge,¹ which negatively affects model scalability. This limits the applicability of diagnostic strategies that rely on accurate or detailed models.

On the other hand, the architecture of sensor networks in air handling units is not necessarily designed solely for diagnostic purposes. Other factors and considerations such as control objectives, financial constraints, and practical limitations are also involved. As a result, we are confronted with situations in which the performance of two or more components is monitored through only one sensor (or one set of sensors). A well-known example is reliance on supply air temperature to analyze the functionality of the mixing box and heating and cooling coils. As will be shown later, in such scenarios, when the sensor output is contaminated, it could be due to the malfunction of any involved components, and it is not necessarily straightforward to locate the malfunctioning one.

The complexity of modeling limitations and measurement constraints in air handling unit diagnostics becomes even more severe when the possibility of concurrent faults is taken into account. A single-fault assumption would relieve the diagnostic complexity, but in reality, two or more faults may occur at the same time within one component or across different ones. The effect of concurrent faults is not necessarily a linear interpolation of the individual ones.

* Corresponding author.
E-mail addresses: [email protected] (M. Najafi), [email protected] (D.M. Auslander), [email protected] (P.L. Bartlett), [email protected] (P. Haves), [email protected] (M.D. Sohn).
¹ If the model is a detailed first-principle model, the a priori knowledge comprises mainly model parameter values and their variations. If the model is an empirical model, the a priori knowledge is usually high-quality training data for system behavior in different modes.
0306-2619/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.apenergy.2012.02.049
Nomenclature

HVAC          heating, ventilation, and air conditioning
No fault      no-fault condition
Reverse       reverse actuator fault
OAD leak      outside air damper leakage fault
RAD leak      return air damper leakage fault
Stuck         stuck damper fault
Fouling       fouling fault
VLV stuck     valve-stuck fault
SAT           supply air temperature (F)
OAT           outside air temperature (F)
RAT           return air temperature (F)
MAT           mixed air temperature (F)
DMP           outside air damper position (F)
T_air_in      temperature of entering air (F)
T_water_in    temperature of entering water (F)
T_air_out     temperature of outgoing air (F)
NTU           number of transfer units (NTU) method
CFM           cubic feet per minute, measurement of air volume flow rate
One approach to relieve the diagnostic complexity due to modeling limitations and measurement constraints is active diagnostics. In active-mode diagnostics, the diagnostic mechanism actively controls or manipulates the system inputs (e.g., damper positions, valves, etc.) to detect and isolate faults. Usually, inputs are changed based on predefined (or adaptive) test sequences to explore various operating conditions. The tests can be structured to explore operating points with less uncertainty or error, or, in the case of one sensor being affected by several components' functionality, to put neighboring components into neutral states so that one component at a time affects the measured variable. However, active-mode diagnostics require isolation of the system from normal operation, an option that may not be feasible.
Conversely, in passive-mode diagnostics, there is no control on the inputs. In this approach, the system is in a closed-loop operation manipulated by the control system based on the set-point error and so on. This is a more complicated scenario, as there is no capability to change or manipulate the inputs to follow a test procedure or sequence. The diagnostic mechanism needs to somehow make the best use of available data (measurements) from daily operation.
The focus of this paper is on developing passive-mode diagnostic algorithms for air handling units that can systematically address the above constraints. We believe that an ideal diagnostic solution should not only be reliable in detecting and isolating abnormal behaviors but also have systematic solutions for constraints and challenges related to scalability and affordability. Our proposed diagnostic algorithm is based on analyzing observed behavioral patterns and comparing them with a set of predefined patterns generated based on different fault assumptions. In Section 3, we will show how such a pattern-matching problem can be formulated as estimation of the posterior distribution of a Bayesian diagnostic model. We will also show how the proposed diagnostic framework can systematically address modeling and measurement constraints. In Section 4, we demonstrate the effectiveness of the proposed algorithm using various examples.
2. Literature survey

Heating, ventilation, and air conditioning (HVAC) systems account for more than 30% of annual energy use in the United States [3,5,6]; however, it has become apparent that only a small percentage of them work efficiently or in accordance with the design intent [2,9]. Operational faults are one of the main causes for the inefficient operation of HVAC systems. Studies of existing buildings have found that energy savings of 5–15% are typically achievable simply by fixing faults and optimizing HVAC control systems [8].

Nomenclature (continued)

VLV           valve position
IID           independent and identically distributed
v_a           air velocity (ft/s)
v_w           velocity of water (ft/s)
C_h           hot fluid capacity rate
C_c           cold fluid capacity rate
T_air-in      temperature of incoming air (F)
T_w-in        temperature of incoming water (F)
α             coefficient factor
β             coefficient factor
μ             mean or expected value
σ²            variance
ΔP            total pressure rise across fan
C_p           specific heat of air (BTU/lb F)
δ             density (lb/ft³)
η             fan combined efficiency
However, the current methods of detecting faults or performance creep are labor-intensive. Typically, building operators or engineers use intuition and various rules of thumb to identify the problem. In practice, the labor-intensiveness of these tasks is such that they are not routinely performed and in fact may never be performed. If the 5–15% energy savings are to be met in practice, HVAC systems must be capable of detecting when a failure has occurred or when performance is creeping, and of determining the likely offending hardware or operating condition. Automated systems for fault detection are, therefore, essential if low-energy or net-zero energy goals are to be met nationally.
Functionally, an air handling unit (AHU) is a device used to condition and circulate the air as part of an HVAC system. It is usually a large metal structure containing one or two fans, a mixing box, and heating/cooling coils² (Fig. 1). The mixing box mixes the air returning from the building with fresh outside air; the minimum ratio of outside air to be re-circulated is specified by building codes. The heating/cooling coils heat up or cool down the mixed air to maintain the required supply air temperature and humidity.

² It may contain both or either.
Typically, an air handling unit contains three temperature sensors, the outside air temperature (OAT), return air temperature (RAT), and supply air temperature (SAT) sensors, along with a fan status indicator (Fig. 1). One of the main challenges in monitoring air handling unit performance is the absence of a reliable measurement for the mixed air temperature (MAT), the temperature of the air coming from the mixing box before going through the heating/cooling sections. Usually, either there is no sensor in place to measure the MAT or, even if there is a temperature sensor, the sensor readings are unreliable due to incomplete upstream mixing. This constraint forces us to use the SAT sensor to evaluate mixing box performance. However, as shown in Fig. 1, the SAT is also affected by the heating/cooling coil functionality, and distinguishing the mixing box effects from the heating/cooling coil effects is not straightforward (as in the case when two or more components are being monitored through one sensor).

Fig. 1. Air handling unit.
An AHU malfunctions when one or more of its internal components fail. Air handling unit diagnostics have been an active area of research and development [26,27,33,43,7,41,24,12,14]. A variety of diagnostic solutions, ranging from first-principle-model-based diagnostic routines [16,32] to empirical-model-based diagnostic approaches [36,32,45,46,34,29,30] and qualitative/rule-based diagnostic solutions [25,4,19,15], have been developed for the evaluation of the performance of air handling units and their components.
However, as mentioned earlier, the nature of the HVAC industry and the fact that AHUs are usually designed and customized for each individual building limit the applicability of diagnostic solutions that rely on detailed models (or models that rely on configuration data that is not easily measurable or accessible) from the scalability perspective. On the other hand, when an analysis approach employs simplified, more generic models, the challenge is how to differentiate between the inconsistencies due to model misspecification errors and those due to system malfunction. In other words, when detailed models are replaced with more simplified ones, the interpretation of model prediction differences becomes more challenging.

A strategic approach to address the complexity of employing simplified models is to shift the focus of the analysis from error residuals to system behavioral patterns. In other words, instead of analyzing the difference between the system output and the model prediction at one or a few operating points, diagnostics are made by evaluating the system behavioral patterns over a window of operation. This lessens the dependency of the diagnostic algorithm on model accuracy. Such an approach has been employed by a number of diagnostic routines, particularly qualitative and semi-quantitative diagnostic approaches [26,27]. The key here is an algorithm (inference mechanism) that evaluates the observed behavior and compares it against a set of predefined (or even adaptive) hypotheses. Fuzzy logic has become a popular choice for such problems due to the inherent flexibility embedded in fuzzy sets and fuzzy rules, which makes it a suitable solution for reasoning in domains with some level of uncertainty [44,16,17,20]. For example, Haves et al. [17] proposed a fuzzy-based diagnostic routine for the fault diagnostics of VAV air handling units in which the fuzzy-based inference mechanism compares the predictions of simplified models with the air handling unit component outputs at various operating conditions to draw conclusions about the air handling unit health status.

However, fuzzy-based inference mechanisms have their own limitations. As the problem complexity grows (due to the system complexity, a large amount of disparate sensor data, the number of potential faults, etc.), a large number of fuzzy sets and fuzzy rules are required to analyze the system performance. Added to this is the difficulty of adjusting and tuning fuzzy sets either manually or through other approaches.

Another approach to managing modeling limitations is the use of rule-based diagnostic routines [42,10,35,1,28,37,38]. In this approach, a priori knowledge is formulated through a set of if-then rules coupled with an inference mechanism searching through the rules to draw a diagnostic conclusion. Rule-based frameworks can be designed based on expert knowledge or first principles. Their advantage is simplicity and ease of deployment; however, as discussed in Katipamula and Brambley [26,27], as problem complexity grows or when new/additional rules are added, the simplicity of the approach is lost quickly. Furthermore, sometimes the activation of the rules depends on threshold(s), which may depend greatly on model uncertainties, measurement errors, or other issues. More discussion on this can be found in House et al. [19].

In this paper, we adopt the strategy of employing simplified models, as we believe that dependency on complex and detailed models is a significant technological barrier and cause for industry resistance to large-scale deployment. Our approach therefore relies on more sophisticated inference mechanisms to interpret discrepancies between model predictions and the system output.

3. Diagnostic algorithm

We think of fault diagnostics as the process of analyzing a system behavioral pattern (observed performance) and comparing it with a set of hypothetical patterns to find the closest match. Each hypothetical pattern is developed based on the assumption of the existence of one or more faults in the system (see for example [39,40]). Once the closest hypothetical pattern is identified, the associated assumptions are concluded to be the system health status. For example, in mixing box diagnostics, if it turns out that the observed performance is closer to the behavioral pattern described by the outside-air-damper-leakage fault condition from a pool of behavioral patterns associated with stuck-damper fault, reverse-actuator fault, and so on, it is concluded that the underlying mixing box had an outside-air-damper-leakage fault.³
To formulate this within a mathematical framework, let us define the set of potential faults as:

\[ F = \{f_1, f_2, f_3, \ldots, f_n\} \tag{3.1} \]

and the measured data from the system as:

\[ E = \{e_1, e_2, e_3, \ldots, e_m\} \tag{3.2} \]

where $e_1, \ldots, e_m$ represent vectors of the data measured at $t = 1, \ldots, m$. The aim is to calculate the probability of $F$ given $E$, $P(F \mid E)$ (the posterior probability of $F$), and to find the combination of $f_1, f_2, f_3, \ldots, f_n$ for which $P(F \mid E)$ is maximized.

Here $f_1, \ldots, f_n$ represent the set of all possible faults in the system ($f_i$ is 1 when the ith fault exists and 0 when the ith fault does not exist). For example, in the mixing box example, $f_1$ could be an outside-air-damper-leakage fault, $f_2$ could be a return-air-damper-leakage fault, and $f_3$ could be a reverse-actuator fault. Therefore, F = {1, 0, 0} means that only one fault (an outside-air-damper-leakage fault) exists; similarly, F = {0, 0, 1} corresponds to the case of a reverse-actuator fault, and F = {1, 1, 0} corresponds to the case of two concurrent faults: an outside-air-damper-leakage fault and a return-air-damper-leakage fault. The case of F = {0, 0, 0} corresponds to a no-fault scenario.
Note that the marginal probability of an individual fault ($f_j$) can be calculated by:

\[ P(f_j \mid e_1, e_2, e_3, \ldots, e_m) = \sum_{f_1 \ldots f_n \ \text{excluding}\ f_j} P(f_1, f_2, f_3, \ldots, f_n \mid e_1, e_2, e_3, \ldots, e_m) \tag{3.3} \]

Now, using Bayes' rule, we can compute $P(F \mid E)$ as:
\[ P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \frac{P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)}{\sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)} \tag{3.4} \]

where $P(f_1 \ldots f_n)$ is the prior distribution. Different strategies or logic can be used to estimate the prior distributions. They can be defined based on statistical analysis if there are statistical results or qualitative information about which faults (or fault combinations) are more frequent than others. Additionally, intuitive methods can be employed to define the fault priors. In this paper, we follow the philosophy that a single fault is more likely to occur than two faults simultaneously; similarly, two concurrent faults have a higher occurrence probability than three concurrent faults. Therefore, single faults are assigned a higher prior than two concurrent faults, two concurrent faults a higher prior than three concurrent faults, and so on.
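The prior-assignment philosophy described above can be illustrated with a minimal Python sketch. It enumerates binary fault vectors and gives each a prior that decreases with the number of concurrent faults. The independence assumption, the per-fault probability `p_fault`, the optional `max_concurrent` cut-off, and the function name `fault_priors` are hypothetical choices made here for illustration only; the paper does not prescribe specific prior values.

```python
from itertools import product


def fault_priors(n_faults, p_fault=0.05, max_concurrent=None):
    """Return {fault_vector: prior} over binary fault vectors.

    Assumption (not from the paper): each fault occurs independently with
    probability p_fault < 0.5, so every additional concurrent fault
    lowers the prior of a hypothesis.
    """
    if not 0.0 < p_fault < 0.5:
        raise ValueError("p_fault must lie in (0, 0.5)")
    weights = {}
    for f in product((0, 1), repeat=n_faults):
        k = sum(f)  # number of concurrent faults in this hypothesis
        if max_concurrent is not None and k > max_concurrent:
            continue  # optionally ignore hypotheses with too many faults
        weights[f] = (p_fault ** k) * ((1.0 - p_fault) ** (n_faults - k))
    total = sum(weights.values())
    # Renormalize so the priors of the retained hypotheses sum to one.
    return {f: w / total for f, w in weights.items()}


# Three candidate faults, e.g. (OAD leak, RAD leak, reverse actuator).
priors = fault_priors(3)
```

With this construction every single-fault vector such as `(1, 0, 0)` receives a higher prior than any two-fault vector such as `(1, 1, 0)`, which in turn receives a higher prior than `(1, 1, 1)`. Priors derived from field statistics could be substituted without changing the rest of the procedure.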
With an IID sampling assumption,⁴ Eq. (3.4) can be expanded as:

\[ \log P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \log P(f_1 \ldots f_n) + \sum_{i=1}^{m} \log P(e_i \mid f_1 \ldots f_n) - \log \sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n) \tag{3.5} \]

³ The mixing box functionality, model, and diagnostic algorithm are discussed in detail in Section 4.
⁴ Here, the IID assumption means that, given faults $f_1 \ldots f_n$, the random variables $e_1 \ldots e_m$ are statistically independent and identically distributed. More on IID sampling can be found in DasGupta [11].
$P(e_i \mid f_1 \ldots f_n)$ is the likelihood function: the probability of measuring $e_i$ given $f_1 \ldots f_n$. This comes from the system model: assuming that the fault condition $f_1 \ldots f_n$ exists, what is the likelihood of measuring $e_i$? We can split $e_i$ into two sets, the set of system inputs, $I_i$, and the set of system outputs, $O_i$:

\[ e_i = \{I_i, O_i\} \]

The inputs are assumed to be known and deterministic,⁵ and the output is what is measured from the system behavior. For example, in the case of the mixing box, the inputs are the outside air temperature (OAT), the return air temperature (RAT), and the outside air damper position (DMP), and the output could be the mixed air temperature (MAT) or outside air fraction (OAF).⁶

Under these assumptions, $P(e_i \mid f_1 \ldots f_n)$ can be written as⁷:

\[ P(e_i \mid f_1 \ldots f_n) = P(O_i \mid I_i, f_1 \ldots f_n) \tag{3.6} \]

Eq. (3.6) is indeed a probabilistic model of system performance. It defines the system output as a random variable conditionally dependent on the input and the fault status. Interpreting the model output as a random variable provides a systematic structure to deal with uncertainties in the model output due to modeling simplifications and errors. In this framework, such uncertainties can be quantified in the variance of the random variable.

⁵ The assumption of deterministic inputs can be dropped for more general scenarios.
⁶ OAF is defined in Section 4.
⁷ Here we assume modeling the static behavior of the system.

One challenge with Eqs. (3.4) and (3.5) is that, for applications with a large number of potential faults, there would be a very large number of faulty scenarios to analyze (it can be on the order of a thousand or more). For applications such as an air handling unit in which the number of faults is limited and manageable, this is not a concern. However, for more complex applications where the number of potential faults/abnormalities is on the order of hundreds, it would be computationally problematic. One solution could be solving Eqs. (3.4) and (3.5) numerically by employing numerical algorithms such as the Markov chain Monte Carlo (MCMC) method. Another practical approach is to adopt more simplifications/assumptions to reduce the problem's complexity. For instance, we may assume that concurrent faulty scenarios with more than three simultaneous faults are negligible, as they have a very small probability.⁸

⁸ Keep in mind that such simplifications/assumptions would affect only the denominator of Eq. (3.4) [or the last part of Eq. (3.5)], which is the normalizing factor for correct estimation of the posterior probabilities. They will not affect the process of locating the fault combination with maximum posterior probability. They would change only slightly the marginal probability of faults.

The probabilistic models in Eq. (3.6) can be developed in different ways. They could be an extension of analytical models with added uncertainties/errors, or more sophisticated statistical procedures can be employed to develop the models. For example, the characteristics of the output random variable can be thought of as a combination of a set of basis functions generated at the input, linearly combined with coefficients influenced by the system fault status. If the output random variable follows a Gaussian distribution (or, more generically, an exponential family distribution), the estimation of the linear coefficients can be straightforward. As some of the demonstrations in Section 4 employ these types of models, it would be helpful to briefly address the derivations of such models.

Let's assume that the system has a set of inputs $I = [I_1, I_2, I_3, \ldots, I_m]^T$ and an output, $y$, which we assume to follow a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Also, assume that there is a set of basis functions $\{h_1, h_2, h_3, \ldots, h_n\}$ projecting the input vector $I$ to $x = [x_1, x_2, x_3, \ldots, x_n]^T$ so that we have:

\[ x_1 = h_1(I),\quad x_2 = h_2(I),\quad \ldots,\quad x_n = h_n(I) \tag{3.7} \]

Now let's assume that the mean of the output, $y$, is a linear combination of $x$⁹:

\[ \mu = \theta^T x \quad \text{where} \quad \theta = [\theta_1, \theta_2, \ldots, \theta_n]^T \tag{3.8} \]

As $y$ is a Gaussian distribution, we have:
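\[ P(y_i \mid \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \mu_i)^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{y_i^2}{2\sigma^2}\right) \exp\left\{\frac{\mu_i y_i - \mu_i^2/2}{\sigma^2}\right\} \]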
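To make the Gaussian output model concrete, the following Python sketch evaluates the log-likelihood of a set of observations when the output mean is a linear combination of basis-function features of the inputs. The basis functions, coefficient vectors, and variance used in the example are hypothetical: they assume an idealized mixing box in which the damper position (as a fraction between 0 and 1) equals the outside air fraction, and they are not the models used in the paper.

```python
import math


def gaussian_log_likelihood(y, inputs, basis, theta, sigma2):
    """Sum of log N(y_i; mu_i, sigma2), with mu_i = theta^T x_i.

    x_i = [h_1(I_i), ..., h_n(I_i)] are the basis-function features of
    the i-th input vector I_i.
    """
    total = 0.0
    for y_i, I_i in zip(y, inputs):
        x_i = [h(I_i) for h in basis]
        mu_i = sum(t * x for t, x in zip(theta, x_i))
        total += (-0.5 * math.log(2.0 * math.pi * sigma2)
                  - (y_i - mu_i) ** 2 / (2.0 * sigma2))
    return total


# Hypothetical inputs: (OAT, RAT, DMP) with DMP as a fraction in [0, 1].
inputs = [(50.0, 72.0, 0.2), (55.0, 73.0, 0.5), (60.0, 74.0, 0.8)]

# Hypothetical basis functions for an idealized mixing box.
basis = [
    lambda I: I[2] * I[0],          # outside-air contribution
    lambda I: (1.0 - I[2]) * I[1],  # return-air contribution
]

# Synthetic mixed air temperatures generated from the idealized model.
y = [basis[0](I) + basis[1](I) for I in inputs]

theta_nominal = [1.0, 1.0]  # coefficients matching the data generator
theta_other = [1.3, 0.8]    # made-up coefficients for another hypothesis

ll_nominal = gaussian_log_likelihood(y, inputs, basis, theta_nominal, 1.0)
ll_other = gaussian_log_likelihood(y, inputs, basis, theta_other, 1.0)
```

Because the synthetic data were generated with the nominal coefficients, `ll_nominal` exceeds `ll_other`. In a diagnostic setting, one such coefficient vector per fault hypothesis would supply the per-sample likelihood terms used in the posterior computation.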