Application of machine learning in the fault diagnostics of air handling units

Massieh Najafi a,*, David M. Auslander a, Peter L. Bartlett b, Philip Haves c, Michael D. Sohn d

a Department of Mechanical Engineering, University of California, Berkeley, California 94720, United States
b Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720, United States
c Commercial Building Systems Group, Lawrence Berkeley National Laboratory, California 94720, United States
d Airflow and Pollutant Transport Group, Lawrence Berkeley National Laboratory, California 94720, United States

* Corresponding author.

Applied Energy 96 (2012) 347–358
Journal homepage: www.elsevier.com/locate/apenergy
doi:10.1016/j.apenergy.2012.02.049
© 2012 Elsevier Ltd. All rights reserved.

Article info

Article history: Received 16 August 2011; received in revised form 9 January 2012; accepted 20 February 2012; available online 27 March 2012.

Keywords: Air-handling unit; Bayesian network; Energy management; Fault detection and diagnosis; HVAC systems; Machine learning

Abstract

An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints and, consequently, are not even recognized to have occurred. In spite of recent progress in the research and development of diagnostic solutions for air handling units, there is still a lack of reliable, scalable, and affordable diagnostic solutions for such systems. Modeling limitations, measurement constraints, and the complexity of concurrent faults are the main challenges in air handling unit diagnostics. The focus of this paper is on developing diagnostic algorithms for air handling units that can address such constraints more effectively by systematically employing machine-learning techniques. The proposed algorithms are based on analyzing the observed behavior of the system and comparing it with a set of behavioral patterns generated based on various faulty conditions. We show how such a pattern-matching problem can be formulated as an estimation of the posterior distribution of a Bayesian probabilistic model. We demonstrate the effectiveness of the approach by detecting faults in commercial building air handling units.

1. Introduction

1.1. Overview

Air handling units account for a significant portion of building energy consumption and have a major impact on comfort conditions and building maintenance cost. An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints, as the cascade structure of the control system would try to neutralize the fault effect through re-adjusting other parameters and/or changing the component loads. For instance, the effect of a damper-leakage fault may be covered by re-adjusting the position of the hot or cold water valves. The fault may not even be recognized to have occurred even though it may result in an increase in energy usage. As long as the control system satisfies the set-points, the building operators tend to assume that the system is working efficiently in a non-faulty condition.

The topic of fault detection and diagnosis in air handling units has been an active area of research and development for more than two decades. However, in spite of the progress and effort made, there is still a lack of reliable, affordable, and scalable solutions to locate and manage faults in these systems; modeling limitations, measurement constraints, and the complexity of concurrent faults are among the main challenges for scalable solutions for air handling unit diagnostics.

The fact is that the principles of HVAC systems, particularly for air handling units, are known well enough to create suitable model structures; however, the accuracy of such models can be improved only up to a certain level; beyond that, excessive effort is required to obtain high-quality a priori knowledge,1 which negatively affects model scalability. This limits the applicability of diagnostic strategies that rely on accurate or detailed models.

On the other hand, the architecture of sensor networks in air handling units is not necessarily designed solely for diagnostic purposes. Other factors and considerations such as control objectives, financial constraints, and practical limitations are also involved. As a result, we are confronted with situations in which the performance of two or more components is monitored through only one sensor (or one set of sensors). A well-known example is reliance on supply air temperature to analyze the functionality of the mixing box and heating and cooling coils. As will be shown later, in such scenarios, when the sensor output is contaminated, it could be due to the malfunction of any involved components, and it is not necessarily straightforward to locate the malfunctioning one.

1 If the model is a detailed first-principle model, the a priori knowledge comprises mainly model parameter values and their variations. If the model is an empirical model, the a priori knowledge is usually high-quality training data for system behavior in different modes.

The difficulties posed by modeling limitations and measurement constraints in air handling unit diagnostics become even more severe when the possibility of concurrent faults is taken into account. A single-fault assumption would relieve the diagnostic complexity, but in reality, two or more faults may occur at the same time within one component or across different ones. The effect of concurrent faults is not necessarily a linear interpolation of the individual faults' effects.

One approach to relieve the diagnostic complexity due to modeling limitations and measurement constraints is active diagnostics. In active-mode diagnostics, the diagnostic mechanism actively controls or manipulates the system inputs (e.g., damper positions, valves, etc.) to detect and isolate faults. Usually, inputs are changed based on predefined (or adaptive) test sequences to explore various operating conditions. The tests can be structured to explore operating points with less uncertainty or error or, in the case of one sensor being affected by the functionality of several components, to put neighboring components into neutral states so that one component at a time affects the measured variable. However, active-mode diagnostics require isolation of the system from normal operation, an option that may not be feasible.

Conversely, in passive-mode diagnostics, there is no control over the inputs. In this approach, the system is in closed-loop operation, manipulated by the control system based on the set-point error and so on. This is a more complicated scenario, as there is no capability to change or manipulate the inputs to follow a test procedure or sequence. The diagnostic mechanism needs to make the best use of the available data (measurements) from daily operation.

The focus of this paper is on developing passive-mode diagnostic algorithms for air handling units that can systematically address the above constraints. We believe that an ideal diagnostic solution should not only be reliable in detecting and isolating abnormal behaviors but also have systematic solutions for constraints and challenges related to scalability and affordability. Our proposed diagnostic algorithm is based on analyzing observed behavioral patterns and comparing them with a set of predefined patterns generated based on different fault assumptions. In Section 3, we will show how such a pattern-matching problem can be formulated as estimation of the posterior distribution of a Bayesian diagnostic model. We will also show how the proposed diagnostic framework can systematically address modeling and measurement constraints. In Section 4, we demonstrate the effectiveness of the proposed algorithm using various examples.

Nomenclature

HVAC          heating, ventilation, and air conditioning
No fault      no-fault condition
Reverse       reverse actuator fault
OAD leak      outside air damper leakage fault
RAD leak      return air damper leakage fault
Stuck         stuck damper fault
Fouling       fouling fault
VLV stuck     valve-stuck fault
SAT           supply air temperature (°F)
OAT           outside air temperature (°F)
RAT           return air temperature (°F)
MAT           mixed air temperature (°F)
DMP           outside air damper position
T_air_in      temperature of entering air (°F)
T_water_in    temperature of entering water (°F)
T_air_out     temperature of outgoing air (°F)
NTU           number of transfer units (NTU) method
CFM           cubic feet per minute, measurement of air volume flow rate
VLV           valve position
IID           independent and identically distributed
va            air velocity (ft/s)
vw            velocity of water (ft/s)
Ch            hot fluid capacity rate
Cc            cold fluid capacity rate
Tair-in       temperature of incoming air (°F)
Tw-in         temperature of incoming water (°F)
α             coefficient factor
β             coefficient factor
μ             mean or expected value
σ²            variance
ΔP            total pressure rise across fan
Cp            specific heat of air (BTU/lb·°F)
δ             density (lb/ft³)
η             fan combined efficiency

2. Literature survey

Heating, ventilation, and air conditioning (HVAC) systems account for more than 30% of annual energy use in the United States [3,5,6]; however, it has become apparent that only a small percentage of them work efficiently or in accordance with the design intent [2,9]. Operational faults are one of the main causes of the inefficient operation of HVAC systems. Studies of existing buildings have found that energy savings of 5–15% are typically achievable simply by fixing faults and optimizing HVAC control systems [8].

However, the current methods of detecting faults or performance creep are labor-intensive. Typically, building operators or engineers use intuition and various rules of thumb to identify the problem. In practice, the labor-intensiveness of these tasks is such that they are not routinely performed and in fact may never be performed. If the 5–15% energy savings are to be met in practice, HVAC systems must be capable of detecting when a failure has occurred or when performance is creeping, and of determining the likely offending hardware or operating condition. Automated systems for fault detection are, therefore, essential if low-energy or net-zero energy goals are to be met nationally.

Functionally, an air handling unit (AHU) is a device used to condition and circulate the air as part of an HVAC system. It is usually a large metal structure containing one or two fans, a mixing box, and heating/cooling coils2 (Fig. 1). The mixing box mixes the air returning from the building with fresh outside air; the minimum ratio of outside air to be re-circulated is specified by building codes. The heating/cooling coils heat up or cool down the mixed air to maintain the required supply air temperature and humidity.

2 It may contain both or either.

Typically, an air handling unit contains three temperature sensors, the outside air temperature (OAT), return air temperature (RAT), and supply air temperature (SAT) sensors, along with a fan status indicator (Fig. 1). One of the main challenges in monitoring air handling unit performance is the absence of a reliable measurement for the mixed air temperature (MAT), the temperature of the air coming from the mixing box before going through the heating/cooling sections. Usually, either there is no sensor in place to measure the MAT or, even if there is a temperature sensor, the sensor readings are unreliable due to incomplete upstream mixing. This constraint forces us to use the SAT sensor to evaluate mixing box performance. However, as shown in Fig. 1, the SAT is also affected by the heating/cooling coil functionality, and distinguishing the mixing box effects from the heating/cooling coil effects is not straightforward (as in the case when two or more components are being monitored through one sensor).

Fig. 1. Air handling unit.

An AHU malfunctions when one or more of its internal components fail. Air handling unit diagnostics have been an active area of research and development [26,27,33,43,7,41,24,12,14]. A variety of diagnostic solutions ranging from first-principle-model-based diagnostic routines [16,32] to empirical-model-based diagnostic approaches [36,32,45,46,34,29,30] and qualitative/rule-based diagnostic solutions [25,4,19,15] have been developed for the evaluation of air handling unit performance and its components.

However, as mentioned earlier, the nature of the HVAC industry and the fact that AHUs are usually designed and customized for each individual building limit the applicability of diagnostic solutions that rely on detailed models (or models that rely on configuration data that is not easily measurable or accessible) from the scalability perspective. On the other hand, when an analysis approach employs simplified, more generic models, the challenge is how to differentiate between the inconsistencies due to model misspecification errors and those due to system malfunction. In other words, when detailed models are replaced with more simplified ones, the interpretation of model prediction differences becomes more challenging.

A strategic approach to address the complexity of employing simplified models is to shift the focus of the analysis from error residuals to system behavioral patterns. In other words, instead of analyzing the difference between the system output and the model prediction at one or a few operating points, diagnostics are made by evaluating the system behavioral patterns over a window of operation. This lessens the dependency of the diagnostic algorithm on model accuracy. Such an approach has been employed by a number of diagnostic routines, particularly qualitative and semi-quantitative diagnostic approaches [26,27]. The key here is an algorithm (inference mechanism) that evaluates the observed behavior and compares it against a set of predefined (or even adaptive) hypotheses. Fuzzy logic has become a popular choice for such problems due to the inherent flexibility embedded in fuzzy sets and fuzzy rules, which makes it a suitable solution for reasoning in domains with some level of uncertainty [44,16,17,20]. For example, Haves et al. [17] proposed a fuzzy-based diagnostic routine for the fault diagnostics of VAV air handling units in which the fuzzy-based inference mechanism compares the predictions of simplified models with the air handling unit component outputs at various operating conditions to draw conclusions about the air handling unit health status.

However, fuzzy-based inference mechanisms have their own limitations. As the problem complexity grows (due to the system complexity, a large amount of disparate sensor data, the number of potential faults, etc.), a large number of fuzzy sets and fuzzy rules are required to analyze the system performance. Added to this is the difficulty of adjusting and tuning fuzzy sets either manually or through other approaches.

Another approach to managing modeling limitations is the use of rule-based diagnostic routines [42,10,35,1,28,37,38]. In this approach, a priori knowledge is formulated through a set of if-then rules coupled with an inference mechanism that searches through the rules to draw a diagnostic conclusion. Rule-based frameworks can be designed based on expert knowledge or first principles. Their advantage is simplicity and ease of deployment; however, as discussed in Katipamula and Brambley [26,27], as problem complexity grows or when new/additional rules are added, the simplicity of the approach is lost quickly. Furthermore, the activation of the rules sometimes depends on threshold(s), which may depend greatly on model uncertainties, measurement errors, or other issues. More discussion on this can be found in House et al. [19].

In this paper, we adopt the strategy of employing simplified models, as we believe that dependency on complex and detailed models is a significant technological barrier and a cause of industry resistance to large-scale deployment. Our approach therefore relies on more sophisticated inference mechanisms to interpret discrepancies between model predictions and the system output.

3. Diagnostic algorithm

We think of fault diagnostics as the process of analyzing a system behavioral pattern (observed performance) and comparing it with a set of hypothetical patterns to find the closest match. Each hypothetical pattern is developed based on the assumption of the existence of one or more faults in the system (see, for example, [39,40]). Once the closest hypothetical pattern is identified, the associated assumptions are concluded to be the system health status. For example, in mixing box diagnostics, if it turns out that the observed performance is closer to the behavioral pattern described by the outside-air-damper-leakage fault condition from a pool of behavioral patterns associated with stuck-damper fault, reverse-actuator fault, and so on, it is concluded that the underlying mixing box had an outside-air-damper-leakage fault.3

To formulate this within a mathematical framework, let us define the set of potential faults as:

$$F = \{f_1, f_2, f_3, \ldots, f_n\} \tag{3.1}$$

and the measured data from the system as:

$$E = \{e_1, e_2, e_3, \ldots, e_m\} \tag{3.2}$$

where $e_1, \ldots, e_m$ represent vectors of the data measured at $t = 1, \ldots, m$. The aim is to calculate the probability of $F$ given $E$, $P(F \mid E)$ (the posterior probability of $F$), and to find out for which combination of $f_1, f_2, f_3, \ldots, f_n$ the posterior $P(F \mid E)$ is maximized.

$f_1, \ldots, f_n$ represent the set of all possible faults in the system ($f_i$ is 1 when the $i$th fault exists and 0 when it does not). For example, in the mixing box example, $f_1$ could be an outside-air-damper-leakage fault, $f_2$ could be a return-air-damper-leakage fault, and $f_3$ could be a reverse-actuator fault. Therefore, $F = \{1, 0, 0\}$ means that only one fault (an outside-air-damper-leakage fault) exists; similarly, $F = \{0, 0, 1\}$ corresponds to a reverse-actuator fault, and $F = \{1, 1, 0\}$ corresponds to two concurrent faults: an outside-air-damper-leakage fault and a return-air-damper-leakage fault. The case $F = \{0, 0, 0\}$ corresponds to the no-fault scenario.
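The binary encoding above can be made concrete with a short Python sketch. It enumerates every fault vector for the three mixing box faults named in the text and maps each one back to a readable label. The function names and the label strings are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical labels for f_1, f_2, f_3 in the mixing box example.
FAULTS = ("OAD leak", "RAD leak", "Reverse")

def enumerate_hypotheses(n_faults):
    """Return every binary fault vector F = (f_1, ..., f_n)."""
    return list(product((0, 1), repeat=n_faults))

def describe(hypothesis, labels=FAULTS):
    """Translate a fault vector into the names of the active faults."""
    active = [name for name, f in zip(labels, hypothesis) if f == 1]
    return " + ".join(active) if active else "No fault"

hypotheses = enumerate_hypotheses(len(FAULTS))
for h in hypotheses:
    print(h, describe(h))
```

With n candidate faults there are 2^n such vectors, which is why the number of hypotheses stays manageable for a mixing box but grows quickly for systems with many potential faults.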

Note that the marginal probability of an individual fault ($f_j$) can be calculated by:

$$P(f_j \mid e_1, e_2, e_3, \ldots, e_m) = \sum_{f_1 \ldots f_n \,\text{excluding}\, f_j} P(f_1, f_2, f_3, \ldots, f_n \mid e_1, e_2, e_3, \ldots, e_m) \tag{3.3}$$

Now, using Bayes' rule, we can compute $P(F \mid E)$ as:

$$P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \frac{P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)}{\sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)} \tag{3.4}$$

where $P(f_1 \ldots f_n)$ is the prior distribution. Different strategies can be used to estimate the prior distribution. It can be defined based on statistical analysis, if there are statistical results or qualitative information about which faults (or fault combinations) are more frequent than others. Alternatively, intuitive methods can be employed to define the fault priors. In this paper, we follow the philosophy that a single fault is more likely to occur than two faults simultaneously; similarly, two concurrent faults have a higher occurrence probability than three concurrent faults. Therefore, single faults are assigned a higher prior than two concurrent faults, two concurrent faults a higher prior than three concurrent faults, and so on.
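One simple way to realize such an ordering of priors is sketched below in Python. Each hypothesis receives a weight that shrinks geometrically with the number of concurrent faults, and the weights are then normalized. The geometric form, the `decay` value, the optional `max_concurrent` cutoff, and the fact that the no-fault case receives the largest weight are all assumptions made for illustration; the text only requires that fewer concurrent faults be given a higher prior.

```python
from itertools import product

def fault_count_prior(n_faults, decay=0.1, max_concurrent=None):
    """Prior over binary fault vectors that decreases with the fault count.

    decay is an assumed tuning constant (0 < decay < 1); each additional
    concurrent fault multiplies the unnormalized weight by decay.
    max_concurrent optionally drops hypotheses with too many faults.
    """
    weights = {}
    for hypothesis in product((0, 1), repeat=n_faults):
        k = sum(hypothesis)  # number of concurrent faults
        if max_concurrent is not None and k > max_concurrent:
            continue
        weights[hypothesis] = decay ** k
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

prior = fault_count_prior(3)
for h, p in sorted(prior.items(), key=lambda item: -item[1]):
    print(h, round(p, 4))
```

Passing `max_concurrent=3` keeps only hypotheses with at most three simultaneous faults, which is one way to limit the number of scenarios when the list of candidate faults is long.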

With an IID sampling assumption,4 Eq. (3.4) can be expanded as:

$$\log P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \log P(f_1 \ldots f_n) + \sum_{i=1}^{m} \log P(e_i \mid f_1 \ldots f_n) - \log \sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n) \tag{3.5}$$

3 The mixing box functionality, model, and diagnostic algorithm are discussed in detail in Section 4.

4 Here, the IID assumption means that, given faults $f_1 \ldots f_n$, the random variables $e_1 \ldots e_m$ are statistically independent and identically distributed. More on IID sampling can be found in DasGupta [11].

$P(e_i \mid f_1 \ldots f_n)$ is the likelihood function: the probability of measuring $e_i$ given $f_1 \ldots f_n$. This comes from the system model: assuming that the fault condition $f_1 \ldots f_n$ exists, what is the likelihood of measuring $e_i$? We can split $e_i$ into two sets: the set of system inputs, $I_i$, and the set of system outputs, $O_i$:

$$e_i = \{I_i, O_i\}$$

The inputs are assumed to be known and deterministic,5 and the output is what is measured from the system behavior. For example, in the case of the mixing box, the inputs are the outside air temperature (OAT), the return air temperature (RAT), and the outside air damper position (DMP), and the output could be the mixed air temperature (MAT) or outside air fraction (OAF).6

Under these assumptions, $P(e_i \mid f_1 \ldots f_n)$ can be written as7:

$$P(e_i \mid f_1 \ldots f_n) = P(O_i \mid I_i, f_1 \ldots f_n) \tag{3.6}$$

Eq. (3.6) is a probabilistic model of system performance. It defines the system output as a random variable conditionally dependent on the input and the fault status. Interpreting the model output as a random variable provides a systematic structure to deal with uncertainties in the model output due to modeling simplifications and errors. In this framework, such uncertainties can be quantified in the variance of the random variable.

One challenge with Eqs. (3.4) and (3.5) is that, for applications with a large number of potential faults, there would be a very large number of faulty scenarios to analyze (it can be on the order of a thousand or more). For applications such as an air handling unit, in which the number of faults is limited and manageable, this is not a concern. However, for more complex applications where the number of potential faults/abnormalities is on the order of hundreds, it would be computationally problematic. One solution could be solving Eqs. (3.4) and (3.5) numerically by employing algorithms such as the Markov chain Monte Carlo (MCMC) method. Another practical approach is to adopt more simplifications/assumptions to reduce the problem's complexity. For instance, we may assume that concurrent faulty scenarios with more than three simultaneous faults are negligible, as they have a very small probability.8

The probabilistic models in Eq. (3.6) can be developed in different ways. They could be an extension of analytical models with added uncertainties/errors, or more sophisticated statistical procedures can be employed to develop the models. For example, the characteristics of the output random variable can be thought of as a combination of a set of basis functions generated at the input, linearly combined with coefficients influenced by the system fault status. If the output random variable has a Gaussian distribution (or, more generically, an exponential family distribution), the estimation of the linear coefficients can be straightforward. As some of the demonstrations in Section 4 employ these types of models, it is helpful to briefly address the derivation of such models.

Let's assume that the system has a set of inputs $I = [I_1, I_2, I_3, \ldots, I_m]^T$ and an output, $y$, which we assume to have a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Also, assume that there is a set of basis functions $\{h_1, h_2, h_3, \ldots, h_n\}$ projecting the input vector $I$ to $x = [x_1, x_2, x_3, \ldots, x_n]^T$ so that we have:

$$x_1 = h_1(I), \quad x_2 = h_2(I), \quad \ldots, \quad x_n = h_n(I) \tag{3.7}$$

Now let's assume that the mean of the output, $y$, is a linear combination of $x$9:

$$\mu = \theta^T x \quad \text{where} \quad \theta = [\theta_1, \theta_2, \ldots, \theta_n]^T \tag{3.8}$$

As $y$ has a Gaussian distribution, we have:

$$P(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \mu_i)^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{y_i^2}{2\sigma^2}\right) \exp\left(\frac{\mu_i y_i - \mu_i^2/2}{\sigma^2}\right)$$

5 The assumption of deterministic inputs can be dropped for more general scenarios.

6 OAF is defined in Section 4.

7 Here we assume modeling the static behavior of the system.

8 Keep in mind that such a simplification/assumption would affect only the denominator of Eq. (3.4) [or the last part of Eq. (3.5)], which is the normalizing factor for correct estimation of the posterior probabilities. It will not affect the process of locating the fault combination with maximum posterior probability. It would change the marginal probabilities of faults only slightly.
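To show how the pieces fit together, the following Python sketch evaluates the Gaussian likelihood with mean $\mu = \theta^T x$ (Eq. (3.8)), combines it with a prior through the log form of Bayes' rule (Eq. (3.5)), and sums the posterior to obtain a marginal fault probability (Eq. (3.3)). It is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the single-fault toy system, the coefficient vectors, the variance, the prior values, and the three data points are all hypothetical.

```python
import math

def gaussian_log_likelihood(y, x, theta, sigma2):
    """log P(y | mu, sigma^2) with mu = theta^T x."""
    mu = sum(t * xi for t, xi in zip(theta, x))
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for the normalizing term."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def log_posterior(samples, models, log_prior):
    """Log posterior over fault hypotheses, assuming IID samples.

    samples:   list of (x, y) pairs, x being the basis-function outputs
    models:    dict mapping a fault vector to (theta, sigma2)
    log_prior: dict mapping a fault vector to log P(f_1 ... f_n)
    """
    joint = {}
    for hypothesis, (theta, sigma2) in models.items():
        log_lik = sum(gaussian_log_likelihood(y, x, theta, sigma2)
                      for x, y in samples)
        joint[hypothesis] = log_prior[hypothesis] + log_lik
    log_norm = log_sum_exp(list(joint.values()))
    return {h: value - log_norm for h, value in joint.items()}

def marginal_fault_probability(posterior, j):
    """P(f_j = 1 | data): sum the posterior over hypotheses with f_j = 1."""
    return sum(math.exp(lp) for h, lp in posterior.items() if h[j] == 1)

# Hypothetical one-fault system with x = (1, damper position).
models = {
    (0,): ([0.0, 1.0], 0.01),  # assumed no-fault coefficients and variance
    (1,): ([0.3, 0.7], 0.01),  # assumed leakage-fault coefficients and variance
}
log_prior = {(0,): math.log(0.9), (1,): math.log(0.1)}
samples = [((1.0, 0.0), 0.30), ((1.0, 0.5), 0.65), ((1.0, 1.0), 1.00)]

posterior = log_posterior(samples, models, log_prior)
best = max(posterior, key=posterior.get)
print("most probable hypothesis:", best)
print("P(f_1 = 1 | data):", marginal_fault_probability(posterior, 0))
```

Working in log space and subtracting the log-sum-exp of the joint terms keeps the normalization stable when many samples make the individual likelihoods very small. In this toy data set the observations follow the assumed fault model exactly, so the faulty hypothesis ends up with the larger posterior despite its smaller prior.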
