HVAC_CSIRO_Proof_2015


ARTICLE IN PRESS. G Model ASOC 2983 1–24

Applied Soft Computing xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc

Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems

Mitchell Yuwono a,∗, Ying Guo b, Josh Wall c, Jiaming Li b, Sam West c, Glenn Platt c, Steven W. Su a

a Faculty of Engineering and Information Technology, University of Technology, Sydney (UTS), 15 Broadway, Ultimo, NSW 2007, Australia
b The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Computational Informatics, Marsfield, NSW 2122, Australia
c The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Energy Technology, Mayfield West, NSW 2304, Australia

Article info

Article history:
Received 4 May 2014
Received in revised form 12 February 2015
Accepted 17 May 2015
Available online xxx

Keywords: Data clustering; Consensus clustering; Feature selection; Ensemble Rapid Centroid Estimation (ERCE); Particle Swarm Optimization; Fault detection and diagnosis; Heating Ventilation and Air Conditioning (HVAC) system; Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); Hidden Markov Model

Abstract

Various sensory and control signals in a Heating Ventilation and Air Conditioning (HVAC) system are closely interrelated, which gives rise to severe redundancies between the original signals. These redundancies may cripple the generalization capability of an automatic fault detection and diagnosis (AFDD) algorithm. This paper proposes an unsupervised feature selection approach and its application to AFDD in an HVAC system. Using Ensemble Rapid Centroid Estimation (ERCE), the important features are automatically selected from the original measurements based on the relative entropy between the low- and high-frequency features. The material used is the experimental HVAC fault data from the ASHRAE-1312-RP datasets, containing a total of 49 days of various types of faults and corresponding severity. The features selected using ERCE (Median normalized mutual information (NMI) = 0.019) achieved the least redundancies compared to those selected using manual selection (Median NMI = 0.0199), Complete Linkage (Median NMI = 0.1305), Evidence Accumulation K-means (Median NMI = 0.04) and Weighted Evidence Accumulation K-means (Median NMI = 0.048). The effectiveness of the feature selection method is further investigated using two well-established time-sequence classification algorithms: (a) Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); and (b) Hidden Markov Models (HMM). Weighted average sensitivity and specificity of (a) higher than 99% and 96% for NARX-TDNN, and (b) higher than 98% and 86% for HMM are observed. The proposed feature selection algorithm could potentially be applied to other model-based systems to improve the fault detection performance.


1. Introduction

Heating Ventilation and Air Conditioning (HVAC) systems are important for maintaining thermal comfort and indoor air quality in places such as offices, shopping malls, warehouses, schools, and homes [1,2]. According to a report by CSIRO [3], commercial buildings account for 25% of energy consumption in Australia. Moreover, HVAC systems represent 40–50% of energy use in these buildings [4]. In the United States (US), HVAC systems account for almost 31% of the electricity consumed by households [1]. Operational problems in HVAC systems can cause excess energy consumption. Regular checks and maintenance are therefore crucial to prevent unnecessary consumption. However, due to the high cost of reactionary maintenance, preventive or predictive maintenance practices are usually preferred.

∗ Corresponding author. Tel.: +61 430731938.
E-mail addresses: [email protected] (M. Yuwono), [email protected] (Y. Guo), [email protected] (J. Wall), [email protected] (J. Li), [email protected] (S. West), [email protected] (G. Platt), [email protected] (S.W. Su).

Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015), http://dx.doi.org/10.1016/j.asoc.2015.05.030

http://dx.doi.org/10.1016/j.asoc.2015.05.030
1568-4946/© 2015 Published by Elsevier B.V.

Discriminating a normally behaving HVAC system from one in a fault condition is a relatively well-researched area. A variety of automatic fault detection and diagnosis (AFDD) techniques provide a number of benefits to HVAC systems [5–7]. The AFDD techniques currently available in the market for HVAC systems are mainly rule-based approaches [8–10], which use prior knowledge to derive a set of if-then-else rules and an inference mechanism that searches through the rule-space to draw conclusions. Rule-based systems can be based solely on expert knowledge (inferred from experience) or on prior knowledge of a specific system. Being one of the very first methods used in HVAC fault detection problems, rule-based approaches have been the most popular over the last decades.

Indeed, rule-based approaches come with advantages including ease of development, transparent reasoning, the ability to reason even under uncertainty, and the ability to provide explanations for the conclusions reached. However, one must realize that most HVAC systems are installed in different buildings/environments. This generally means that rules or analytical models developed for a particular system cannot be easily applied to an alternative system. As such, the difficult process of determining and setting rules or generating analytical mathematical models must be tailored to each individual building/environment. The threshold method utilized in rule-based systems is prone to producing false alarms. Moreover, building conditions such as the structure of the internal architecture design and even external factors (such as shading and the growth of plant life) often change after the installation/initialization of a fault detection system, which can require rules/models that were originally appropriate to be revisited and updated. The weaknesses associated with this type of approach therefore include the requirement of specific tailoring to a system, potential failure of the AFDD system due to its limited knowledge boundaries, and difficulty in updating the model when the AFDD system is installed in a different HVAC system. These complications with the rule-based approach give rise to data-driven methods for AFDD in HVAC systems.

Regardless of the approach, the performance of an AFDD algorithm generally depends on the quality of the features. At CSIRO, we are developing a novel data-driven machine learning technique for AFDD in HVAC systems [4,11–14]. Preliminary results were presented in [11–14], showing the superior performance of the machine learning-based technique over rule-based methods in detecting air-handling unit (AHU) faults, with up to 90% accuracy on fault data obtained from ASHRAE Project 1312-RP [13]. However, one limitation of the AFDD systems described in [11–13] is that they rely on features provided by field experts. As with rules, features that are particularly effective for one system may not guarantee equivalent performance when utilized in an alternative system.

Selecting the appropriate features is essential in any model-based framework. Feature selection aims to minimize redundancies/mutual information between features such that the more important 'characteristic' features are not undermined. Specific faults exhibit specific symptoms which are observable only in certain clusters of features that behave differently from the others. The difficulty is that these clusters of features need to be constantly monitored, as they may change dynamically depending on the condition of the HVAC system under investigation. Moreover, incorrect selections of these characteristic features are dangerous, as they may adversely affect the final classifier to the extent that some obvious faults are overlooked. The motivation of this paper is therefore to design a reliable method for feature selection that can be used to augment the effectiveness of AFDD frameworks in general. The unsupervised data-driven feature selection algorithm is designed for HVAC systems operating under varying seasonal dynamics.

Evolutionary algorithms are particularly powerful for solving complex optimization problems with multiple local minima. For example, Differential Evolution (DE) has been used for optimization of pressure vessel structure design [15] and of a joint replenishment and distribution model [16]. Although the methods outlined in [15,16] are powerful for general-purpose optimization, a major algorithmic restructuring is required to implement these algorithms for cluster optimization. Instead, this paper exploits a lightweight evolutionary algorithm designed specifically for clustering purposes, the Rapid Centroid Estimation (RCE) [17].

Unsupervised feature selection based on data clustering is inherently an ill-posed problem where the goal is to group redundant features into some unknown number of clusters based on intrinsic information alone. In this paper, we utilize the Ensemble Rapid Centroid Estimation (ERCE) [17,18], a semi-stochastic multi-swarm clustering algorithm inspired by Particle Swarm Optimization (PSO) [19], to determine the characteristic features for each specific season. The method is designed to automate the selection of characteristic features in each season. The block diagram of the proposed method is shown in Fig. 1.

The performance of the proposed feature selection algorithm was tested using two well-established time-sequence classifiers: (a) Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous inputs (NARX-TDNN); and (b) Hidden Markov Models (HMM) [13]. A comprehensive comparison is also given with other feature selection methods, including Li's manual selection [20], Complete Linkage (CL), Ensemble Evidence Accumulation K-means (EAC K-means) and Weighted Evidence Accumulation K-means (WEAC K-means).

The paper is structured as follows: Section 2 presents an overview of the proposed method as well as the materials used to examine its performance. Section 3 presents a detailed description of each component, including feature extraction, feature selection, and the classifier used in the experiments. Section 4 describes the theoretical foundations of the consensus clustering algorithm that we utilize for performing the feature selection. Section 5 describes the data utilized in the experiments. Section 6 presents comprehensive experimental results for the proposed method and a comparative analysis with other conventional feature selection and classification algorithms. Section 7 presents in-depth analyses and discussion of the results. Finally, Section 8 presents the conclusion and future direction of the research.

2. General overview on HVAC systems

HVAC systems are configured and used to control the environment of a building or a zone including one or several rooms. The environmental variables may, for example, include temperature, air-flow, and humidity. The desired values/set-points of the environmental variables will depend on the intended use of the HVAC system. If the HVAC system is being used in an office building, the environmental variables will be set to make the building/rooms therein comfortable to humans. An HVAC system typically services a number of zones within a building. The system normally includes a central plant which includes:

• a hydronic heater and chiller,
• a pump system, which may include dedicated heated and chilled water pumps, that circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes, and
• a valve system, which may include dedicated heated and chilled water valves, that controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils).

The heated and/or chilled water circulates through the heat exchange system before being returned to the central plant, where the process repeats (i.e. the water is heated or chilled and recirculated). In the heat exchange system, energy from the heated/chilled water is exchanged with air being circulated through an air distribution system.

The HVAC system also includes a sensing system which typically includes a number of sensors located throughout the system, such as temperature, humidity, air velocity, volumetric flow, pressure, gas, position, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand-alone system, or may form part of a building automation system (BAS) or building management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system controls and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the inhabitancy or functional purpose of the building.

Fig. 1. Block diagram of the proposed method.

In an AFDD process, data from the components of the HVAC system is received. This data may, for example, include sensed data from various sensors within the system and feedback data from various components of the system. Additional data from external data sources can also be received, such as external weather data. Consequently, the dimensionality and volume of these data are enormous.

In order to ensure proper identification of faults, an AFDD algorithm requires redundancies in the selected sensory and control signal sources to be minimized. The additional information given by redundant features is irrelevant: it provides no useful information in describing the type of fault and will ultimately cripple the generalization capability of the fault detector. Insufficient features are equally dangerous, as they may lead to misdiagnoses due to incomplete information.

The method presented in this paper offers an unsupervised approach to feature selection using ERCE. The system is summarized in the block diagram in Fig. 1. A sample feature extraction and feature selection result using our proposed approach can be seen in Fig. 2.

The experimental materials in this paper are the fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008. In each season, different faults were generated, recorded and reported for experimental use.

3. Methods

Selecting important features in an HVAC system is challenging due to the excessive interrelations between signals. This section overviews our contribution on feature selection using consensus clustering and how it is applied to the HVAC system in particular. The section is subdivided into five subsections:

• Section 3.1 outlines the general model that we use for extracting magnitude and oscillation (spectral centroid) features from a raw signal.
• Section 3.2 outlines our proposed polar approach for visualizing multi-dimensional patterns.
• Section 3.3 defines the measure that we use for quantifying the degree of dissimilarity between features.
• Section 3.4 provides the general overview of our main contribution, a method for feature selection using semi-stochastic swarm-based consensus clustering, which will be further detailed in Section 4.
• Section 3.5 shows the architecture of the neural networks that we use to benchmark the efficiency of the proposed feature selection method.

3.1. Extracting time signal features: magnitude and spectral centroid

Sensory signals from an HVAC system are streamed in the form of sampled time signals. From each time signal, HVAC engineers mainly observe two features for deciding the condition of the system:

1. Whether the average magnitude of a sensory reading is within the typical range for the specific season.

2. Whether there is any excessive oscillation in the sensory readings compared to the typical condition for the specific season.

For example, a fault type classified as Sequence of Heating and Cooling Unstable (HCSF0517) can be identified by observing the excessive oscillation of the Chilled Water Coil control signal (CHWC GPM). The phenomenon can be seen in Fig. 3. In this figure, it is easy to observe that the moving average magnitude of the CHWC GPM during HCSF0517 is considerably close to the typical behavior.

We model these two features mathematically as the moving average magnitude and the spectral centroid. For a discrete signal $g_s(n)$, the two features can be measured using a straightforward calculation as follows.

The magnitude characteristic is measured using a simple moving average, calculated as

$$\mathrm{MAG}(g_s) = \frac{1}{N}\sum_{n=1}^{N} g_s(n), \tag{1}$$

where $n$ denotes the sample number and $N$ denotes the length of the window.

The spectral centroid of a signal describes the center of mass of its spectrum, and can be calculated as

$$\hat{g}_s = \mathrm{FFT}(g_s, N_{FFT}), \tag{2}$$

$$\mathrm{SC}(g_s) = \frac{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|\,\bar{g}_s(n)}{\sum_{n=5}^{N_{FFT}} |\hat{g}_s(n)|}, \tag{3}$$

where FFT denotes the fast Fourier transform, $N_{FFT}$ indicates the number of bins, and $\bar{g}_s(n)$ and $|\hat{g}_s(n)|$ represent the center frequency and magnitude of the $n$th bin. Notice that the frequency centroid is calculated from the fifth bin onward to isolate only the high-frequency oscillation.

Fig. 2. (a) Raw signals for the Spring 2008 dataset; (b) the low and high frequency features are isolated from each signal. Signals 1–160 are moving average magnitude signals while signals 161–320 are spectral centroid signals; (c) characteristic features are selected using ERCE, while (d) classification is done using NARX-TDNN.

A fault can be interpreted as 'how much a signal deviates from its typical characteristic during the specific season'. Incorporating this criterion, each feature vector $q_s$, which includes $\{\mathrm{MAG}(g_s), \mathrm{SC}(g_s)\}$, is normalized with respect to its normal operation. The discrepancy in both direction and magnitude relative to the normal signal is represented as a signed multiple of the signal's standard deviation during typical operation,

$$z_s(n) = \frac{q_s(n) - \mu_n(n)}{\sigma_n(n)}, \tag{4}$$

where $\mu_n(n)$ and $\sigma_n(n)$ denote the mean and standard deviation of a feature during its normal operation at a specific sample $n$ taken at a particular time of the day. In other words, the approach simply calculates the cross-sectional z-score of the feature $q_s$.

The hyperbolic tangent kernel is then applied to the z-score, effectively transforming each feature to a continuous measure within $[-1, 1]$ as follows,

$$y_s(n) = \tanh(z_s(n)), \tag{5}$$

which has a rather intuitive 'fuzzy' interpretation:

(a) $y_s(n) = 0$: the feature is at a typical level,
(b) $y_s(n) \to -1$: the feature is atypical negative (much smaller than its typical level),
(c) $y_s(n) \to 1$: the feature is atypical positive (much larger than its typical level).
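The feature extraction chain of Eqs. (1)–(5) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `sample_rate` parameter, the restriction of the sum to bins up to the Nyquist frequency, and the reading of "fifth bin" as zero-based index 5 are all assumptions made for illustration, and a naive DFT stands in for the FFT so that only the Python standard library is needed.

```python
import cmath
import math


def moving_average_magnitude(window):
    """Eq. (1): mean of the samples in one analysis window."""
    return sum(window) / len(window)


def spectral_centroid(window, sample_rate=1.0, first_bin=5):
    """Eqs. (2)-(3): magnitude-weighted mean frequency, skipping low bins.

    Assumptions: a naive O(N^2) DFT replaces the FFT, only bins up to the
    Nyquist frequency are summed, and first_bin is a zero-based index.
    """
    n_fft = len(window)
    weighted, total = 0.0, 0.0
    for k in range(first_bin, n_fft // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(window))
        magnitude = abs(coeff)
        weighted += magnitude * (k * sample_rate / n_fft)  # bin center frequency
        total += magnitude
    return weighted / total if total > 0 else 0.0


def fuzzy_deviation(value, normal_mean, normal_std):
    """Eqs. (4)-(5): z-score against normal operation, squashed by tanh."""
    z = (value - normal_mean) / normal_std
    return math.tanh(z)


# A pure tone at bin 8 of a 64-sample window has its centroid at 8/64.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(64)]
print(round(spectral_centroid(tone), 6))
print(fuzzy_deviation(5.0, normal_mean=5.0, normal_std=1.0))
```

In practice the normal-operation mean and standard deviation would be estimated per time of day from fault-free data, as described for Eq. (4); here they are passed in directly to keep the sketch short.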

Intuitively, the variability of $y_s$ throughout the season provides a good indicator of its importance. In this paper, we measure the variability of a feature in terms of its entropy as follows,

$$H_{y_s} = -\int p_{y_s}(x)\log p_{y_s}(x)\,dx, \tag{6}$$

where $p_{y_s}(x)$ can be approximated empirically from the histogram of $y_s$.
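As a rough illustration of the histogram approximation mentioned for Eq. (6), the sketch below bins the values of one feature over $[-1, 1]$ and returns the Shannon entropy of the bin probabilities. The function name, the number of bins, and the use of the discrete entropy (which differs from the differential entropy only by a constant fixed by the bin width) are assumptions, not details taken from the paper.

```python
import math


def histogram_entropy(values, n_bins=20, lo=-1.0, hi=1.0):
    """Approximate Eq. (6) with a fixed-width histogram over [lo, hi].

    Returns the discrete entropy of the bin probabilities in nats.
    n_bins is a hypothetical setting chosen for illustration.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp values on the boundary
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# A feature that never leaves its typical level carries no information,
# whereas one that visits every bin equally often attains the maximum.
flat = [0.0] * 100
spread = [-1.0 + (i + 0.5) * 0.1 for i in range(20)]
print(histogram_entropy(flat), histogram_entropy(spread))
```

Because every feature is mapped to the same interval by Eq. (5), using identical bin edges for all features keeps their entropies directly comparable.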

3.2. Feature visualization

Visualization is an important tool for verifying the effectiveness of a feature selection algorithm. However, due to the complexity of an HVAC system, simultaneous visualization of all signals would easily overwhelm the observer.

In this paper, a polar approach for visualizing patterns constituted by multi-dimensional feature cross-sections is proposed. The visualization scheme can be seen in Fig. 4.

In the proposed visualization scheme, the variable numbers are placed at fixed angles around the circle, and the radius at each angle represents the magnitude of $y_s$, as previously detailed in Eq. (5). A normal system would oscillate inside the typical region (around $y_s = 0$) such that the polar plot shows a circle-like pattern. During a fault condition, the sensors behave inside either the positive or negative atypical region such that the polar plot assumes various shapes other than a circle. For example, Fig. 5 shows that the pattern during normal operation is visually different from that of the OA Damper Stuck (OADS) fault scenario.

3.3. Measuring divergence between features


A pair of feature vectors $y_1 \in Y$ and $y_2 \in Y$ calculated from Eq. (5) can be treated as vectors of random numbers generated by the probability distribution functions $P = p(x)$ and $Q = q(x)$, respectively. $y_1$ and $y_2$ can be assumed to be redundant (i.e. generated from the same distribution) when the Kullback–Leibler (KL) divergence between the two approaches zero [21]. A practical illustration of the case can be seen in Fig. 6.

KL-divergence measures the relative entropy between two distributions [21], that is, the amount of information lost when $Q$ is used to approximate $P$:

$$KL(P\|Q) = \underbrace{-\sum_{x} p(x)\log q(x)}_{H(P,Q)} + \underbrace{\sum_{x} p(x)\log p(x)}_{-H(P)}, \tag{7}$$

$$= \sum_{x} p(x)\log\frac{p(x)}{q(x)}, \tag{8}$$

where $H(P, Q)$ denotes the cross entropy between $P$ and $Q$, and $H(P)$ denotes the information entropy of $P$. In this paper we use the symmetrical KL-divergence as originally proposed in [21]:

$$KL_s(P\|Q) = KL(P\|Q) + KL(Q\|P) = \sum_{x}\left(p(x)\log\frac{p(x)}{q(x)} - q(x)\log\frac{p(x)}{q(x)}\right). \tag{9}$$

Fig. 3. The magnitude (top) and frequency (bottom) characteristics of the Chilled Water Control signal (CHWC GPM) during fault (HCSF0517) vs. normal (NOR0505). Even though CHWC GPM during HCSF0517 is correlated in terms of magnitude characteristic, the signal is uncorrelated in terms of frequency characteristic.

3.4. Feature selection using consensus clustering

Performing feature selection using prototype-based algorithms such as K-means, fuzzy C-means, or Self Organizing Map (SOM) can be difficult because the number of characteristic features K is not initially known. Consensus clustering provides quantitative evidence for determining the number and membership of possible clusters within a dataset (in our case, features). The method has gained popularity in cancer genomics as a powerful tool to extract and visualize the dependencies between genes [22–24].

In this paper we propose an approach for unsupervised feature selection using a swarm-based ensemble algorithm [18]. An advantage of ensemble clustering algorithms over conventional clustering algorithms is that they allow a robust estimation of natural clusters by investigating the consensus strength between multiple clusterings [22,25,26]. Consensus clustering is particularly powerful for identifying strong clusters in the data [22]. This is particularly useful for our application, as can be seen in Section 6, where the features selected using consensus clustering algorithms are generally more compact and less redundant than the ones selected using complete-linkage.
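For reference, the dissimilarity that drives the grouping of redundant features, the symmetric KL divergence of Eq. (9), can be sketched from two histograms as follows. The helper names and the small additive smoothing constant `eps` (used to avoid taking the logarithm of an empty bin) are assumptions for illustration and are not specified in the paper.

```python
import math


def histogram_pmf(values, n_bins=20, lo=-1.0, hi=1.0, eps=1e-9):
    """Empirical bin probabilities with additive smoothing (assumed eps)."""
    counts = [eps] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def symmetric_kl(p, q):
    """Eq. (9): sum over bins of (p - q) * log(p / q)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Two features drawn from the same distribution give a divergence near
# zero and are therefore candidates for the same (redundant) cluster.
a = histogram_pmf([-0.8, -0.7, 0.1, 0.2, 0.9])
b = histogram_pmf([-0.8, -0.7, 0.1, 0.2, 0.9])
c = histogram_pmf([0.9, 0.9, 0.9, 0.95, 0.95])
print(symmetric_kl(a, b), symmetric_kl(a, c) > 0.0)
```

The pairwise values of `symmetric_kl` over all features would form the dissimilarity matrix handed to a clustering algorithm.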



Fig. 4. The proposed polar visualization scheme. In this illustration, we can see that features other than features #4 and #5 behave atypically.

Fig. 5. The proposed polar visualization scheme showing the characteristic signals in normal operation scenarios (left) and in OADS scenario (right) in the Winter 2008 dataset.

The feature selection process can be summarized as follows:

1. Determine the feature clusters using consensus clustering.

2. For each cluster, rank each feature according to its entropy and pick the one whose entropy is the highest as the characteristic feature for the cluster.

A sample result of a run of the feature selection process using consensus clustering is shown in Fig. 7. Features in the same cluster are denoted using the same color. The radius of each feature indicates its entropy. The bold circle in each cluster is the chosen characteristic feature, which is the feature with the highest entropy among those in the same cluster.
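The second step of the selection process, picking the highest-entropy feature in each cluster, can be sketched as below. The function name and the representation of the clustering as a flat list of labels are assumptions; the cluster labels would come from the consensus clustering step, which is not reproduced here.

```python
def select_characteristic_features(labels, entropies):
    """Return, per cluster, the index of the feature with the highest entropy.

    labels[i] is the (hypothetical) cluster label of feature i and
    entropies[i] is its entropy as in Eq. (6).
    """
    best = {}
    for idx, (label, h) in enumerate(zip(labels, entropies)):
        if label not in best or h > entropies[best[label]]:
            best[label] = idx
    return sorted(best.values())


# Five features in two clusters: features 1 and 3 have the highest
# entropy within their respective clusters.
labels = [0, 0, 1, 1, 1]
entropies = [0.2, 0.5, 0.1, 0.9, 0.3]
print(select_characteristic_features(labels, entropies))  # [1, 3]
```

Ties are resolved in favor of the feature that appears first, which is an arbitrary choice made for this sketch.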

3.5. Fault classification using Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)

The Nonlinear Auto-Regressive with eXogenous inputs (NARX) network architecture [27] is a class of discrete-time nonlinear systems. The NARX architecture can be broadly expressed in the parallel mode,

$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), \hat{y}(t - n_y), \ldots, \hat{y}(t - 1)\big), \tag{10}$$

or in the series-parallel mode,

$$\hat{y}(t) = f\big(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1)\big), \tag{11}$$

where $u(t)$, $y(t)$ and $\hat{y}(t)$ denote the input, actual output and estimated output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and $f$ denotes a nonlinear function, which can be approximated using a Multilayer Perceptron (MLP). As opposed to a conventional Recurrent Neural Network (RNN), a NARX network's feedback comes only from the output neurons rather than from its hidden states. Using this simplified configuration, it has been argued that NARX networks generalize better than other RNN networks, especially on problems involving long-term dependencies [28].

Fig. 6. A simplified case of redundancy between features in an HVAC system. How many clusters are there? It can be seen that the divergence between the $y_{CHWC-VLV}$ and $y_{CHWC-GPM}$ distributions is intuitively smaller than the divergence between $y_{CHWC-VLV}$ and $y_{SA-HUMD}$. If these four signals were to be clustered, then a possible solution would be to assign them into two clusters, i.e. $\{\{y_{CHWC-VLV}, y_{CHWC-GPM}\}, \{y_{SA-HUMD}, y_{RA-HUMD}\}\}$.

The configurations described in Eqs. (10) and (11) differ only in their mode of feedback. The configuration described in Eq. (10) is referred to as parallel mode or recurrent NARX (NARX-P), while Eq. (11) is referred to as series-parallel mode NARX (NARX-SP) [29]. NARX-P uses the state estimate as feedback, while NARX-SP uses the actual observable state. Because the actual state of an HVAC system is practically unavailable at all times, the deployment of NARX in an AFDD system is currently limited to the NARX-P configuration.
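The difference between the two feedback modes in Eqs. (10) and (11) can be made concrete with a small sketch. The function names and the linear `toy_model` are hypothetical stand-ins for the trained MLP $f$; only the way past outputs are fed back is meant to mirror the text.

```python
def narx_parallel(f, u, n_u, n_y, y_init):
    """Eq. (10): past estimates of the output are fed back into f."""
    start = max(n_u, n_y)
    y_hat = list(y_init[:start])  # assumed initial estimates
    for t in range(start, len(u)):
        y_hat.append(f(u[t - n_u:t + 1], y_hat[t - n_y:t]))
    return y_hat


def narx_series_parallel(f, u, y, n_u, n_y):
    """Eq. (11): past measured outputs are fed back into f instead."""
    start = max(n_u, n_y)
    y_hat = list(y[:start])
    for t in range(start, len(u)):
        y_hat.append(f(u[t - n_u:t + 1], y[t - n_y:t]))
    return y_hat


def toy_model(u_lags, y_lags):
    """Hypothetical linear stand-in for the nonlinear function f."""
    return u_lags[-1] + 0.5 * y_lags[-1]


u = [1.0] * 5
measured = [0.0, 2.0, 2.0, 2.0, 2.0]
print(narx_parallel(toy_model, u, 1, 1, [0.0]))
print(narx_series_parallel(toy_model, u, measured, 1, 1))
```

In the parallel run every prediction depends on the previous prediction, so no measured output is needed after initialization, which matches the deployment constraint described above.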

4. Consensus clustering

This section explains, in detail, the semi-stochastic swarm-based consensus clustering approach to feature selection in an HVAC system. The section is subdivided into six subsections:

• Section 4.1 briefly introduces the consensus clustering paradigm,
• Section 4.2 presents the visual abstract of our proposed feature selection method,
• Section 4.3 overviews Fred and Jain's Ensemble Accumulation [25],
• Section 4.4 summarizes our previous work on Swarm Rapid Centroid Estimation (SRCE) [17],
• Section 4.5 introduces the newly proposed 'self-evolution' strategy for the SRCE,
• Section 4.6 outlines the new implementation of ERCE for feature selection purposes.

4.1. Fundamentals of consensus clustering

Consensus clustering infers a consensus matrix from multiple runs of clustering algorithms. This consensus matrix encodes the probability of each pair of observations belonging to the same cluster. It has been argued that the natural, and arguably optimum, clusters can be validated with higher confidence by analyzing the stability of this matrix [22,25].

The consensus matrix $C$ is a positive semidefinite $N \times N$ square matrix of joint probabilities. Each $C_{ij} \in [0, 1]$ represents the probability of data points $i$ and $j$ belonging to the same cluster. For a given cluster assignment obtained from the $m$th clustering, we can calculate the $m$th co-association matrix as follows,

\[ C_m = U_m^{T} U_m, \qquad (12) \]

where each $U_m$ is a $K_m \times N$ matrix which stores the values of $u_{ik,m}$ for $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K_m\}$ obtained from the $m$th run of any clustering algorithm. Each $u_{ik,m}$ denotes the probability of a data point $y_i$ belonging to the cluster $C_k$. For any $m$, $U_m$ should satisfy the constraints $u_{ik,m} \in [0, 1]$ and $\sum_{k=1}^{K_m} u_{ik,m} = 1$. The matrix multiplication represents a probabilistic 'and' operator conveniently calculated using the (multiplicative) fuzzy T-norm [30]. The $i$th diagonal component of $C_m$, i.e. $D_{ii,m}$, quantifies the degree of stability for the $i$th data point in the $m$th clustering. In this paper we propose normalizing $C_m$ by its diagonal matrix $D_m$ as follows,

\[ C_m = D_m^{-1/2} C_m D_m^{-1/2}. \qquad (13) \]

The consensus $C$, or ensemble aggregate, is calculated as the weighted average of the co-association matrices $C_1, C_2, \ldots, C_M$ as follows,

\[ C = \frac{\sum_{m=1}^{M} w_m C_m}{\sum_{m=1}^{M} w_m}, \qquad (14) \]

where $w_m$ denotes the weight of the corresponding partition, which can be determined manually or using any cluster validation method [31]. $w_m$ can also be set to assume equal weighting such that $w_m = 1$ for all $m$ [25].

The consensus distance matrix can be defined as follows [22],

\[ D = 1 - C, \qquad (15) \]

which transforms the consensus matrix into a pairwise distance matrix. Fred and Jain [25] propose using a single/average/complete linkage algorithm on the $D$ matrix to recover the natural clusters. In their 2005 paper, a criterion called maximum lifetime is proposed to determine the optimum threshold for cutting the cluster dendrogram [25]. Readers are encouraged to refer to [25] for more details.

Fig. 7. A result of feature selection using ERCE (Algorithm 4, Section 4) on the Spring 2008 dataset, projected on the first and second principal components for ease of visualization. Each point represents a feature where the radius denotes the corresponding entropy. Each feature cluster is color coded and the characteristic feature of each cluster is annotated accordingly. In this example, ERCE chose 16 characteristic features from the 320 features (160 magnitude features and 160 spectral centroid features). It can be seen that the spectral centroid feature for CHWC-GPM (SC CHWC-GPM) is selected, in line with the observation in Fig. 3. ERCE accurately discovered that Return Fan (RF) and Supply Fan (SF) features are particularly important. This discovery is in line with the existence of Return Fan Failure (RFF) faults (May 12th, 18th, and 19th) observed during the season. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Fig. 8. An illustration describing the architecture of the Parallel Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous input (NARX-TDNN).

Fig. 9. Various partitions on the Spring 2008 dataset encoded by 16 subswarms of the Self Evolving Swarm Rapid Centroid Estimation (SE-SRCE, Algorithm 3). Fuzzifier constant � is set to 1.2, target entropies � are uniformly randomized between 0.005 and 0.05. The coordinates are projected to the first and second principal components for ease of visualization. In depth explanation regarding the method can be read in Section 4.4 and Section 4.5.
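As a minimal sketch of Eqs. (12)–(15), assuming each clustering is supplied as a plain list-of-lists membership matrix with $K_m$ rows and $N$ columns, the normalized co-association matrices and their weighted consensus could be computed as follows. The function names (`co_association`, `consensus`, `consensus_distance`) are illustrative assumptions and are not taken from the original implementation.

```python
import math


def co_association(U):
    """Normalized co-association matrix of one clustering (Eqs. 12-13).

    U is a K x N membership matrix (list of K rows); each column sums to 1.
    """
    K, N = len(U), len(U[0])
    # C = U^T U: entry (i, j) accumulates u_ki * u_kj over all clusters k.
    C = [[sum(U[k][i] * U[k][j] for k in range(K)) for j in range(N)]
         for i in range(N)]
    # D^{-1/2} C D^{-1/2}, where D is the diagonal of C.
    d = [math.sqrt(C[i][i]) for i in range(N)]
    return [[C[i][j] / (d[i] * d[j]) for j in range(N)] for i in range(N)]


def consensus(memberships, weights=None):
    """Weighted average of the co-association matrices (Eq. 14)."""
    weights = weights or [1.0] * len(memberships)  # equal weighting by default
    mats = [co_association(U) for U in memberships]
    N = len(mats[0])
    total = sum(weights)
    return [[sum(w * Cm[i][j] for w, Cm in zip(weights, mats)) / total
             for j in range(N)] for i in range(N)]


def consensus_distance(C):
    """Pairwise distance matrix D = 1 - C (Eq. 15)."""
    return [[1.0 - c for c in row] for row in C]
```

With two crisp clusterings of three points that agree on no pair except through overlap, each pair that is grouped together in one of the two runs receives a consensus value of 0.5, and a pair that is never grouped together receives 0.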

4.2. Visual abstract: feature selection using ERCE

A visual abstract of the proposed swarm-based consensus clustering algorithm can be seen in Figs. 9 and 10. Fig. 10 presents the consensus matrix and hierarchical cluster tree (cluster dendrogram) from the aggregation of the partitions shown in Fig. 9.

4.3. Evidence accumulation

Fred and Jain proposed Evidence Accumulation (EAC) in 2005 as a consensus clustering framework for combining the results of multiple runs of a crisp prototype-based clustering algorithm (e.g. K-means) [25]. Wang proposes a generalization of the algorithm, extending the applicability of the EAC to both crisp and fuzzy clusters [30]. He finds that fuzzy partitions are advantageous over crisp partitions in Ensemble Accumulation, as the degree of overlap in a fuzzy partition encodes to an extent how 'close' together clusters are [30]. The approach can be summarized as a two-step process as follows,

1. Split: Partition the data matrix $Y$ into some number of partitions $K_m$ (may be fixed or randomized within an interval) using any prototype-based clustering algorithm. Repeat this step $M$ times.

2. Merge: Calculate the consensus matrix $C$ and interpret the ensemble clustering by performing a desired graph algorithm.

Fig. 10. A heat map presenting the consensus matrix resulted from the aggregation of an SE-SRCE swarm shown in Fig. 9 using Algorithm 4 (Section 4.6). The rows and columns indicate individual items (in our case: the 320 features) whose consensus values range from 0 (never clustered together) to 1 (always clustered together) marked by white to dark blue. The complete linkage cluster dendrogram showing the degree of redundancy between features is shown above the consensus matrix. Between the cluster dendrogram and the consensus matrix is the cluster label vector suggested by the maximum lifetime cut. The output of the consensus clustering is as shown in Fig. 7. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Given the data vectors $y_i \in Y$, for each clustering $m$, $K_m$ centroid vectors $x_k \in X_m$ can be obtained using any prototype-based clustering algorithm (e.g. K-means, fuzzy C-means, Gaussian Mixture Models). The degree of membership of $y_i$ with respect to $x_k$ is a function of distance, calculated for crisp and fuzzy partitions respectively as follows,

\[ u_{ik,m} = \begin{cases} 1 & \text{if } x_{k,m} = \arg\min_{x \in X_m} d(y_i, x) \\ 0 & \text{otherwise} \end{cases}, \quad u \in \{0, 1\}, \qquad (16) \]

\[ u_{ik,m} = \frac{d(y_i, x_{k,m})^{-1/(� - 1)}}{\sum_{j=1}^{K} d(y_i, x_{j,m})^{-1/(� - 1)}}, \quad � > 1, \quad u \in [0, 1]. \qquad (17) \]

Wang argues that using fuzzy partitions in consensus clustering is particularly efficient for suppressing over-segmentation. It is also more tolerant to noisy information than its crisp counterpart [30].

The conventional approaches using Evidence Accumulation (EAC) [25] and Weighted Evidence Accumulation (WEAC) [31] are summarized in Algorithm 1. Notice that the pseudocode is simplified using the fuzzy t-norm approach to EAC as introduced in [30].
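The two membership rules in Eqs. (16) and (17) can be sketched for a single data point as below, assuming the distances from that point to every centroid have already been computed. The function names, the `fuzzifier` keyword and the small `eps` guard against zero distances are assumptions made for this illustration only.

```python
def crisp_membership(dists):
    """Eq. (16): membership 1 for the nearest centroid, 0 for all others."""
    winner = min(range(len(dists)), key=dists.__getitem__)
    return [1.0 if k == winner else 0.0 for k in range(len(dists))]


def fuzzy_membership(dists, fuzzifier=1.2, eps=1e-12):
    """Eq. (17): membership proportional to d^(-1/(fuzzifier - 1))."""
    if fuzzifier <= 1.0:
        raise ValueError("the fuzzifier must be greater than 1")
    power = -1.0 / (fuzzifier - 1.0)
    # eps keeps the power finite when a point coincides with a centroid.
    powered = [max(d, eps) ** power for d in dists]
    total = sum(powered)
    return [p / total for p in powered]
```

A fuzzifier close to 1 pushes the fuzzy memberships toward the crisp assignment, while larger values spread the membership more evenly across centroids.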



Algorithm 1. (Weighted) Ensemble Clustering ((W)EAC Clustering)

Input: dim × N data matrix Y, maximum number of prototypes K_max, number of repetitions M, prototype-based clustering algorithm Cluster (e.g. K-means, Fuzzy C-means), linkage algorithm Linkage.
Output: Crisp ensemble partition L

1: for m = {1, . . ., M} do
2:   // Partition Y using random number of clusters.
3:   K_rnd ← random({2, K_max})
4:   {U_m, X_m} ← Cluster(Y, K_rnd)
5:   // Calculate the co-association matrix for each clustering.
6:   C_m ← U_m^T U_m
7:   C_m ← D_m^(−1/2) C_m D_m^(−1/2)
8: end for
9: // Calculate the consensus matrix
10: C ← (Σ_{m=1}^{M} w_m C_m) / (Σ_{m=1}^{M} w_m)
11: // Interpret the consensus matrix using Linkage algorithm
12: HierarchicalTree = linkage(C)
13: th ← MaximumLifetime(HierarchicalTree)
14: L ← Cut(HierarchicalTree, th)
15: Note that the threshold for cutting the hierarchical tree is determined using maximum lifetime method [25].
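The MaximumLifetime step of Algorithm 1 can be illustrated with a short sketch. It assumes one common reading of the criterion in [25]: the lifetime of a k-cluster solution is the gap between the two consecutive dendrogram merge heights that bound it, and the cut is placed in the widest gap. The function name and the choice of the gap midpoint as threshold are assumptions of this sketch, and the trivial one-cluster solution above the final merge is not considered.

```python
def maximum_lifetime_cut(merge_heights):
    """Return (threshold, n_clusters) for the widest gap between merges.

    merge_heights holds the N - 1 merge distances of a dendrogram
    built over N items, in any order.
    """
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("at least two merges are needed to compare lifetimes")
    n_items = len(heights) + 1
    best_gap, best_i = -1.0, 0
    for i in range(len(heights) - 1):
        gap = heights[i + 1] - heights[i]
        if gap > best_gap:  # ties keep the lower (finer) cut
            best_gap, best_i = gap, i
    threshold = (heights[best_i] + heights[best_i + 1]) / 2.0
    # Cutting above the (best_i + 1) lowest merges leaves this many clusters.
    n_clusters = n_items - (best_i + 1)
    return threshold, n_clusters
```

For merge heights 0.1, 0.15, 0.2 and 0.9 over five items, the widest gap lies between 0.2 and 0.9, so the sketch cuts there and reports two clusters.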

4.4. Swarm Rapid Centroid Estimation

Yuwono [17] proposed the Swarm Rapid Centroid Estimation (Swarm RCE$^{r+}$) algorithm in 2011 [32]. This semi-stochastic clustering algorithm incorporates the paradigms of Particle Swarm Optimization (PSO [19]) into the traditional Expectation Maximization (EM). Statistical validation on benchmark data suggests that Swarm RCE$^{r+}$ has a reduced risk of converging to local minima and leaner computational complexity compared to earlier evolutionary-algorithm-based clustering approaches [17]. The algorithm was updated in 2014 to further decrease its memory complexity for use in ensemble clustering applications [18]. The RCE algorithm below follows the 2014 proposition.

A particle in an RCE subswarm stores a tuple consisting of a position vector $x$ and a velocity vector $v$,

\[ \mathrm{particle}_{k,m} = \{x_{k,m}, v_{k,m}\}. \qquad (18) \]

The position vector of each particle represents the coordinate of a centroid vector $x_i \in \mathbb{R}^{dim}$. In RCE a subswarm is a collection of centroid coordinates, encoding a possible solution to the clustering problem. As the RCE swarm consists of $M$ such subswarms, as many as $M$ clustering solutions can be obtained at the end of optimization.

Each subswarm stores two memory matrices:

1. The self-organizing memory $Y_m$, which is an array of randomly sampled pointers to the data $Y$,

\[ Y_m = \mathrm{randsample}(Y, �\%), \qquad (19) \]

where �% ∈ [0, 1] denotes the rate of random sampling.

2. The best position memory $X_m^{best}$, which stores the position vectors $X = \{x_1, \ldots, x_{K_m}\}$ that minimize a given objective function $f(Y_m, X_m)$ throughout the search. A typical objective function is usually defined as, but not restricted to, the average distortion,

\[ f(Y_m, X_m) = \sum_{x_k \in X_m} \frac{\sum_{y_i \in Y_m} u_{ik,m}\, d(x_k, y_i)}{\sum_{y_i \in Y_m} u_{ik,m}}, \qquad (20) \]

where $u_{ik,m}$ can be calculated either using Eq. (16) or Eq. (17).

The RCE swarm $X^{best}$ matrix is the union of all $X_m^{best}$ such that,

\[ X^{best} = \bigcup_{m=1}^{M} X_m^{best}. \qquad (21) \]

Fig. 11. Trajectory of the Swarm RCE particles recorded after 30 iterations on a toy dataset with numerous random seeding shows Swarm RCE robustness and insensitivity to initialization. M = 6, t_max = 30, ε = 0.05, δ_reset = 15.

On each iteration, the velocity and position of a particle are updated as follows,

\[ v_{k,m}(t+1) = v_{k,m}(t) + �_{k,m}(t), \qquad (22) \]

\[ x_{k,m}(t+1) = x_{k,m}(t) + v_{k,m}(t+1), \qquad (23) \]

where � denotes the resultant vector, which consists mainly of the self-organizing term and the minimum (best position) term,

\[ \begin{aligned} �_{k,m}(t) &= \varphi_1 \circ \overbrace{\left( \frac{\sum_{i=1}^{|Y_m|} u_{ik,m}\,(y_i - x_{k,m}(t))}{\sum_{i=1}^{|Y_m|} u_{ik,m}} \right)}^{\text{self organizing}} + \varphi_2 \circ \overbrace{\left( \frac{\sum_{j=1}^{|X^{best}|} q_{jk,m}\,(x_j^{best}(t) - x_{k,m}(t))}{\sum_{j=1}^{|X^{best}|} q_{jk,m}} \right)}^{\text{minimum (best position)}} \\ &= \varphi_1 \circ \left(E[Y_m \mid X_m = x_{k,m}] - x_{k,m}\right) + \varphi_2 \circ \left(E[X^{best} \mid X_m = x_{k,m}] - x_{k,m}\right), \end{aligned} \qquad (24) \]

where $\varphi \in [0, 1]^{dim}$ denotes a uniform random vector; $u_{ik,m}$ denotes the cluster membership when $Y_m$ is mapped to $X_m$; while $q_{jk,m}$ denotes the cluster membership when $X^{best}$ is mapped to $X_m$.

Should the self-organizing vector of a particle equal 0, the particle will be directed to $x_{I_{win},m}$, the position of the winning particle. $x_{I_{win},m}$ is a particle in the $m$th subswarm whose cluster has the largest cardinality.
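The update in Eqs. (22)–(24) can be sketched for one particle as follows. This is only an illustration under stated assumptions: vectors are plain Python lists, the memberships `u` and `q` are passed in precomputed, and the redirection toward the winning particle described above is left out. The names `weighted_mean_offset` and `update_particle` are hypothetical.

```python
import random


def weighted_mean_offset(points, weights, x):
    """Membership-weighted mean of (p - x); zero vector if all weights are 0."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(x)
    return [sum(w * (p[d] - x[d]) for p, w in zip(points, weights)) / total
            for d in range(len(x))]


def update_particle(x, v, Y, u, X_best, q, rng=random):
    """One velocity and position update for a single particle.

    x, v   : current position and velocity of the particle
    Y, u   : sampled data points and their memberships to this particle
    X_best : best positions of the swarm, with memberships q to this particle
    rng    : any object exposing random() in [0, 1)
    """
    self_organizing = weighted_mean_offset(Y, u, x)
    best_position = weighted_mean_offset(X_best, q, x)
    phi1 = [rng.random() for _ in x]  # element-wise uniform random vectors
    phi2 = [rng.random() for _ in x]
    v_new = [vd + a * s + b * m
             for vd, a, s, b, m in zip(v, phi1, self_organizing, phi2, best_position)]
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing trajectories such as those in Fig. 11.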

The RCE is equipped with two strategies to cope with suboptimal convergence, namely substitution and particle reset, as follows:

1. Substitution strategy forces particles in a search space to reach alternate equilibrium positions by introducing position instability. After each position update episode for a particle, apply

\[ \{x_i(t+1), v_i(t+1)\} = \begin{cases} \{x_{I_{win}}(t+1) + N(0, \sigma),\ 0\} & \text{if } \varphi < \varepsilon \\ \{x_i(t+1), v_i(t+1)\} & \text{otherwise} \end{cases} \qquad (25) \]

where $\varphi$ is a uniform random number $\varphi \in [0, 1]$, and $N(0, \sigma)$ is a Gaussian random vector with mean $\mu = 0$ and standard deviation $\sigma$ of each dimension of the data being clustered. $\varepsilon$ denotes the substitution probability parameter. Larger $\varepsilon$ increases the substitution frequency. Optimal $\varepsilon$ values lie between $0.01 \le \varepsilon \le 0.05$ [17]. RCE with substitution strategy enabled is denoted with the superscript +.

2. Particle reset strategy is triggered when the fitness of the local minimum $f(Y_m, X_m^{best}(t))$ does not improve after a number of iterations. Stagnation can be detected using a stagnation counter $\delta$ which is updated as follows:

\[ \delta(t+1) = \begin{cases} \delta(t) + 1 & \text{if } f(Y_m, X(t)) \ge f(Y_m, X^{best}(t)) \\ 0 & \text{otherwise.} \end{cases} \qquad (26) \]

When $\delta(t+1) > \delta_{max}$ this strategy reinitializes all particles in a subswarm without resetting the local minimum position matrix. Values being reinitialized are only $x_k(t)$ and $v_k(t)$. Swarm convergence is detected when $f(Y_m, X^{best}(t))$ does not improve after a number of resets. RCE with particle reset strategy enabled is denoted with the superscript r.
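The stagnation counter of Eq. (26) and the reset trigger can be written in a few lines. The sketch below assumes the fitness values are plain floats supplied by the caller; the function names and the default limit of 15 (the reset value quoted for Fig. 11) are illustrative choices rather than part of the original implementation.

```python
def update_stagnation(counter, f_current, f_best):
    """Eq. (26): count consecutive iterations without improvement."""
    return counter + 1 if f_current >= f_best else 0


def should_reset(counter, limit=15):
    """Particle reset is triggered once the counter exceeds the limit."""
    return counter > limit
```

A subswarm loop would call `update_stagnation` once per iteration and, when `should_reset` returns true, reinitialize positions and velocities while keeping its best-position memory.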

The algorithm pseudocode is shown in Algorithm 2. An illustration of the search trajectory of the swarm on a toy example is shown in Fig. 11.

Algorithm 2. Swarm RCEr+

Input: Data points Y = {y_1, . . ., y_N} ∈ R^dim, # of clusters K.
Output: Swarm centroid vectors X^best = {X^best_1, X^best_2, . . ., X^best_M} ∈ R^dim.

1: Initialize the swarm (randomize(X_1,...,M), V_1,...,M = 0).
2: For each subswarm m, randomly sample Y and store it in the memory Y_m = randsample(Y, �%).
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Calculate U_m from the pairwise distance between X_m and Y_m,
6:     Calculate Q_m from the pairwise distance between X_m and X^best,
7:     Store X^best_m which minimizes f(Y_m, X_m) throughout the search,
8:     V_m ← V_m + �_m,
9:     X_m ← X_m + V_m,
10:    Redirect particles with zero cardinality toward the particle whose cluster has the largest cardinality.
11:    Apply substitution with rate of ε
12:    if f(Y_m, X^best_m) does not improve after δ_reset iterations then
13:      Reinitialize subswarm (randomize(X_m), V_m = 0)
14:    end if
15:  end for
16: until Convergence or maximum iteration reached
17: return X^best = {X^best_1, X^best_2, . . ., X^best_M} ∈ R^dim.

4.5. Self Evolving Swarm RCE

In this implementation we introduce a new self-evolution criterion to the RCE which allows each subswarm to summon additional particles at will until the target cluster entropy is satisfied.

The uncertainty for a fuzzy membership value $u_{ik} \in [0, 1]$ [33] can be quantified as follows,

\[ h_{ik,m} = -u_{ik,m} \log u_{ik,m}. \qquad (27) \]

Bezdek argues that a good clustering can be achieved when $h_{ik,m}$ is minimized [33]. The average cluster entropy is then,

\[ H_m = -\frac{1}{K_m |Y_m|} \sum_{k=1}^{K_m} \sum_{i=1}^{|Y_m|} u_{ik,m} \log u_{ik,m}, \qquad (28) \]

where $U_m$ is calculated from $X_m^{best}$. $H_m$ close to 0.5 indicates a possible underpartitioning. $H_m$ very close to 0 may also indicate overpartitioning.

$H_m$ is only investigated when there is an update to $X_m^{best}$ where the number of non-empty clusters is equal to $K_m$, such that $|C_m^{best}| = K_m$. If $H_m$ is larger than the target entropy �_m, the number of particles is incremented using the following rule,

\[ K_m(t+1) = \begin{cases} K_m(t) + z_r^{+} & \text{if } H_m > �_m, \\ K_m(t) & \text{otherwise,} \end{cases} \qquad (29) \]

where $K_m(t)$ denotes the number of particles in the swarm $m$ at the current iteration $t$, $z_r^{+}$ denotes an upper-bounded random integer, $z_r^{+} \in \mathbb{Z}^{+} = [1, 2, \ldots, z_{max}^{+}]$, while �_m ∈ [0, 0.5] denotes a target $H_m$. Using this approach each subswarm automatically adjusts $K_m$ until the entropy criterion is satisfied.

The desired granularity and diversity of the swarm can be controlled by setting or randomizing the value of �_m. The growth speed of the swarm can be controlled by setting $z_r^{+}$. As the subswarms infer $K_m$ automatically from $H_m$, the need for specifying the randomization interval is now abolished (recall that in EAC and WEAC K-means, $K_m$ is randomized within a pre-specified upper and lower bound).
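The entropy criterion of Eqs. (28) and (29) can be sketched as below. Two assumptions are made and should be checked against the original implementation: the logarithm is taken in base 2 (chosen here because it makes a fully ambiguous two-cluster partition evaluate to 0.5, matching the bound quoted in the text), and terms with zero membership are treated as contributing zero. The function names and the `rng` argument are hypothetical.

```python
import math
import random


def average_cluster_entropy(U):
    """Eq. (28) for a K x n membership matrix given as a list of K rows."""
    K, n = len(U), len(U[0])
    # 0 * log(0) is taken as 0, so zero memberships are skipped.
    total = sum(u * math.log2(u) for row in U for u in row if u > 0)
    return -total / (K * n)


def grow_subswarm(K, H, target, z_max=2, rng=random):
    """Eq. (29): add 1..z_max particles while entropy exceeds the target."""
    if H > target:
        return K + rng.randint(1, z_max)
    return K
```

A crisp partition yields an entropy of 0 under this sketch, so a subswarm stops growing as soon as its memberships are sharp enough to fall below the target.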

The pseudocode of the Self-Evolving Swarm RCEr+ (SE-SRCE) can be seen in Algorithm 3. A typical summary of an execution of SE-SRCE can be seen in Fig. 12.

Algorithm 3. Self-Evolving Swarm RCEr+ (SE-SRCE)

Input: Data points Y = {y_1, . . ., y_N} ∈ R^dim, # of clusters K.
Output: Swarm centroid vectors X^best = {X^best_1, X^best_2, . . ., X^best_M} ∈ R^dim.

1: Initialize the swarm (randomize(X_1,...,M), V_1,...,M = 0).
2: For each subswarm m, randomly sample Y and store it in the memory Y_m = randsample(Y, �%).
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Execute Algorithm 2 lines 5–14,
6:     if f(Y_m, X_m) improves then
7:       // Check whether the entropy criterion is satisfied and whether all subswarms are nonempty
8:       if |C^best_m| = K_m and H_m > �_m then
9:         K_m ← K_m + z_r^+
10:      end if
11:    end if
12:  end for
13: until Convergence or maximum iteration reached
14: return X^best = {X^best_1, X^best_2, . . ., X^best_M} ∈ R^dim.

4.6. Ensemble Rapid Centroid Estimation using Self-Evolving Swarm

Ensemble RCE (ERCE) [18] is an ensemble extension to the Swarm RCEr+. The algorithm is shown to have relatively leaner complexity compared to conventional ensemble clustering algorithms, achieving up to quasilinear complexity in both time and space [18].

In this application we propose incorporating the proposed SE-SRCE into the ERCE framework. As the size of the evidence accumulation matrix is still relatively manageable (recall that since there are 320 features = 160 magnitude features + 160 spectral centroid features, the size of C is 320 × 320), EAC can be performed without using the co-association tree compression process proposed in the original paper [18,34]. However, it needs to be noted that should the number of features increase up to thousands, it is advisable that the co-association tree compression is utilized. Further information on the co-association tree can be read in Wang's paper [34].

In order to interpret the final clustering, we need to clarify that in our application each cluster represents “a group of more redundant features”. For each feature cluster, a feature with the largest entropy is selected as a characteristic feature for the cluster. The pseudocode of ERCE used in our application is shown in Algorithm 4.


Algorithm 4. Ensemble Rapid Centroid Estimation (ERCE)

Input: dim × N data matrix Y, number of subswarms M, fuzzification constant �, target entropy for each subswarm {�_1, . . ., �_M}, linkage algorithm Linkage.
Output: Crisp ensemble partition L

X^best ← SE-SRCE(Y)
for all m ∈ {1, . . ., M} do
  Given Y and X^best_m, calculate U_m using Eq. (17).
  // Calculate the co-association matrix for each clustering.
  C_m ← U_m^T U_m
  C_m ← D_m^(−1/2) C_m D_m^(−1/2)
end for
C ← (Σ_{m=1}^{M} w_m C_m) / (Σ_{m=1}^{M} w_m)
HierarchicalTree = linkage(C)
th ← MaximumLifetime(HierarchicalTree)
L ← Cut(HierarchicalTree, th)
// Interpreting the final partition
for all C_k ∈ {C_1, . . ., C_Lmax} do
  // For each feature cluster, the characteristic feature is the feature with highest entropy
  y^characteristic_k = argmax_{y ∈ C_k} −∫ p_y(x) log p_y(x) dx
end for
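The final loop of Algorithm 4 selects one characteristic feature per cluster. The sketch below illustrates that selection under a simplifying assumption: each feature signal has already been discretized, so the differential entropy in the pseudocode is replaced by the Shannon entropy of the empirical value counts. The names `discrete_entropy` and `characteristic_features`, and the dictionary-based inputs, are hypothetical.

```python
import math
from collections import Counter


def discrete_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def characteristic_features(features, labels):
    """Pick the highest-entropy feature of every cluster.

    features: dict mapping feature name -> list of discretized samples
    labels  : dict mapping feature name -> cluster id from the ensemble cut
    """
    best = {}
    for name, samples in features.items():
        h = discrete_entropy(samples)
        cluster = labels[name]
        if cluster not in best or h > best[cluster][1]:
            best[cluster] = (name, h)
    return {cluster: name for cluster, (name, _) in best.items()}
```

A constant signal has zero entropy under this sketch, so it is only chosen when every other member of its cluster is constant as well.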

5. Experimental data

The ASHRAE Project 1312-RP modeled and reported a wide variety of faults in three different seasons. The experiments include two HVAC systems running side by side with identical zone load. The fault test was conducted in Air Handling Unit (AHU)-A, while normal operation was running in AHU-B. By comparing AHU A and B, fault characteristics were recorded. The ASHRAE-1312-RP datasets included detailed experimental results from Summer 2007, Spring 2008, and Winter 2008. In each season different types of faults were generated, recorded and reported. Readings from 160 signal sources during normal operation and various fault scenarios were recorded. The data was sampled every minute from 6:00 to 18:00. The faults reported in the ASHRAE-1312-RP datasets, as well as a summary of the behavior of the features proposed by Li [20], are described in Table 1. Note that the features used in this table are not part of our research but rather illustrate how a static model would struggle during varying seasons. This is because the features that are important in one season may not be as important in other seasons. The features that we use throughout the paper are determined dynamically using consensus clustering based on the unique behavior in each season.

6. Result

Based on the features in Table 1, we can see that faults such as OASB, MADU and HCSF are particularly difficult to identify using Li's model [20]. In this section we present the experimental result of our proposed unsupervised feature selection method. In this section we wish to investigate the following:

1. What the characteristic features for each season are, and

2. Whether the selected features improve the generalization capability of an AFDD algorithm in general. In particular, we are interested in whether we can reliably identify OASB, MADU, and HCSF using the features selected by our proposed method.

Our approach is as follows. From each dataset (Summer 2007, Spring 2008, and Winter 2008), as many as 160 time signals and a vector recording the time of the day were reported. Using the method described in Section 3.1, as many as 320 + 1 additional features could be extracted, including:

• Magnitude features from 160 sensor and control signals,
• Spectral centroid features from 160 sensor and control signals,
• Time of the day (1 feature).

For clarity, the step-by-step process of the experiment can be summarized as follows:

1. Select a season and get the raw signals during normal operations.
2. For each raw signal, isolate the magnitude and spectral centroid components and calculate the fuzzy feature representation using the method described in Section 3.
3. Find the characteristic features using a consensus clustering algorithm (our approach uses ERCE: Algorithm 4).
4. Append the time-of-the-day feature as an additional feature.
5. Using the selected features, train a model (our approach uses NARX-TDNN) using the data in Table 1. For each type of fault, randomly partition the data as follows:
   • 15% as training set,
   • 15% as validation set, and
   • 70% as test set.
6. Investigate the results on the test set to see whether using the selected features increases/decreases the classifier's generalization capability.
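The random 15%/15%/70% partition in step 5 can be sketched as an index split. This is a minimal illustration that assumes the samples of one fault type are addressed by integer indices; the function name `split_indices` and the `seed` argument are assumptions introduced here, and the original experiment may have partitioned the data differently.

```python
import random


def split_indices(n, train=0.15, val=0.15, seed=0):
    """Shuffle range(n) and cut it into train, validation and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for a reproducible partition
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Applying the function once per fault type keeps every fault represented in all three subsets, which is what step 5 asks for.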

6.1. Feature selection result

We wish to keep the number of characteristic features to a reasonable level (e.g. between 4 and 30) to ensure that the generalization capability of the classifier is not undermined. The parameters of ERCE, EAC K-means, and WEAC K-means were selected based on the assumption derived using the method illustrated in Fig. 12. From the average entropy-distortion scatter for each season, such as that depicted in Fig. 12, we approximated the number of characteristic features to be around 5–30, or the average cluster entropy to be 0.005–0.05.

The parameters used for ERCE were as follows. The initial number of particles was set to 2, the number of subswarms was set to 60, substitution probability ε was set to 3%, δ_reset was set to 15, the distance metric was set to KL-divergence, fuzzifier � was set to 1.2, the entropy threshold for each subswarm �_m was uniformly randomized between 0.005 and 0.05, z+max = 2, maximum number of iterations was set to 100, and the linkage method was set to complete linkage. KL-divergence and complete linkage were selected as the physical model of the HVAC was assumed to be unknown and even a subtle difference in temporal patterns/shapes could be an important predictive component for specific types of fault. Complete linkage favors the formation of small spherical clusters, which is particularly useful for capturing these subtle differences. The optimum cut was then conventionally calculated using the maximum lifetime criterion [25]. Subswarms were equally weighted during ensemble aggregation such that w_1,...,M = 1.

Further investigation was also performed in order to benchmark the quality of the features selected by the method. Benchmark unsupervised feature selection methods include EAC K-means [25], WEAC K-means [31], and a traditional complete linkage agglomerative clustering (CL). CL was utilized to verify the advantages of the consensus approaches over a conventional graph-based approach. In this experiment, the CL hierarchical tree is cut using the inconsistency criterion, with inconsistency coefficient = 1, returning as many as 84 clusters, thus 84 characteristic features.

The parameters for EAC K-means and WEAC K-means were set as follows. The number of repetitions was set to 60, and the number of clusters k was uniformly randomized between 5 and 30. The distance metric was set to KL-divergence. The linkage method was set to complete linkage as per discussion. The optimum cut was calculated using the maximum lifetime criterion [25]. Weights for WEAC K-means were calculated using the average silhouette width criterion [35].


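Table 1. ASHRAE-1312-RP dataset description and symptoms using features described in Shun Li's model [20].

Summer 2007

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0819 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0825 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | EADS0820 | EA Damper Stuck (Fully Open) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| 4 | EADS0821 | EA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | − | − | 0 | 0 | 0 | 0 | 0 |
| 5 | RFF0822 | Return Fan at fixed speed (30% speed) | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 6 | RFF0823 | Return Fan complete failure | 0 | 0 | 0 | 0 | ++ | ++ | −− | −− | 0 | −− | ++ | 0 | 0 | 0 | 0 | 0 |
| 7 | CHWC0824 | Cooling Coil Valve Control unstable (Reduce PID Proportional Band by half) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | CHWC0903 | Cooling Coil Valve Reverse Action | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 9 | OADS0826 | OADS OA Damper Stuck (Fully Closed) | 0 | 0 | 0 | 0 | ++ | ++ | ++ | ++ | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 10 | CHWV0827 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 11 | CHWV0831 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 12 | CHWV0901 | Cooling Coil Valve Stuck (Partially Open – 15%) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 13 | CHWV0902 | Cooling Coil Valve Stuck (Partially Open – 65%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 14 | HCL0828 | Heating Coil Valve Leaking (Stage 1 – 0.4GPM) | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | | | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | | | 0 | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 17 | OADL0905 | OA Damper Leaking (45% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 18 | OADL0906 | OA Damper Leaking (55% Open) | 0 | 0 | 0 | 0 | − | − | − | − | 0 | 0 | ++ | 0 | 0 | 0 | 0 | 0 |
| 19 | AHUL0907 | AHU Duct Leaking (after SF) | 0 | 0 | + | + | + | + | + | + | + | + | + | 0 | 0 | 0 | 0 | 0 |
| 20 | AHUL0908 | AHU Duct Leaking (before SF) | 0 | 0 | 0 | 0 | −− | −− | −− | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |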


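Table 1 (Continued)

Spring 2008

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0502 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0503 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | NOR0504 | Normal Operation | | | | | | | | | | | | | | | | |
| 4 | NOR0505 | Normal Operation | | | | | | | | | | | | | | | | |
| 5 | NOR0509 | Normal Operation | | | | | | | | | | | | | | | | |
| 6 | OASB0529 | OA temperature sensor bias (+3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | OASB0530 | OA temperature sensor bias (−3F) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | OADS0507 | OA Damper Stuck (Fully Close) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 9 | OADS0508 | OA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | + | + | + | + | − | + | −− | 0 | 0 | 0 | 0 | 0 |
| 10 | EADS0527 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | EADS0511 | EA Damper Stuck (40% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | − | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | CHW0506 | Cooling Coil Valve Stuck (Fully Closed) | 0 | 0 | −− | −− | ++ | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | + | + | ++ |
| 14 | CHW0515 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 15 | CHW0516 | Cooling Coil Valve Stuck (Partially Open – 50%) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 16 | RFF0512 | Return Fan complete failure | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | RFF0518 | Return Fan at fixed speed (20%spd) | 0 | 0 | 0 | 0 | 0 | 0 | −− | −− | 0 | −− | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | RFF0519 | Return Fan at fixed speed (80%spd) | 0 | 0 | 0 | 0 | 0 | 0 | ++ | ++ | 0 | ++ | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | AFAB0522 | Air filter area block fault (10%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | AFAB0525 | Air filter area block fault (25%) | 0 | 0 | 0 | 0 | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21 | MADU0513 | Mixed air damper unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | MADU0514 | Mixed air damper unstable/Cooling coil control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | HCSF0517 | Sequence of heating and cooling unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | HCSF0601 | Supply Fan control unstable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |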


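Table 1 (Continued)

Winter 2008

| # | Name | Description | HWC-VLV | P-E-hcoil | CHWC-VLV | P-E-ccoil | SF-SPD | P-E-SF | RF-SPD | P-E-RF | P-SA-CFM | P-RA-CFM | P-OA-CFM | SA-TEMP | MA-TEMP | RA-TEMP | HWC-DAT | CHWC-DAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NOR0129 | Normal Operation | | | | | | | | | | | | | | | | |
| 2 | NOR0216 | Normal Operation | | | | | | | | | | | | | | | | |
| 3 | NOR0217 | Normal Operation | | | | | | | | | | | | | | | | |
| 4 | OADS0212 | OA Damper Stuck (Fully Close) | −− | −− | 0 | 0 | ++ | + | ++ | + | −− | ++ | −− | 0 | − | 0 | 0 | 0 |
| 5 | OADL0213 | OA damper leaking (52% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| 6 | OADL0215 | OA damper leaking (62% open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | 0 | 0 | 0 | 0 | 0 |
| 7 | EADS0202 | EA Damper Stuck (Fully open) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | + | + | 0 | 0 | 0 | 0 | 0 |
| 8 | | | − | −− | 0 | 0 | 0 | 0 | 0 | −− | 0 | −− | −− | 0 | 0 | 0 | 0 | 0 |
| 9 | CHW0210 | Cooling Coil Valve Stuck (Fully Open) | ++ | ++ | ++ | ++ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | − | 0 | 0 | ++ | − |
| 10 | CHW0211 | Cooling Coil Valve Stuck (Partially Open – 20%) | + | + | + | + | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ++ | 0 |
| 11 | HCF0205 | Heating Coil Fouling Stage 1 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 12 | HCF0206 | Heating Coil Fouling Stage 2 | 0 | −− | 0 | 0 | + | + | + | + | 0 | + | − | 0 | 0 | 0 | 0 | 0 |
| 13 | HCRC0207 | Heating coil reduced capacity Stage 1 | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | | | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | | | + | − | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

A plug {0(a), +(b), ++(c), −(d), −−(e)} indicates that the value for the variable is: (a) 0: unchanged (the fault has no effect on the corresponding variable); (b) +: greater than normal; (c) ++: substantially greater than normal; (d) −: less than normal; (e) −−: substantially less than normal.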

Please cite this article in press as: M. Yuwono, et al., Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems, Appl. Soft Comput. J. (2015), http://dx.doi.org/10.1016/j.asoc.2015.05.030



[Fig. 12 panels: Ave. Cluster Entropy versus iteration; Number of Clusters versus iteration; Ave. Distortion versus iteration; Cluster Entropy versus Number of Clusters; Average Distortion versus Number of Clusters; Average Distortion versus Cluster Entropy; Average Distortion versus Number of Clusters and Cluster Entropy.]

Fig. 12. The scatter plot of the average distortion with respect to cluster entropy and the number of clusters extracted after a run of SE-SRCE with � = 1.2. The top graphs show the cross-sectional plots of the three parameters during optimization of SE-SRCE, leading to the creation of the bottom scatter plot. The appropriate entropy range/K range can be investigated by observing K_m, H_m, and f(Y_m, X) trade-offs so that both distortion and entropy can be minimized while keeping the number of clusters to a reasonable level.




We measured the appropriateness of the feature selection method by investigating the normalized mutual information (NMI) between features [26]. Mutual information examines the dependence between two discrete distributions X and Y. It equals the KL-divergence between the joint distribution p(x, y) and the product of the marginals p(x)p(y), so minimizing it drives a feature pair toward statistical independence. It can be written in terms of the joint entropy H(X, Y) and the marginal entropies (H(X) and H(Y)) as follows,

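\[
\begin{aligned}
\mathrm{NMI}(X;Y) &= \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}} \\
&= \frac{H(X) + H(Y) - H(X,Y)}{\sqrt{H(X)\,H(Y)}} \\
&= \frac{\sum_{x \in X}\sum_{y \in Y} p(x,y)\,\log\dfrac{p(x,y)}{p(x)\,p(y)}}{\sqrt{\left(\sum_{x \in X} p(x)\log p(x)\right)\left(\sum_{y \in Y} p(y)\log p(y)\right)}},
\end{aligned}
\tag{30}
\]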

where X and Y in our case were a pair of fuzzy feature signals (y_1 and y_2, calculated using Eq. (5)), rounded to the nearest integer, such that

X(n) = round(y_1(n)),   X(n) ∈ {−1, 0, 1},   (31)

and

Y(n) = round(y_2(n)),   Y(n) ∈ {−1, 0, 1}.   (32)

The NMI is calculated by marginalizing the probability of co-occurrence between these three discrete categories. For a pair of signals, an NMI closer to 1 indicates that the feature pair is redundant. For each feature set, the strictly upper triangular part of the pairwise NMI matrix is taken, and the median, 75th percentile, and 95th percentile are averaged over 80 runs. Since we want to minimize redundancies between features, a good feature set is characterized by an average NMI closer to 0. Table 2 summarizes the results of the experiment.
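The redundancy measure of Eqs. (30)–(32) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' implementation: the function names (`entropy`, `nmi`, `quantize`, `pairwise_nmi_median`) are hypothetical, the natural logarithm is assumed, and returning 0 for a constant signal is a convention chosen here to avoid a division by zero.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def entropy(labels):
    """Shannon entropy H = -sum p log p of a discrete sequence (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """NMI(X;Y) = (H(X) + H(Y) - H(X,Y)) / sqrt(H(X) H(Y)), second line of Eq. (30)."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy from co-occurrence counts
    if hx == 0.0 or hy == 0.0:
        return 0.0  # assumption: a constant signal carries no shared information
    return (hx + hy - hxy) / math.sqrt(hx * hy)


def quantize(signal):
    """Eqs. (31)-(32): round a fuzzy feature to the categories {-1, 0, 1}.

    Note: Python's round() sends exact halves to the nearest even integer,
    which may differ from other environments at exactly +/-0.5.
    """
    return [max(-1, min(1, round(v))) for v in signal]


def pairwise_nmi_median(features):
    """Median NMI over the strictly upper triangle of the pairwise NMI matrix."""
    q = [quantize(f) for f in features]
    return median(nmi(a, b) for a, b in combinations(q, 2))
```

Identical signals give an NMI of 1 (fully redundant) and statistically independent signals give 0, which matches the interpretation used for Table 2. The 75th and 95th percentiles could be taken from the same list of pairwise values.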

The characteristic features in each season were distinct from those of other seasons. In order to analyze the important features for each season, we repeated the clustering process 200 times. From this process, three histograms describing the probability of occurrence of the characteristic features for each season are reported in Fig. 13. The probability of occurrence was calculated as the frequency of appearance divided by the number of trials.
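The probability of occurrence described above reduces to a counting step. The sketch below assumes each clustering trial yields a collection of selected feature labels; the function name `occurrence_probability` and the example labels in the usage note are illustrative, not taken from the authors' code.

```python
from collections import Counter


def occurrence_probability(trials):
    """Map each feature label to (number of trials selecting it) / (number of trials).

    `trials` is a list with one collection of selected labels per clustering run.
    A label is counted at most once per trial.
    """
    counts = Counter(label for selected in trials for label in set(selected))
    return {label: count / len(trials) for label, count in counts.items()}
```

For example, a label selected in 150 of 200 trials would receive a probability of 0.75, which is the bar height plotted for that feature in a histogram such as Fig. 13.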

The overall patterns of the fault classes for each season, based on the characteristic features, are presented in Figs. 14–16, respectively. Each circle in these figures shows the condition of the characteristic features during a specific fault in the HVAC system.

6.2. Classification result

The generalization capability of a classifier is a powerful indicator of the quality of the features. Using the characteristic features selected by the proposed method, a classifier can be trained with less computational burden and a lower probability of overfitting (note that in our experiment, 30% of the data was divided equally into training and validation sets, and the remaining 70% was used as the test set). The classifiers were trained and tested using the fuzzy features, ys, as shown in Figs. 14–16.

The parameters for NARX-TDNN were set as follows. The number of hidden neurons was set to 10. The input layer, hidden layer, and feedback orders were set to 2. The architecture is illustrated in Fig. 8. The dataset was divided at random into training (15%), validation (15%), and test (70%) sets. The training was done using the Levenberg–Marquardt algorithm. The experiment was repeated 80 times for each season to test the reliability and repeatability of the method. Using the features shown in Figs. 14–16, the average sensitivity and specificity of the proposed method compared to Li's manual feature selection approach are presented in Table 3.
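The evaluation protocol can be illustrated with a short standard-library sketch: a random 15%/15%/70% split, followed by one-vs-rest sensitivity and specificity per fault class. All names here are hypothetical, and weighting the average by class support is an assumption, since the text does not state how the weighted average is formed.

```python
import random


def split_indices(n, seed=0, train=0.15, valid=0.15):
    """Randomly split range(n) into training, validation, and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_valid = int(train * n), int(valid * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for one class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            tp += p == positive
            fn += p != positive
        else:
            fp += p == positive
            tn += p != positive
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def weighted_average(y_true, y_pred):
    """Average the per-class scores, weighted by class support (an assumption)."""
    n = len(y_true)
    wa_sens = wa_spec = 0.0
    for cls in sorted(set(y_true)):
        weight = y_true.count(cls) / n
        sens, spec = sensitivity_specificity(y_true, y_pred, cls)
        wa_sens += weight * sens
        wa_spec += weight * spec
    return wa_sens, wa_spec
```

Repeating the split with different seeds and collecting the per-run scores would give the mean ± standard deviation format used in Tables 3 and 4.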


The quality of the feature sets selected by ERCE was benchmarked against the features selected by EAC K-means, WEAC K-means, and Complete Linkage. The features selected by these four competing algorithms were supplied to both NARX-TDNN and Hidden Markov Models (HMM) [11–13], where the training and testing for both classifiers were repeated 100 times for each pair of feature selection and classification algorithms. The weighted average (WA) sensitivity and WA specificity results are reported in Table 4.

The significance of the experimental results was validated using paired t-tests with the following null hypotheses:

1. H_0^*: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X. A star (*) in Tables 3 and 4 indicates that H_0^* should be rejected, whereas no sign indicates otherwise.

2. H_0^†: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B. A dagger (†) in Table 4 indicates that H_0^† should be rejected, whereas no sign indicates otherwise.
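A paired t-test of this kind can be sketched with the standard library alone. The sketch below computes the paired t statistic exactly, but approximates the one-sided p-value with a normal distribution, which is only reasonable for a large number of paired runs (such as the 80 to 100 repetitions reported here); a t-distribution CDF from a statistics library would give the exact value. The one-sided form and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples; tests mean(a - b) > 0."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def one_sided_p_normal_approx(t):
    """Approximate P(T >= t) with a standard normal (large-sample assumption)."""
    return 1.0 - NormalDist().cdf(t)
```

Under these assumptions, the null hypothesis would be rejected when the approximate p-value falls below the chosen threshold (the table footnotes list 0.001).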

7. Discussion

As the proposed feature selection process is strictly unsupervised, analyzing the results leads to a number of interesting observations.

With regard to the redundancies between features, it can be seen in Table 2 that all consensus algorithms (Median NMI_ERCE = 0.019, Median NMI_EAC K-means = 0.040, Median NMI_WEAC K-means = 0.048) in general outperformed CL (Median NMI = 0.1305), manual selection (Median NMI = 0.0199, Q75% NMI = 0.2227), and no selection (Median NMI = 0.1857). The three consensus algorithms reported fewer than 20 characteristic features on average, which is at least four times smaller than the number of characteristic features selected using CL. Furthermore, the features selected by ERCE (Median NMI = 0.019 ± 0.004) outperformed those selected by the other consensus algorithms, EAC K-means (Median NMI = 0.040 ± 0.011) and WEAC K-means (Median NMI = 0.048 ± 0.034), as indicated by its low NMI. ERCE also had smaller standard deviations on all performance aspects, especially on the number of features, suggesting the relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.

With regard to the reliability of the feature selection algorithm, ERCE consistently selects features that are unique and relevant to the faults in the corresponding year, as can be seen in Fig. 13. For example, throughout the experiment using the Winter 2008 dataset, ERCE consistently selected HWC-VLV, PLN-TMP, EA-DMPR, HWC-DAT and HWP-GPM, which are among the important features for that season. The pattern for the Winter 2008 dataset is shown in Fig. 16. In this figure, the pattern for Exhaust Air Damper Stuck (EADS) faults can be easily distinguished from the others by observing the conditions of both EA-DMPR and PLN-TMP. Similarly, HCRC faults in this season are characterized by abnormal HWC-VLV and VAV-DMPR signals. CHW faults are also observable from an increase in HWC-DAT as the system compensates for the increased flow of chilled water due to the faulty cooling coil valve. ERCE also appropriately discovers that SC CHWC-GPM is a particularly important feature in Spring 2008 due to HCSF0517, as discussed previously in Section 3. ERCE discovers that the outside air damper (OA-DMPR) is consistently inside the atypical negative region during HCSF faults. This information may be useful for further investigation of the nature of the particular fault.

Regarding the effects of the proposed feature selection algorithm on classifier performance, the results of ERCE+NARX-TDNN, particularly on the Spring 2008 dataset, show a clear advantage of ERCE over the other feature selection approaches. As can be seen in Table 3, when compared to the manually selected features suggested by Li [20], supplying NARX-TDNN with the features selected by ERCE results in consistent specificity improvements in Spring 2008. Moreover, overall statistically significant weighted average performance improvements are also observed throughout Summer 2007, Spring 2008, and Winter 2008 in our experiment. Based on the statistical results in Table 4, using features from Li and EAC K-means limits NARX-TDNN's specificity to averages of around 91.54% and 91.85%, respectively. The low average may be attributed to misclassification of a number of more ambiguous faults such as OASB, MADU, AFAB and HCSF. This report is consistent with Li's observation, presented in Table 1, where these faults seem to have no effect on the manually selected features. Similar cases are seen with WEAC K-means and complete linkage. Using features from ERCE allows NARX-TDNN to reach a significantly higher specificity average of 98.37% ± 0.25%. The significance of the results is statistically validated on both the Summer 2007 and Spring 2008 datasets, where signals exhibit more nonlinearities compared to those in the Winter 2008 dataset.

Regarding the general performance of the classifiers, the results in Table 4 show the comparative performance of HMM and NARX-TDNN. While HMM shows superior specificity on the Winter 2008 dataset, its specificity on Spring 2008 and Summer 2007 is not as high. This is arguably due to the nonlinearities in the fault patterns in the Spring 2008 and Summer 2007 datasets compared to the Winter 2008 faults. For instance, it can be seen in Fig. 15 that MADU, AFAB and HCSF faults exhibit visually ambiguous patterns. When dealing with these nonlinear datasets, the NARX-TDNN classifier benefits from its capability in dealing with long-term dependencies. Table 4 shows that NARX-TDNN was capable of distinguishing these faults, achieving a specificity of 98.37% ± 0.25% using the features provided by ERCE.

Table 2
The Normalized Mutual Information (NMI) between features selected using various feature selection algorithms on the Spring 2008 dataset. Boldface indicates the lowest NMI (the least redundancies between features).

                                               NMI between characteristic feature pairs
Feature selection method     # of features     Median            Q75% NMI          Q95% NMI
Without feature selection    320               0.1857            0.4110            0.8821
Manual selection [20]        16                0.0199            0.3014            0.4899
CL                           84                0.1305            0.2227            0.4863
EAC K-means                  15.90 ± 3.86      0.040 ± 0.011     0.106 ± 0.025     0.404 ± 0.035
WEAC K-means                 16.70 ± 4.73      0.048 ± 0.034     0.131 ± 0.068     0.364 ± 1.600
ERCE                         17.20 ± 1.60      0.019 ± 0.004     0.078 ± 0.013     0.339 ± 0.031

Fig. 13. Representative feature occurrence histogram for each season after 200 clustering trials. The x-axis denotes the specific label for each feature; the y-axis denotes the probability of occurrence, calculated as the frequency of appearance divided by the number of trials.

[Figure 14. One circular pattern plot per panel over characteristic features 1–21, on a scale from −1.0 to 1.0. Panels: NOR0819 and NOR0825, EADS0820, EADS0821, RFF0822, RFF0823, CHWC0824, CHWC0903, OADS0826, CHWV0827, CHWV0831, CHWV0901, CHWV0902, HCL0828, HCL0829, HCL0830, OADL0905, OADL0906, AHUL0907, AHUL0908.]

Fig. 14. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Summer 2007 dataset.

[Figure 15. One circular pattern plot per panel over characteristic features 1–19, on a scale from −1.0 to 1.0. Panels: NOR0502, NOR0503, NOR0504, NOR0505 and NOR0509 (shared panel), OASB0529, OASB0530, OADS0507, OADS0508, EADS0527, EADS0510, EADS0511, CHW0506, CHW0515, CHW0516, RFF0512, RFF0518, RFF0519, AFAB0522, AFAB0525, MADU0513, MADU0514, HCSF0517, HCSF0601.]

Fig. 15. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Spring 2008 dataset.

[Figure 16. One circular pattern plot per panel over characteristic features 1–7, on a scale from −1.0 to 1.0. Panels: NOR0129, NOR0216 and NOR0217 (shared panel), OADS0212, OADL0213, OADL0215, EADS0202, EADS0203, CHW0210, CHW0211, HCF0205, HCF0206, HCRC0207, HCRC0208, HCRC0209.]

Fig. 16. Patterns constituted by the characteristic features for each data in the ASHRAE-1312 Winter 2008 dataset.



Table 3
NARX-TDNN classification result.

                    Manual selection^a                ERCE^b
Fault type          Sensitivity      Specificity      Sensitivity      Specificity

Summer 2007
NOR                  99.9% ± 0.1%     98.1% ± 1.6%     99.9% ± 0.2%     99.0% ± 2.1%
EADS                 99.7% ± 0.5%     99.5% ± 2.7%     99.8% ± 0.3%     98.9% ± 2.5%
RFF                  99.9% ± 0.0%     99.0% ± 2.7%     99.9% ± 0.1%     99.5% ± 1.4%
CHWC                 99.9% ± 0.2%     99.0% ± 1.1%     99.8% ± 0.2%     99.0% ± 4.4%
OADS                 99.9% ± 0.2%     98.0% ± 2.2%     99.9% ± 0.3%     97.3% ± 3.1%
CHWV                 99.8% ± 0.3%     99.0% ± 4.3%     99.7% ± 0.9%     99.2% ± 2.5%
HCL                  99.7% ± 0.4%     98.0% ± 1.0%     99.7% ± 0.3%     98.4% ± 2.4%
OADL                 99.7% ± 0.5%    *95.2% ± 7.1%     99.9% ± 0.2%     98.0% ± 1.2%
AHUL                 99.8% ± 0.2%     99.8% ± 1.1%     99.9% ± 0.1%     99.5% ± 2.6%
Weighted average     99.8% ± 0.1%    *96.8% ± 2.2%     99.8% ± 0.1%     98.4% ± 0.7%

Spring 2008
NOR                  99.8% ± 0.3%     99.3% ± 2.1%     99.9% ± 0.1%     99.6% ± 0.6%
OASB                 99.1% ± 1.5%    *95.0% ± 6.1%     99.7% ± 0.3%     99.5% ± 1.4%
OADS                 99.9% ± 0.2%    *98.2% ± 1.7%     99.8% ± 0.1%     99.5% ± 0.9%
EADS                 99.9% ± 0.1%    *98.3% ± 0.5%     99.9% ± 0.1%     99.0% ± 2.8%
CHW                  99.7% ± 0.4%    *98.7% ± 0.8%     99.8% ± 0.2%     99.3% ± 0.7%
RFF                  99.9% ± 0.2%    *82.6% ± 33.1%    99.8% ± 0.1%     99.4% ± 0.7%
AFAB                 99.7% ± 0.2%    *42.9% ± 17.8%    99.7% ± 0.2%     98.5% ± 4.9%
MADU                 98.6% ± 1.6%    *70.4% ± 39.8%    98.9% ± 0.2%     98.0% ± 4.0%
HCSF                 99.6% ± 0.6%    *94.7% ± 6.6%     99.9% ± 0.0%     99.5% ± 1.5%
Weighted average     98.9% ± 0.2%    *86.2% ± 5.0%     99.9% ± 0.1%     99.2% ± 0.5%

Winter 2008
NOR                  99.6% ± 0.4%     99.3% ± 1.1%     99.8% ± 0.1%     98.3% ± 2.4%
OADS                 99.9% ± 0.1%    *95.6% ± 3.8%     99.8% ± 0.2%     98.7% ± 1.4%
OADL                 99.8% ± 0.4%     98.5% ± 3.2%     99.5% ± 0.7%     98.5% ± 1.5%
EADS                 99.9% ± 0.4%     97.9% ± 1.3%     99.6% ± 0.3%     97.5% ± 2.5%
CHW                  99.8% ± 0.4%    *97.5% ± 5.2%     99.6% ± 0.3%     99.1% ± 1.2%
HCF                  99.8% ± 0.4%    *95.1% ± 4.5%     99.2% ± 0.7%     97.2% ± 2.9%
HCRC                 99.8% ± 0.4%     99.0% ± 2.2%     99.8% ± 0.3%     99.4% ± 1.1%
Weighted average     99.7% ± 0.2%     97.5% ± 0.7%     99.8% ± 0.1%     98.7% ± 0.7%

H_0^*: The performance of NARX-TDNN using features from ERCE is not significantly better than using manually selected features.
a Manual selection utilizes Shun Li's feature set [20].
b ERCE features are as shown in Figs. 14–16.
* Reject H_0^* ( = 0.001).

Table 4
Performance comparison with competing feature selection methods, tested against two classification methods: NARX-TDNN and HMM.

                                    HMM                                   NARX-TDNN
Feature selection   # of features   WA sensitivity     WA specificity     WA sensitivity     WA specificity

Summer 2007
Manual selection^a  16 ± 0.00         *98.65% ± 0.34%    89.45% ± 2.48%    †99.59% ± 0.12%    †96.81% ± 1.99%
EAC K-means         29.85 ± 17.26     *98.70% ± 0.50%   *85.01% ± 4.94%    †99.69% ± 0.22%  *,†95.07% ± 3.75%
WEAC K-means        14.14 ± 13.09     *97.69% ± 0.13%   *72.85% ± 1.48%    †99.79% ± 0.08%  *,†96.85% ± 2.31%
Complete linkage    81.00 ± 0.00       98.71% ± 0.98%    90.49% ± 7.52%    †99.51% ± 0.27%    †96.42% ± 1.16%
ERCE                21.41 ± 4.46       99.15% ± 0.32%    90.85% ± 4.16%    †99.69% ± 0.08%    †97.61% ± 0.85%

Spring 2008
Manual selection^a  16 ± 0.00          98.90% ± 0.54%   †91.54% ± 2.98%    *98.89% ± 0.23%    *86.17% ± 5.01%
EAC K-means         34.56 ± 9.40       98.55% ± 0.42%    91.85% ± 2.68%  *,†99.02% ± 0.81%    *91.92% ± 6.42%
WEAC K-means        33.52 ± 10.32      98.83% ± 0.40%    93.37% ± 2.38%    †99.20% ± 0.49%    *92.37% ± 6.53%
Complete linkage    84 ± 0.00          98.80% ± 0.46%    94.12% ± 2.61%    †99.62% ± 0.17%    *95.14% ± 1.29%
ERCE                19.93 ± 5.19       98.84% ± 0.32%    92.68% ± 2.66%    †99.79% ± 0.10%    †98.37% ± 0.25%

Winter 2008
Manual selection^a  16 ± 0.00          98.81% ± 0.56%   *92.92% ± 0.31%    †99.71% ± 0.15%    †97.51% ± 0.65%
EAC K-means         27.74 ± 7.18      †99.98% ± 0.14%   †99.85% ± 0.85%     99.49% ± 0.50%     97.87% ± 2.06%
WEAC K-means        21.37 ± 11.75     †99.96% ± 0.18%    99.79% ± 1.00%     99.59% ± 0.19%     97.68% ± 0.88%
Complete linkage    95 ± 0.00          99.87% ± 0.40%    99.21% ± 2.37%     99.74% ± 0.13%     98.54% ± 1.01%
ERCE                7.88 ± 3.02        99.92% ± 0.31%    99.49% ± 1.43%     99.73% ± 0.19%     98.35% ± 1.16%

H_0^*: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X.
H_0^†: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B.
a Manual selection utilizes Shun Li's feature set [20].
* Reject H_0^* ( = 0.001).
† Reject H_0^† ( = 0.001).

8. Conclusion

A method for automating feature selection and classification of faults for Heating Ventilation and Air-Conditioning (HVAC) systems using a knowledge-discovery and neural-network approach has been proposed. The core of the method is the Ensemble Rapid Centroid Estimation (ERCE), which automatically finds characteristic features and discards redundant features. Using these characteristic features, a Parallel Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN) is then trained to identify the faults described in the ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 datasets.




The proposed unsupervised feature selection algorithm (ERCE, Median NMI = 0.019 ± 0.004) generally outperformed conventional consensus clustering, including Evidence Accumulation K-means (Median NMI = 0.040 ± 0.011) and Weighted Evidence Accumulation K-means (Median NMI = 0.048 ± 0.034), as well as conventional complete linkage clustering (Median NMI = 0.1305). ERCE also had smaller standard deviations on all performance aspects, especially on the number of features, suggesting the relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.

The proposed feature selection method was tested on the experimental fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008, using two well-established time-domain classifiers: (a) NARX-TDNN; and (b) Hidden Markov Models (HMM). Our experimental results showed weighted average sensitivity and specificity of: (a) higher than 99% and 96% for NARX-TDNN; and (b) higher than 98% and 86% for HMM on the ASHRAE-1312-RP datasets. Based on our experiment, the proposed feature selection method appears to have a positive effect on the generalization capability of both AFDD algorithms.

Notwithstanding the satisfactory results to date, further work is necessary to investigate the performance of the proposed method on alternative HVAC systems. Future work will incorporate semi-supervised adaptive learning capability for automatic fault discovery. We are also interested in applying the proposed consensus clustering method to other applications.

Acknowledgements

This research is funded by The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Marsfield, Australia. The ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 fault data are provided by CSIRO. The research is supervised by CSIRO, and the paper writing is supervised specifically by Guo. Automatic Fault Detection and Diagnosis (AFDD) for Heating Ventilation and Air Conditioning (HVAC) is an ongoing research project in CSIRO Energy Technology and Computational Informatics. We thank the anonymous reviewers for the time and effort they spent in providing comprehensive, high-quality criticism of our paper. The corresponding author would also like to personally acknowledge Nina Elita for her contribution, especially in proofreading and in the provision of sincere moral support to the corresponding author during the preparation, writing and submission of this paper.

References

[1] A. Kusiak, M. Li, F. Tang, Modeling and optimization of HVAC energy consumption, Appl. Energy 87 (2010) 3092–3102.

[2] A. Kusiak, F. Tang, G. Xu, Multi-objective optimization of HVAC system with an evolutionary computation algorithm, Energy 36 (2011) 2440–2449.

[3] J. Wall, Automatic Fault Detection and Diagnosis, 2011, http://www.csiro.au/Outcomes/Energy/building-fault-detection.aspx

[4] J. Ward, Opticool, 2013, http://www.csiro.au/Organisation-Structure/Flagships/Energy-Flagship/Opticool.aspx

[5] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems using support vector machine method, Int. J. Refrig. 30 (2007) 1104–1114.

[6] D. Jacob, S. Dietz, S. Komhard, C. Neumann, S. Herkel, Black-box models for fault detection and performance monitoring of buildings, J. Build. Perform. Simul. 3 (2010) 53–62.

[7] C. Lo, P. Chan, Y.-K. Wong, A.B. Rad, K. Cheung, Fuzzy-genetic algorithm for automatic fault detection in HVAC systems, Appl. Soft Comput. 7 (2007) 554–560.

[8] J. Schein, S.T. Bushby, N.S. Castro, J.M. House, A rule-based fault detection method for air handling units, Energy Build. 38 (2006) 1485–1492.

[9] T.M. Rossi, J.E. Braun, A statistical, rule-based fault detection and diagnostic method for vapor compression air conditioners, HVAC&R Res. 3 (1997) 19–37.

[10] J. Schein, Results from Field Testing of Embedded Air Handling Unit and Variable Air Volume Box Fault Detection Tools, U.S. Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, 2006.

[11] J. Wall, Y. Guo, J. Li, S. West, A dynamic machine learning-based technique for automated fault detection in HVAC systems, in: Proceedings of the ASHRAE Annual Conference, Montreal, Quebec, Canada, 2011, pp. 449–456.

[12] Y. Guo, D. Dehestani, J. Li, J. Wall, S. West, S. Su, Intelligent outlier detection for HVAC system fault detection, in: Proceedings of the 10th International Healthy Buildings Conference, Brisbane, Queensland, Australia, 2012.

[13] Y. Guo, J. Wall, J. Li, S. West, Intelligent model based fault detection and diagnosis for HVAC system using statistical machine learning methods, in: Proceedings of the ASHRAE 2013 Winter Conference, Dallas, USA, 2013.

[14] M. Yuwono, S.W. Su, Y. Guo, J. Li, S. West, J. Wall, Automatic feature selection using multiobjective cluster optimization for fault detection in a heating ventilation and air conditioning system, in: Proceedings of the 2013 1st International Conference on Artificial Intelligence, Modelling and Simulation, AIMS '13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 171–176, http://dx.doi.org/10.1109/AIMS.2013.34

[15] W. Deng, X. Yang, L. Zou, M. Wang, Y. Liu, Y. Li, An improved self-adaptive differential evolution algorithm and its application, Chemometr. Intell. Lab. Syst. 128 (2013) 66–76, http://dx.doi.org/10.1016/j.chemolab.2013.07.004

[16] L. Wang, C.-X. Dun, W.-J. Bi, Y.-R. Zeng, An effective and efficient differential evolution algorithm for the integrated stochastic joint replenishment and delivery model, Knowl.-Based Syst. 36 (2012) 104–114, http://dx.doi.org/10.1016/j.knosys.2012.06.007

[17] M. Yuwono, S. Su, B. Moulton, H. Nguyen, Data clustering using variants of rapid centroid estimation, IEEE Trans. Evol. Comput. 18 (2013) 366–377.

[18] M. Yuwono, S. Su, B. Moulton, H. Nguyen, An algorithm for scalable clustering: ensemble rapid centroid estimation, in: Proceedings of the 2014 IEEE Congress on Evolutionary Computation, 2014, pp. 1250–1257.

[19] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of the 2003 IEEE Congress on Evolutionary Computation, vol. 1, 2003, pp. 215–220.

[20] S. Li, A Model-Based Fault Detection and Diagnostic Methodology for Secondary HVAC Systems (Ph.D. thesis), Drexel University, 2014.

[21] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86, http://dx.doi.org/10.1214/aoms/1177729694

[22] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn. 52 (2003) 91–118, http://dx.doi.org/10.1023/A:1023949509487

[23] M.D. Wilkerson, D.N. Hayes, ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking, Bioinformatics 26 (2010) 1572–1573.

[24] D.N. Hayes, S. Monti, G. Parmigiani, C.B. Gilks, K. Naoki, A. Bhattacharjee, M.A. Socinski, C. Perou, M. Meyerson, Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts, J. Clin. Oncol. 24 (2006) 5079–5090.

[25] A. Fred, A. Jain, Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 835–850, http://dx.doi.org/10.1109/TPAMI.2005.113

[26] A. Strehl, J. Ghosh, Cluster ensembles – a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2003) 583–617, http://dx.doi.org/10.1162/153244303321897735

[27] I.J. Leontaritis, S.A. Billings, Input–output parametric models for non-linear systems. Part I: Deterministic non-linear systems, Int. J. Control 41 (1985) 303–328, http://dx.doi.org/10.1080/0020718508961129

[28] H. Siegelmann, B. Horne, C. Giles, Computational capabilities of recurrent NARX neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27 (1997) 208–215, http://dx.doi.org/10.1109/3477.558801

[29] J.M. Menezes Jr., G. Barreto, A new look at nonlinear time series prediction with NARX recurrent neural network, in: Ninth Brazilian Symposium on Neural Networks, SBRN '06, 2006, pp. 160–165, http://dx.doi.org/10.1109/SBRN.2006.7

[30] T. Wang, Comparing hard and fuzzy C-means for evidence-accumulation clustering, in: Proceedings of the 18th International Conference on Fuzzy Systems, FUZZ-IEEE'09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 468–473.

[31] F. Duarte, A.L.N. Fred, A. Lourenco, M. Rodrigues, Weighting cluster ensembles in evidence accumulation clustering, in: Portuguese Conference on Artificial Intelligence, EPIA 2005, 2005, pp. 159–167, http://dx.doi.org/10.1109/EPIA.2005.341287

[32] M. Yuwono, S.W. Su, B.D. Moulton, H.T. Nguyen, Fast unsupervised learning method for rapid estimation of cluster centroids, in: Proceedings of the 2012 IEEE Congress on Evolutionary Computation, 2012, pp. 889–896.

[33] J.C. Bezdek, Mathematical models for systematic and taxonomy, in: G. Estabrook (Ed.), Proceedings of the 8th International Conference on Numerical Taxonomy, Freeman, San Francisco, CA, 1975, pp. 143–166.

[34] T. Wang, Ca-tree: a hierarchical structure for efficient and scalable coassociation-based cluster ensembles, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41 (2011) 686–698, http://dx.doi.org/10.1109/TSMCB.2010.2086059

[35] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.
