
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 16, NO. 10, OCTOBER 2020 6291

A Unified Probabilistic Monitoring Framework for Multimode Processes Based on Probabilistic Linear Discriminant Analysis

Yi Liu, Jiusun Zeng, Jie Bao, and Lei Xie

Abstract—This article develops a novel probabilistic monitoring framework for industrial processes with multiple operational conditions. The proposed method is based on probabilistic linear discriminant analysis (PLDA), which relies on two sets of latent variables, i.e., the between-class and within-class latent variables. To deal with the large within-class variations in multimode industrial processes, the approach modifies the original PLDA by introducing a separate within-class loading matrix for each operational mode and designs an expectation maximization (EM) algorithm to estimate the model parameters from the training samples. Mode identification for test samples is achieved by investigating the cosine similarity in the between-class latent variables, and two monitoring statistics, corresponding to the within-class latent variables and the residuals, are considered for fault detection. To diagnose the process fault, this article further develops a sparse probabilistic generative model based on PLDA for fault isolation. The enhanced performance of the proposed method is illustrated by applications to numerical examples and industrial processes.

Index Terms—Fault isolation, latent variables, multimode process monitoring, probabilistic linear discriminant analysis (PLDA).

I. INTRODUCTION

TO ACHIEVE improved system reliability and operational safety, more and more attention has been paid to the development of process monitoring methods. Thanks to the advancement of information technology, process monitoring methods, including data-driven techniques and model-based approaches, have been extensively studied over the past few decades [1]. In contrast to model-based approaches, data-driven techniques do not require much prior knowledge and have been shown to be conceptually simple and inexpensive to implement.

Manuscript received July 9, 2019; revised September 23, 2019 and December 20, 2019; accepted January 10, 2020. Date of publication January 15, 2020; date of current version June 22, 2020. This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFF0214700, in part by the Natural Science Foundation of China under Grant 61673358 and Grant 61621002, and in part by the Natural Science Foundation of Zhejiang Province under Grant LR17F030002. Paper no. TII-19-3078. (Corresponding author: Jiusun Zeng.)

Y. Liu is with the School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, China, and also with the Institute of Cyber Systems and Control, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).

J. Zeng is with the College of Metrology and Measurement Engineering, China Jiliang University, Hangzhou 310018, China (e-mail: [email protected]).

J. Bao is with the School of Chemical Engineering, The University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]).

L. Xie is with the Institute of Cyber Systems and Control, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).

Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TII.2020.2966707

One type of data-driven technique is based on multivariate statistical process control (MSPC) [2]. The basic idea of MSPC methods is to obtain a set of latent variables by mapping the process data onto a lower dimensional subspace based on certain criteria. Once the latent variables are obtained, monitoring statistics such as Hotelling's $T^2$ and the squared prediction error (SPE) can be constructed for fault detection and isolation. It should be noted that the training datasets for these methods are usually assumed to admit a unimodal distribution. This can be problematic, as practical industrial processes often undergo frequent production shifts due to changes in product quality specifications, manufacturing strategies, and working environments [3]. Thus, process monitoring models trained under the unimodal assumption may fail and produce a significant number of false alarms. To deal with the monitoring problem of multimode processes, various methods have been developed, including adaptive methods [4], [5], similarity analysis-based methods [6], [7], subspace separation [8], [9], and mixture models [10], [11]. More recently, monitoring of nonlinear processes using the kernel trick has also been considered, and good results have been achieved [12], [13].

Among multimode process monitoring techniques, probabilistic approaches have received considerable attention [14]. Compared with conventional methods, probabilistic approaches are advantageous in their capability of: 1) incorporating prior knowledge using appropriate distributions; 2) better handling of process uncertainty; and 3) easy accommodation of missing data and outliers [15], [16]. Various probabilistic methods have been reported in the literature, such as Gaussian mixture models [17], probabilistic principal component analysis (PPCA) [18], hidden Markov models [15], and conditional random fields [19]. They have been successful in detecting process faults and diagnosing operational modes in a number of applications. For fault isolation and diagnosis, however, possible solutions include formulating the fault isolation and diagnosis problem in a way similar to contribution analysis [20], or performing an additional root cause diagnosis using causal models such as Bayesian networks [21] or transfer entropy [22]. Several probabilistic methods, such as probabilistic contribution analysis [23] and probabilistic reconstruction [24], have also been proposed. Despite the research progress on probabilistic monitoring methods, a unified probabilistic framework that combines fault detection and isolation for multimode processes is still lacking.

1551-3203 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Zhejiang University. Downloaded on September 15, 2020 at 07:33:45 UTC from IEEE Xplore. Restrictions apply.

In this article, a novel probabilistic monitoring framework is proposed based on probabilistic linear discriminant analysis [25]. However, the standard probabilistic linear discriminant analysis (PLDA) is not suitable for handling datasets with significantly large within-class variance [26], which is common in multimode industrial processes. To accommodate multimode industrial processes, this article introduces a separate within-class loading matrix for each operational mode and designs two monitoring statistics, leading to better fault detection results. For mode identification, a cosine similarity is developed based on the between-class latent variables. In addition, a sparse probabilistic generative model (SPGM) based on PLDA is developed to isolate faulty variables. The contributions of this article can be summarized as follows: 1) A unified probabilistic monitoring framework is developed for fault detection and isolation for multimode processes. 2) The standard PLDA is extended by introducing a separate within-class loading matrix for each mode to accommodate datasets with large within-class variance, leading to better fault detection results. 3) An SPGM based on PLDA is proposed to isolate faulty variables.

II. PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS

In this section, the standard PLDA model [25], a probabilistic version of linear discriminant analysis (LDA), is briefly introduced. In PLDA, the $j$th sample in the $i$th class, $\mathbf{x}_{ij} \in \mathbb{R}^{m}$, is described by two sets of latent variables with Gaussian priors. The corresponding probabilistic generative model is given as

$$
\begin{aligned}
\mathbf{x}_{ij} &= \boldsymbol{\mu} + \mathbf{F}\mathbf{h}_i + \mathbf{G}\mathbf{w}_{ij} + \boldsymbol{\varepsilon}_{ij} \\
\mathbf{h}_i &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\
\mathbf{w}_{ij} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\
\boldsymbol{\varepsilon}_{ij} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
\end{aligned}
\tag{1}
$$

where $\mathbf{0}$ and $\mathbf{I}$ represent the zero vector and the identity matrix with appropriate dimensions, $\boldsymbol{\mu} = \frac{1}{N}\sum_{ij}\mathbf{x}_{ij}$ is the global mean vector, with $N$ being the size of the whole dataset, and $\mathcal{N}(\cdot,\cdot)$ denotes the Gaussian distribution with the given parameters. The Gaussian assumption is based on the fact that the data in a specific mode of a multimode process can be regarded as Gaussian or approximately Gaussian. This assumption has also been used in a series of research works based on probabilistic models [15], [18]. $\mathbf{h}_i \in \mathbb{R}^{D_F}$ is the between-class latent variable shared by all the samples generated from the $i$th class, and $\mathbf{w}_{ij} \in \mathbb{R}^{D_G}$ is the within-class latent variable, which explains the sample variation. $\mathbf{F} \in \mathbb{R}^{m \times D_F}$ and $\mathbf{G} \in \mathbb{R}^{m \times D_G}$ are two low-rank loading matrices. $\boldsymbol{\varepsilon}_{ij}$ is the stochastic Gaussian noise with zero mean and a diagonal heteroscedastic covariance matrix $\boldsymbol{\Sigma}$. The model parameters of PLDA can be grouped as $\theta = \{\boldsymbol{\mu}, \mathbf{F}, \mathbf{G}, \boldsymbol{\Sigma}\}$. An expectation maximization (EM) iteration procedure was introduced in [25] to estimate the model parameters. Compared with PPCA, PLDA is more flexible and is able to make probabilistic inferences about the class/mode identity.
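To make the generative model (1) concrete, the following minimal sketch draws samples from it with NumPy and compares the empirical covariance of one class with $\mathbf{G}\mathbf{G}^T + \boldsymbol{\Sigma}$. The function name `sample_plda`, the dimensions, and the parameter values are illustrative assumptions, not taken from the article.

```python
import numpy as np

def sample_plda(mu, F, G, sigma_diag, n_classes, n_per_class, rng):
    """Draw samples from the generative model (1).

    Returns X of shape (n_classes, n_per_class, m) and the class latents H.
    """
    m, d_f = F.shape
    d_g = G.shape[1]
    H = rng.standard_normal((n_classes, d_f))                # h_i ~ N(0, I)
    W = rng.standard_normal((n_classes, n_per_class, d_g))   # w_ij ~ N(0, I)
    # eps_ij ~ N(0, Sigma) with Sigma diagonal
    E = rng.standard_normal((n_classes, n_per_class, m)) * np.sqrt(sigma_diag)
    X = mu + (H @ F.T)[:, None, :] + W @ G.T + E
    return X, H

# Illustrative (assumed) sizes and parameters.
rng = np.random.default_rng(0)
m, d_f, d_g = 5, 2, 3
F = rng.standard_normal((m, d_f))
G = rng.standard_normal((m, d_g))
sigma_diag = np.full(m, 0.1)
X, H = sample_plda(np.zeros(m), F, G, sigma_diag, 3, 2000, rng)

S_model = G @ G.T + np.diag(sigma_diag)   # within-class covariance implied by (1)
S_emp = np.cov(X[0].T)                    # empirical covariance of class 0
```

With a few thousand samples per class, `S_emp` should be close to `S_model`; since $\mathbf{G}$ is shared, the same covariance applies to every class.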

III. IMPROVED PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS

As discussed in Section I, the standard PLDA is not suitable for dealing with data with large within-class variations, which are common in industrial multimode processes. To cope with this problem, an improved PLDA (I-PLDA) model is considered in this section.

A. Model Structure

To better deal with multimode processes, the standard PLDA is modified by replacing the within-class loading matrix $\mathbf{G}$ by $\mathbf{G}_i$ for $i = 1, \ldots, I$, with each $\mathbf{G}_i$ corresponding to a specific process mode/class. Hence, the generative model of I-PLDA can now be written as

$$
\mathbf{x}_{ij} = \boldsymbol{\mu} + \mathbf{F}\mathbf{h}_i + \mathbf{G}_i\mathbf{w}_{ij} + \boldsymbol{\varepsilon}_{ij}. \tag{2}
$$

Note that in (2), by integrating out the latent variable $\mathbf{w}_{ij}$, the within-class covariances are obtained as $\mathbf{S}_i = \mathbf{G}_i\mathbf{G}_i^{T} + \boldsymbol{\Sigma}$, in contrast to $\mathbf{S}_i = \mathbf{G}\mathbf{G}^{T} + \boldsymbol{\Sigma}$ for PLDA, which is the same for all classes. This gives I-PLDA greater flexibility, as it introduces a specific covariance for the data in each mode, and leads to improved effectiveness in handling industrial multimode processes. The parameter set of I-PLDA is accordingly extended to $\theta = \{\boldsymbol{\mu}, \mathbf{F}, \mathbf{G}_i, \boldsymbol{\Sigma}, i = 1, \ldots, I\}$.
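The mode-specific covariance quoted above follows from a short marginalization. As a sketch, conditioning on $\mathbf{h}_i$ and using only the priors in (1), together with the independence of $\mathbf{w}_{ij}$ and $\boldsymbol{\varepsilon}_{ij}$ (taken here as implicit in the generative model), gives:

```latex
\begin{aligned}
\operatorname{cov}(\mathbf{x}_{ij} \mid \mathbf{h}_i)
  &= \mathbf{G}_i \operatorname{cov}(\mathbf{w}_{ij}) \mathbf{G}_i^{T}
     + \operatorname{cov}(\boldsymbol{\varepsilon}_{ij}) \\
  &= \mathbf{G}_i \mathbf{G}_i^{T} + \boldsymbol{\Sigma} = \mathbf{S}_i, \\
\mathbf{x}_{ij} \mid \mathbf{h}_i
  &\sim \mathcal{N}\!\left(\boldsymbol{\mu} + \mathbf{F}\mathbf{h}_i,\; \mathbf{S}_i\right).
\end{aligned}
```

Setting $\mathbf{G}_i = \mathbf{G}$ for all $i$ recovers the single shared covariance of standard PLDA.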

B. Model Estimation

Assume a dataset $\mathbf{X} = \mathbf{X}_1 \cup \mathbf{X}_2 \cup \cdots \cup \mathbf{X}_I$ has been collected, with $\mathbf{X}_i = \{\mathbf{x}_{ij}\}_{j=1}^{J}$ storing all $J$ samples belonging to the $i$th class, so that the size of $\mathbf{X}$ is $N = IJ$. To learn the parameter set $\theta$ for I-PLDA, an iterative EM algorithm is developed. The EM algorithm is an optimization method particularly suitable for problems in which some variables are unobservable but their probability distributions are known. It is a powerful and widely used tool for the estimation of latent variable models [14]. The EM algorithm consists mainly of an E-step and an M-step, as follows [27].

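Part I. E-step: To facilitate the joint inference of the latent variables, the probabilistic generative model for the dataset $\mathbf{X}_i$ can be written as

$$
\underbrace{\begin{bmatrix} \mathbf{x}_{i1} \\ \mathbf{x}_{i2} \\ \vdots \\ \mathbf{x}_{iJ} \end{bmatrix}}_{\tilde{\mathbf{X}}_i}
=
\underbrace{\begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu} \\ \vdots \\ \boldsymbol{\mu} \end{bmatrix}}_{\tilde{\boldsymbol{\mu}}_i}
+
\underbrace{\begin{bmatrix}
\mathbf{F} & \mathbf{G}_i & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{F} & \mathbf{0} & \mathbf{G}_i & \cdots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \mathbf{0} \\
\mathbf{F} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{G}_i
\end{bmatrix}}_{\mathbf{A}_i}
\underbrace{\begin{bmatrix} \mathbf{h}_i \\ \mathbf{w}_{i1} \\ \mathbf{w}_{i2} \\ \vdots \\ \mathbf{w}_{iJ} \end{bmatrix}}_{\mathbf{y}_i}
+
\underbrace{\begin{bmatrix} \boldsymbol{\varepsilon}_{i1} \\ \boldsymbol{\varepsilon}_{i2} \\ \vdots \\ \boldsymbol{\varepsilon}_{iJ} \end{bmatrix}}_{\tilde{\boldsymbol{\varepsilon}}_i}.
\tag{3}
$$

Here $\tilde{\mathbf{X}}_i$ is the data matrix storing all samples belonging to class $i$, and $\tilde{\boldsymbol{\mu}}_i$ and $\mathbf{A}_i$ are the concatenated mean and parameter matrices. According to (1), both the concatenated hidden variable matrix $\mathbf{y}_i$ and the concatenated noise matrix $\tilde{\boldsymbol{\varepsilon}}_i$ follow a Gaussian distribution, with $\mathbf{y}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\tilde{\boldsymbol{\varepsilon}}_i \sim \mathcal{N}(\mathbf{0}, \tilde{\boldsymbol{\Sigma}})$, where $\tilde{\boldsymbol{\Sigma}}$ is a block diagonal matrix with each diagonal block being $\boldsymbol{\Sigma}$. With the distributions of the hidden variable and noise terms determined, the parameters in (3) can be estimated using the EM algorithm. According to [25], the posterior expectation of $\mathbf{y}_i = [\mathbf{h}_i^{T}\ \mathbf{w}_{i1}^{T}\ \cdots\ \mathbf{w}_{iJ}^{T}]^{T}$ is obtained by

$$
\begin{aligned}
\langle \mathbf{y}_i \rangle &= \boldsymbol{\Xi}_i \mathbf{A}_i^{T} \tilde{\boldsymbol{\Sigma}}^{-1} \left( \tilde{\mathbf{X}}_i - \tilde{\boldsymbol{\mu}}_i \right) \\
\boldsymbol{\Xi}_i &= \left( \mathbf{A}_i^{T} \tilde{\boldsymbol{\Sigma}}^{-1} \mathbf{A}_i + \mathbf{I} \right)^{-1}
\end{aligned}
\tag{4}
$$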

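where $\langle \cdot \rangle$ denotes the expectation of a latent variable and $\boldsymbol{\Xi}_i$ is the posterior covariance of $\mathbf{y}_i$. Note that the matrix inversion in (4) is usually intractable owing to the high dimension. Fortunately, this problem can be solved efficiently by the partitioned inverse formula [28] as

$$
\boldsymbol{\Xi}_i =
\begin{bmatrix} \mathbf{A}_i & \mathbf{B}_i \\ \mathbf{C}_i & \mathbf{D}_i \end{bmatrix}^{-1}
=
\begin{bmatrix}
\mathbf{U}_i & -\mathbf{U}_i \mathbf{B}_i \mathbf{D}_i^{-1} \\
-\mathbf{D}_i^{-1} \mathbf{C}_i \mathbf{U}_i & \mathbf{D}_i^{-1} \mathbf{C}_i \mathbf{U}_i \mathbf{B}_i \mathbf{D}_i^{-1} + \mathbf{D}_i^{-1}
\end{bmatrix}
\tag{5}
$$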

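where $\mathbf{U}_i$, $\mathbf{B}_i$, $\mathbf{C}_i$, and $\mathbf{D}_i$ are obtained from (3) and (4) as follows:

$$
\begin{aligned}
\mathbf{U}_i &= \left( \mathbf{I} + J \mathbf{F}^{T} \mathbf{Q}_i \mathbf{F} \right)^{-1} \\
\mathbf{B}_i &= \left[ \mathbf{F}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{G}_i\ \cdots\ \mathbf{F}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{G}_i \right] \\
\mathbf{D}_i &= \operatorname{diag}\left( \left[ \mathbf{V}_i\ \cdots\ \mathbf{V}_i \right] \right)
\end{aligned}
\tag{6}
$$

with $\mathbf{V}_i = (\mathbf{I} + \mathbf{G}_i^{T} \boldsymbol{\Sigma}^{-1} \mathbf{G}_i)^{-1}$, $\mathbf{Q}_i = (\boldsymbol{\Sigma} + \mathbf{G}_i \mathbf{G}_i^{T})^{-1}$, and $\mathbf{C}_i = \mathbf{B}_i^{T}$. Based on (4)–(6), the posterior expectations of the between- and within-class latent variables can be updated by

$$
\langle \mathbf{h}_i \rangle = \mathbf{U}_i \mathbf{F}^{T} \mathbf{Q}_i \sum_{j=1}^{J} \left( \mathbf{x}_{ij} - \boldsymbol{\mu} \right) \tag{7}
$$

$$
\langle \mathbf{w}_{ij} \rangle = \mathbf{V}_i \mathbf{G}_i^{T} \boldsymbol{\Sigma}^{-1} \left( \mathbf{x}_{ij} - \boldsymbol{\mu} - \mathbf{F} \langle \mathbf{h}_i \rangle \right). \tag{8}
$$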

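The second-order moments of these two latent variables are estimated by

$$
\langle \mathbf{h}_i \mathbf{h}_i^{T} \rangle = \langle \mathbf{h}_i \rangle \langle \mathbf{h}_i^{T} \rangle + \operatorname{cov}(\mathbf{h}_i, \mathbf{h}_i) \tag{9}
$$

$$
\langle \mathbf{w}_{ij} \mathbf{w}_{ij}^{T} \rangle = \langle \mathbf{w}_{ij} \rangle \langle \mathbf{w}_{ij}^{T} \rangle + \operatorname{cov}(\mathbf{w}_{ij}, \mathbf{w}_{ij}) \tag{10}
$$

$$
\langle \mathbf{h}_i \mathbf{w}_{ij}^{T} \rangle = \langle \mathbf{h}_i \rangle \langle \mathbf{w}_{ij}^{T} \rangle + \operatorname{cov}(\mathbf{h}_i, \mathbf{w}_{ij}) \tag{11}
$$

where "cov" is the covariance function and

$$
\begin{aligned}
\operatorname{cov}(\mathbf{h}_i, \mathbf{h}_i) &= \mathbf{U}_i, \\
\operatorname{cov}(\mathbf{h}_i, \mathbf{w}_{ij}) &= -\mathbf{U}_i \mathbf{F}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{G}_i \mathbf{V}_i, \\
\operatorname{cov}(\mathbf{w}_{ij}, \mathbf{w}_{ij}) &= \mathbf{V}_i \mathbf{G}_i^{T} \boldsymbol{\Sigma}^{-1} \mathbf{F} \mathbf{U}_i \mathbf{F}^{T} \boldsymbol{\Sigma}^{-1} \mathbf{G}_i \mathbf{V}_i + \mathbf{V}_i.
\end{aligned}
$$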
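The closed-form E-step in (6)–(8) can be checked numerically against the direct inversion in (4) on a small problem. The sketch below does this for a single class; the function names `e_step_class` and `e_step_direct` and all sizes are illustrative assumptions. It relies on the lower-right block of the matrix being inverted in (4) consisting of diagonal blocks $\mathbf{I} + \mathbf{G}_i^{T}\boldsymbol{\Sigma}^{-1}\mathbf{G}_i$, whose inverse is $\mathbf{V}_i$.

```python
import numpy as np

def e_step_class(Xc, mu, F, G, sigma_diag):
    """Posterior means of h_i and w_ij for one class, following (6)-(8)."""
    J = Xc.shape[0]
    Si = np.diag(1.0 / sigma_diag)                           # Sigma^{-1}
    V = np.linalg.inv(np.eye(G.shape[1]) + G.T @ Si @ G)
    Q = np.linalg.inv(np.diag(sigma_diag) + G @ G.T)
    U = np.linalg.inv(np.eye(F.shape[1]) + J * F.T @ Q @ F)
    Xbar = Xc - mu
    h = U @ F.T @ Q @ Xbar.sum(axis=0)                       # (7)
    W = (Xbar - F @ h) @ Si @ G @ V                          # rows are <w_ij>, (8)
    return h, W, U, V

def e_step_direct(Xc, mu, F, G, sigma_diag):
    """Same posterior via the full matrix inversion in (4); small J only."""
    J, m = Xc.shape
    d_f, d_g = F.shape[1], G.shape[1]
    A = np.hstack([np.tile(F, (J, 1)), np.kron(np.eye(J), G)])   # A_i in (3)
    Si_big = np.kron(np.eye(J), np.diag(1.0 / sigma_diag))       # block-diagonal inverse noise
    Xi = np.linalg.inv(A.T @ Si_big @ A + np.eye(d_f + J * d_g))
    y = Xi @ A.T @ Si_big @ (Xc - mu).reshape(-1)
    return y[:d_f], y[d_f:].reshape(J, d_g), Xi

# Illustrative (assumed) sizes and random parameters.
rng = np.random.default_rng(1)
m, d_f, d_g, J = 4, 2, 3, 5
F = rng.standard_normal((m, d_f))
G = rng.standard_normal((m, d_g))
sigma_diag = rng.uniform(0.5, 1.5, m)
mu = rng.standard_normal(m)
Xc = rng.standard_normal((J, m))

h, W, U, V = e_step_class(Xc, mu, F, G, sigma_diag)
h_d, W_d, Xi = e_step_direct(Xc, mu, F, G, sigma_diag)
```

The closed form inverts only $D_F \times D_F$, $D_G \times D_G$, and $m \times m$ matrices, whereas the direct route inverts a matrix of size $D_F + J D_G$, which is why the partitioned form is preferred for large $J$.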

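Part II. M-step: To update the parameters given the observed dataset and the posterior expectations, the EM auxiliary function is first extracted from the complete log-likelihood as

$$
\begin{aligned}
Q(\Theta, \Theta_{\mathrm{old}}) &= \left\langle \ln p(\mathbf{X} \mid \Theta_{\mathrm{old}}, H) \right\rangle_{q(H \mid \mathbf{X}, \Theta)} \\
&\equiv -\frac{1}{2} \sum_{i=1}^{I} \left\{ J \left( \operatorname{tr}\left( \boldsymbol{\Sigma}^{-1} \boldsymbol{\Gamma}_i \right) + \ln |\boldsymbol{\Sigma}| \right) + \sum_{j=1}^{J} \langle \hat{\mathbf{x}}_{ij}^{T} \rangle \boldsymbol{\Sigma}^{-1} \langle \hat{\mathbf{x}}_{ij} \rangle \right\}
\end{aligned}
\tag{12}
$$

where $q(\cdot)$ denotes the posterior distribution, $\equiv$ indicates equality up to an additive constant, and $H$ is the collection of all latent variables. The other terms can be expanded as $\langle \hat{\mathbf{x}}_{ij} \rangle = \mathbf{x}_{ij} - \boldsymbol{\mu} - \mathbf{F} \langle \mathbf{h}_i \rangle - \mathbf{G}_i \langle \mathbf{w}_{ij} \rangle$ and $\boldsymbol{\Gamma}_i = \mathbf{F} \mathbf{U}_i \mathbf{F}^{T} + \mathbf{G}_i \operatorname{cov}(\mathbf{w}_{ij}, \mathbf{w}_{ij}) \mathbf{G}_i^{T} + 2 \mathbf{F} \operatorname{cov}(\mathbf{h}_i, \mathbf{w}_{ij}) \mathbf{G}_i^{T}$.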

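The parameter set $\theta$ for I-PLDA is then updated by taking the corresponding derivatives. For example, by setting the derivative with respect to $\mathbf{F}$ to zero, the parameter matrix $\mathbf{F}$ can be updated as

$$
\mathbf{F} = \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ \bar{\mathbf{x}}_{ij} \langle \mathbf{h}_i^{T} \rangle - \mathbf{G}_i \langle \mathbf{w}_{ij} \mathbf{h}_i^{T} \rangle \right] \times \left[ \sum_{i=1}^{I} J \langle \mathbf{h}_i \mathbf{h}_i^{T} \rangle \right]^{-1} \tag{13}
$$

where $\bar{\mathbf{x}}_{ij} = \mathbf{x}_{ij} - \boldsymbol{\mu}$ and $\langle \mathbf{w}_{ij} \mathbf{h}_i^{T} \rangle = \langle \mathbf{h}_i \mathbf{w}_{ij}^{T} \rangle^{T}$. Similarly, the other parameters are updated as follows:

$$
\mathbf{G}_i = \sum_{j=1}^{J} \left[ \bar{\mathbf{x}}_{ij} \langle \mathbf{w}_{ij}^{T} \rangle - \mathbf{F} \langle \mathbf{h}_i \mathbf{w}_{ij}^{T} \rangle \right] \times \left[ \sum_{j=1}^{J} \langle \mathbf{w}_{ij} \mathbf{w}_{ij}^{T} \rangle \right]^{-1} \tag{14}
$$

$$
\boldsymbol{\Sigma} = \operatorname{diag} \left[ \frac{1}{N} \sum_{i=1}^{I} \sum_{j=1}^{J} \left( \langle \boldsymbol{\varepsilon}_{ij} \rangle \langle \boldsymbol{\varepsilon}_{ij}^{T} \rangle + \operatorname{cov}(\boldsymbol{\varepsilon}_{ij}, \boldsymbol{\varepsilon}_{ij}) \right) \right] \tag{15}
$$

where $\langle \boldsymbol{\varepsilon}_{ij} \rangle = \mathbf{x}_{ij} - \boldsymbol{\mu} - \mathbf{F} \langle \mathbf{h}_i \rangle - \mathbf{G}_i \langle \mathbf{w}_{ij} \rangle$ and $\operatorname{cov}(\boldsymbol{\varepsilon}_{ij}, \boldsymbol{\varepsilon}_{ij}) = \mathbf{F} \operatorname{cov}(\mathbf{h}_i, \mathbf{h}_i) \mathbf{F}^{T} + 2 \mathbf{F} \operatorname{cov}(\mathbf{h}_i, \mathbf{w}_{ij}) \mathbf{G}_i^{T} + \mathbf{G}_i \operatorname{cov}(\mathbf{w}_{ij}, \mathbf{w}_{ij}) \mathbf{G}_i^{T}$.

Letting $k$ denote the iteration number, the full training scheme based on the joint updating EM algorithm is summarized in Algorithm 1.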
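The E-step (6)–(11) and the M-step (13)–(15) can be combined into a compact training loop. The sketch below is one possible arrangement under stated assumptions, not the authors' implementation: the function name `em_iplda` is hypothetical, $\boldsymbol{\mu}$ is fixed at the global mean, the updates are applied sequentially ($\mathbf{F}$ first, then each $\mathbf{G}_i$ with the new $\mathbf{F}$, then $\boldsymbol{\Sigma}$), and a fixed iteration count stands in for a relative-likelihood stopping rule to keep the example short.

```python
import numpy as np
from numpy.linalg import inv

def em_iplda(X, d_f, d_g, n_iter=20, seed=0):
    """EM for I-PLDA on X of shape (I, J, m); returns mu, F, G (I, m, d_g), diag(Sigma)."""
    n_modes, J, m = X.shape
    rng = np.random.default_rng(seed)
    mu = X.reshape(-1, m).mean(axis=0)                 # global mean, kept fixed
    Xb = X - mu
    F = rng.standard_normal((m, d_f))                  # random initialization
    G = rng.standard_normal((n_modes, m, d_g))
    sig = np.ones(m)
    for _ in range(n_iter):
        Si = np.diag(1.0 / sig)
        stats, num_F, den_F = [], np.zeros((m, d_f)), np.zeros((d_f, d_f))
        for i in range(n_modes):                       # E-step, (6)-(11)
            Gi = G[i]
            V = inv(np.eye(d_g) + Gi.T @ Si @ Gi)
            Q = inv(np.diag(sig) + Gi @ Gi.T)
            U = inv(np.eye(d_f) + J * F.T @ Q @ F)
            h = U @ F.T @ Q @ Xb[i].sum(axis=0)
            W = (Xb[i] - F @ h) @ Si @ Gi @ V          # rows are <w_ij>
            C_hw = -U @ F.T @ Si @ Gi @ V              # cov(h_i, w_ij)
            C_ww = V @ Gi.T @ Si @ F @ U @ F.T @ Si @ Gi @ V + V
            stats.append((h, W, U, C_hw, C_ww))
            wh_sum = np.outer(W.sum(axis=0), h) + J * C_hw.T   # sum_j <w_ij h_i^T>
            num_F += np.outer(Xb[i].sum(axis=0), h) - Gi @ wh_sum
            den_F += J * (np.outer(h, h) + U)
        F = num_F @ inv(den_F)                         # (13)
        acc = np.zeros(m)
        for i, (h, W, U, C_hw, C_ww) in enumerate(stats):
            hw_sum = np.outer(h, W.sum(axis=0)) + J * C_hw     # sum_j <h_i w_ij^T>
            G[i] = (Xb[i].T @ W - F @ hw_sum) @ inv(W.T @ W + J * C_ww)   # (14)
            R = Xb[i] - F @ h - W @ G[i].T             # rows are <eps_ij>
            C_ee = F @ U @ F.T + 2 * F @ C_hw @ G[i].T + G[i] @ C_ww @ G[i].T
            acc += (R ** 2).sum(axis=0) + J * np.diag(C_ee)
        sig = acc / (n_modes * J)                      # (15), diagonal only
    return mu, F, G, sig

# Synthetic three-mode data with mode-dependent spread (illustrative assumption).
rng = np.random.default_rng(0)
n_modes, J, m = 3, 30, 4
F0 = rng.standard_normal((m, 1))
X = np.stack([
    3.0 * (F0 @ rng.standard_normal(1))
    + (i + 1) * rng.standard_normal((J, 2)) @ rng.standard_normal((2, m))
    + 0.3 * rng.standard_normal((J, m))
    for i in range(n_modes)
])
mu, F, G, sig = em_iplda(X, d_f=1, d_g=2, n_iter=20)
```

The latent dimensions used here are arbitrary small values chosen for the demonstration. Because each update maximizes the auxiliary function with the other parameters held fixed, this arrangement behaves as a conditional-maximization variant of EM.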

C. Parameter Selection

Note that the dimension sizes $D_F$ and $D_G$ of the latent variables must be determined before running Algorithm 1. Usually, the natural constraints for them are $1 \le D_F \le m$ and $1 \le D_G \le m$, where $m$ is the number of process variables. Although it has been suggested that these two parameters should be consistent [29], $1 \le D_F \le m \le D_G \le 2m$ is adopted in this article, because a smaller value of $D_G$ leads to a lower dimensional multivariate Laplace prior in the subsequent fault isolation, which tends to produce biased estimates and is easily affected by outliers for high dimensional data [30]. The optimal values of $D_F$ and $D_G$ can be determined experimentally by trial and error.
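The trial-and-error search can be organized as a loop over the admissible pairs. The sketch below enumerates every $(D_F, D_G)$ satisfying $1 \le D_F \le m \le D_G \le 2m$ and picks the pair that maximizes a user-supplied criterion. The names `candidate_dims`, `select_dims`, and the `score` callable are hypothetical; the article does not specify a selection criterion, so the scoring function in the usage lines is a toy stand-in.

```python
from itertools import product

def candidate_dims(m):
    """All (D_F, D_G) pairs with 1 <= D_F <= m <= D_G <= 2m."""
    return [(d_f, d_g) for d_f, d_g in product(range(1, m + 1), range(m, 2 * m + 1))]

def select_dims(m, score):
    """Return the candidate pair with the highest score(d_f, d_g)."""
    return max(candidate_dims(m), key=lambda pair: score(*pair))

# Toy usage: the score below is a placeholder for, e.g., a validation metric.
pairs = candidate_dims(3)
best = select_dims(3, lambda d_f, d_g: -(d_f - 2) ** 2 - (d_g - 4) ** 2)
```

In practice `score` would train a model for each pair and evaluate it on held-out data, which makes the search cost grow with $m(m+1)$ candidate pairs.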

IV. PROCESS MONITORING STRATEGY

So far, estimates of the model parameters and latent variables have been obtained. To construct a monitoring model for industrial processes with multiple operating conditions, this section develops a process monitoring strategy consisting of a mode identification step, a statistics construction step, and a fault isolation step.

Algorithm 1: EM for I-PLDA.
Input: Training dataset $\mathbf{X}$, dimension sizes $D_F$ and $D_G$.
1. Randomly initialize parameters $\mathbf{F}$, $\mathbf{G}_i$, and $\boldsymbol{\Sigma}$;
2. Compute the parameters $\mathbf{U}_i$, $\mathbf{V}_i$, and $\mathbf{Q}_i$ using (6);
3. Update the posterior expectations of $\{\mathbf{h}_i, \mathbf{w}_{ij}\}$ and the related second-order moments using (7)–(11);
4. Update parameters $\mathbf{F}$, $\mathbf{G}_i$, and $\boldsymbol{\Sigma}$ using (13)–(15);
5. Calculate the likelihood of the total dataset as $L^{(k+1)} = \sum_{ij} \ln \mathcal{N}\left(\mathbf{x}_{ij} \mid \boldsymbol{\mu} + \mathbf{F}^{(k+1)} \langle \mathbf{h}_i \rangle^{(k+1)}, \mathbf{S}_i^{(k+1)}\right)$;
6. If $\left| \left( L^{(k+1)} - L^{(k)} \right) / L^{(k)} \right| > \epsilon$, set $k = k + 1$ and go back to Step 2; otherwise terminate the iteration and go to Step 7;
7. Output the model parameters and posterior expectations of all latent variables.

A. Process Mode Identification

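As shown in (1), the between-class latent variable $\mathbf{h}_i$ is used to indicate the class label. To identify the mode of a test sample $\mathbf{x}_t$, it is important to see how the test sample interacts with $\mathbf{h}_i$. Note that the estimate of $\mathbf{h}_i$ in (7) is obtained by summing up the contributions of the $J$ samples in mode/class $i$, i.e., $\langle \mathbf{h}_i \rangle = \sum_{j=1}^{J} \langle \mathbf{h}_{ij} \rangle$, with $\langle \mathbf{h}_{ij} \rangle$ being expressed as

$$
\langle \mathbf{h}_{ij} \rangle = \mathbf{U}_i \mathbf{F}^{T} \mathbf{Q}_i \left( \mathbf{x}_{ij} - \boldsymbol{\mu} \right). \tag{16}
$$

Thus, if a test sample $\mathbf{x}_t$ belongs to class/mode $i$, the corresponding contribution can be obtained by replacing $\mathbf{x}_{ij}$ with $\mathbf{x}_t$ in (16), resulting in a temporary between-sample latent variable $\mathbf{h}_{t|i}$, with the expectation

$$
\langle \mathbf{h}_{t|i} \rangle = \mathbf{U}_i \mathbf{F}^{T} \mathbf{Q}_i \left( \mathbf{x}_t - \boldsymbol{\mu} \right). \tag{17}
$$

On the other hand, the average contribution to $\langle \mathbf{h}_i \rangle$ from all the samples in class/mode $i$ can be defined as $\bar{\mathbf{h}}_i$, with the expectation

$$
\langle \bar{\mathbf{h}}_i \rangle = \frac{1}{J} \sum_{j=1}^{J} \mathbf{U}_i \mathbf{F}^{T} \mathbf{Q}_i \left( \mathbf{x}_{ij} - \boldsymbol{\mu} \right). \tag{18}
$$

The mode label of a test sample $\mathbf{x}_t$ can be identified by investigating the similarity between $\bar{\mathbf{h}}_i$ and $\mathbf{h}_{t|i}$. The closer $\mathbf{h}_{t|i}$ is to $\bar{\mathbf{h}}_i$, the more probable it is that $\mathbf{x}_t$ belongs to mode/class $i$. To capture this similarity, the cosine similarity is adopted here as

$$
\cos\left( \bar{\mathbf{h}}_i, \mathbf{h}_{t|i} \right) = \frac{\langle \bar{\mathbf{h}}_i^{T} \rangle \langle \mathbf{h}_{t|i} \rangle}{\left\| \langle \bar{\mathbf{h}}_i \rangle \right\|_2 \left\| \langle \mathbf{h}_{t|i} \rangle \right\|_2}. \tag{19}
$$

Consequently, the mode of $\mathbf{x}_t$ can be identified as

$$
s = \arg\max_i \cos\left( \bar{\mathbf{h}}_i, \mathbf{h}_{t|i} \right). \tag{20}
$$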
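The identification rule in (17)–(20) amounts to a few matrix products per mode. The sketch below implements it with NumPy on a synthetic three-mode example; the helper names `projection` and `identify_mode` are hypothetical, and the loading matrices, latent positions, and noise levels are illustrative assumptions chosen so that the modes are clearly separated.

```python
import numpy as np

def projection(F, G, sigma_diag, J):
    """Matrix U_i F^T Q_i that maps a centred sample to its contribution in (16)."""
    Q = np.linalg.inv(np.diag(sigma_diag) + G @ G.T)
    U = np.linalg.inv(np.eye(F.shape[1]) + J * F.T @ Q @ F)
    return U @ F.T @ Q

def identify_mode(x_t, X_train, mu, F, G_list, sigma_diag):
    """Return argmax_i of the cosine similarity (19), plus all similarities."""
    sims = []
    for Xi, Gi in zip(X_train, G_list):
        M = projection(F, Gi, sigma_diag, Xi.shape[0])
        h_bar = M @ (Xi - mu).mean(axis=0)      # (18)
        h_t = M @ (x_t - mu)                    # (17)
        sims.append(h_bar @ h_t / (np.linalg.norm(h_bar) * np.linalg.norm(h_t)))
    return int(np.argmax(sims)), np.array(sims)

# Synthetic example (all values are assumptions for illustration).
rng = np.random.default_rng(0)
m, J = 4, 200
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
H = np.array([[2.0, 0.0], [-1.0, 1.7], [-1.0, -1.7]])     # one h_i per mode
G_list = [g * np.eye(m) for g in (0.3, 0.6, 1.0)]         # mode-specific spread
sigma_diag = np.full(m, 0.1)
offset = np.array([1.0, 2.0, 3.0, 4.0])
X_train = [offset + F @ h + rng.standard_normal((J, m)) @ G.T
           + np.sqrt(0.1) * rng.standard_normal((J, m))
           for h, G in zip(H, G_list)]
mu = np.vstack(X_train).mean(axis=0)                      # global mean
x_t = offset + F @ H[1] + 0.05 * rng.standard_normal(m)   # test sample from mode 1
mode, sims = identify_mode(x_t, X_train, mu, F, G_list, sigma_diag)
```

Because the cosine similarity ignores vector length, the rule compares directions in the between-class latent space, so a test sample with a larger within-class spread is still matched to the mode whose average contribution points the same way.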

B. Process Monitoring Statistics

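Once the mode label of a test sample is determined, monitoring statistics must be constructed for fault detection. Here, the conventional $T^2$ and SPE statistics are used. For I-PLDA, these two statistics are constructed as follows:

$$
T^2 = \langle \mathbf{w}_{t|s}^{T} \rangle \langle \mathbf{w}_{t|s} \rangle \tag{21}
$$

$$
\mathrm{SPE} = \langle \boldsymbol{\varepsilon}_{t|s}^{T} \rangle \langle \boldsymbol{\varepsilon}_{t|s} \rangle \tag{22}
$$

where $\langle \mathbf{w}_{t|s} \rangle$ can be obtained using (8), and $\langle \boldsymbol{\varepsilon}_{t|s} \rangle$ is the difference between the test sample $\mathbf{x}_t$ and the related estimate $\boldsymbol{\mu} + \mathbf{F} \langle \mathbf{h}_s \rangle + \mathbf{G}_s \langle \mathbf{w}_{t|s} \rangle$. The control limits of the two statistics can be obtained using kernel density estimation (KDE).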
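The two statistics and a KDE-based control limit can be sketched as follows. The function names `monitoring_statistics`, `kde_cdf`, and `kde_control_limit` are hypothetical. The article does not specify the kernel, bandwidth, or confidence level, so the Gaussian kernel, the rule-of-thumb bandwidth, and the 99% level used here are assumptions, as are the synthetic parameters.

```python
import math
import numpy as np

def monitoring_statistics(x_t, mu, F, h_s, G_s, sigma_diag):
    """T^2 and SPE of (21)-(22) for a sample already assigned to mode s."""
    Si = np.diag(1.0 / sigma_diag)
    V = np.linalg.inv(np.eye(G_s.shape[1]) + G_s.T @ Si @ G_s)
    centred = x_t - mu - F @ h_s
    w = V @ G_s.T @ Si @ centred          # <w_{t|s}> from (8)
    eps = centred - G_s @ w               # residual <eps_{t|s}>
    return float(w @ w), float(eps @ eps)

def kde_cdf(x, values, bw):
    """CDF of a Gaussian KDE fitted to `values`, evaluated at x."""
    return float(np.mean([0.5 * (1.0 + math.erf((x - v) / (bw * math.sqrt(2.0))))
                          for v in values]))

def kde_control_limit(values, alpha=0.99):
    """alpha-quantile of the KDE, found by bisection on its CDF."""
    v = np.asarray(values, dtype=float)
    bw = 1.06 * v.std(ddof=1) * len(v) ** (-0.2)   # rule-of-thumb bandwidth (assumption)
    lo, hi = v.min() - 10 * bw, v.max() + 10 * bw
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, v, bw) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), bw

# Synthetic single-mode example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m = 5
G_s = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 0, 0], [0, 1.0, 0]])
F = np.ones((m, 1))
h_s = np.array([0.5])
mu = np.zeros(m)
sigma_diag = np.full(m, 0.1)
normal = (mu + F @ h_s + rng.standard_normal((500, 3)) @ G_s.T
          + np.sqrt(0.1) * rng.standard_normal((500, m)))
t2_train = [monitoring_statistics(x, mu, F, h_s, G_s, sigma_diag)[0] for x in normal]
t2_limit, bw = kde_control_limit(t2_train, alpha=0.99)

# A sample whose within-class latent variable is far from the N(0, I) prior.
x_fault = mu + F @ h_s + G_s @ np.array([5.0, 5.0, 5.0])
t2_fault, spe_fault = monitoring_statistics(x_fault, mu, F, h_s, G_s, sigma_diag)
```

A KDE limit is used instead of a parametric one because the posterior means are shrunk toward zero, so the statistics do not follow an exact chi-square distribution.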

C. Fault Isolation Method

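Once a fault is detected, it becomes important to localize the faulty variables in a timely and accurate manner. In this subsection, an SPGM based on I-PLDA is proposed to isolate faulty variables.

1) Sparse Probabilistic Generative Model: Assuming that a fault has been detected by the $T^2$ statistic, the following SPGM is considered:

$$
\begin{aligned}
\mathbf{x}_{t|s} &= \underbrace{\boldsymbol{\mu} + \mathbf{F} \mathbf{h}_s + \mathbf{G}_s \mathbf{w}^{*}_{t|s}}_{\mathbf{x}^{*}_{t|s}} + \Delta\mathbf{G} \mathbf{w}^{*}_{t|s} + \boldsymbol{\varepsilon}_{t|s} \\
&= \mathbf{x}^{*}_{t|s} + \Delta\mathbf{G} \mathbf{w}^{*}_{t|s} + \boldsymbol{\varepsilon}_{t|s}.
\end{aligned}
\tag{23}
$$

Here $\mathbf{x}_{t|s}$ indicates that $\mathbf{x}_t$ has been assigned to mode $s$, and $\mathbf{x}^{*}_{t|s}$ denotes the expected normal replica of $\mathbf{x}_{t|s}$. Note that $\mathbf{x}_{t|s}$ is equal to $\mathbf{x}^{*}_{t|s}$ and $\Delta\mathbf{G} = \mathbf{0}$ if $\mathbf{x}_t$ is normal. Hence the faulty components are fully captured by the term $\Delta\mathbf{G} \mathbf{w}^{*}_{t|s}$. The reason for not using $\mathbf{F}$ to isolate the fault is that the term $\mathbf{F}\mathbf{h}_i$ represents the between-class variation. Once a test sample has been assigned to a specific mode and determined to be faulty, the dominant variations lie in the within-class variation.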

To isolate faulty variables from a faulty sample $\mathbf{x}_t$, an estimate of the matrix $\Delta\mathbf{G}$ is required. An appropriate assumption on $\Delta\mathbf{G}$ is that it is a matrix with row-wise sparsity. That is to say, the nonzero rows of $\Delta\mathbf{G}$ correspond to faulty variables, while the zero rows correspond to normal variables [31]. Let $\Delta\mathbf{G} = [\Delta\mathbf{g}_1^{T}\ \Delta\mathbf{g}_2^{T}\ \cdots\ \Delta\mathbf{g}_k^{T}\ \cdots\ \Delta\mathbf{g}_m^{T}]^{T}$, with $\Delta\mathbf{g}_k$ being the $k$th row of $\Delta\mathbf{G}$. The row-sparsity assumption can be achieved by imposing a multivariate Laplace prior [30] on $\Delta\mathbf{g}_k$

$$
\Delta\mathbf{g}_k \mid a_k, c_k \sim c_k \exp\left( -\sqrt{a_k} \left\| \Delta\mathbf{g}_k \right\|_2 \right) \tag{24}
$$

where $a_k$ and $c_k$ are unknown parameters. As pointed out by Babacan et al. [30], this prior is sharply peaked at the zero vector, indicating that a row corresponding to a normal variable will be shrunk to zero, whilst the rows corresponding to faulty variables will be nonzero. It is common practice to decompose the multivariate Laplace prior into a hierarchical prior consisting of a scaled Gaussian and two Gamma distributions [30]. Equations (25)–(27) show how to decompose the multivariate Laplace prior

$$
\Delta\mathbf{g}_k \mid z_k \sim \mathcal{N}(\mathbf{0}, z_k \mathbf{I}) \tag{25}
$$

$$
z_k \mid a_k \sim \frac{a_k^{(D_G+1)/4}\, z_k^{(D_G-1)/2} \exp\left( -a_k z_k / 2 \right)}{2^{(D_G+1)/2}\, \mathrm{Ga}\left( (D_G+1)/2 \right)} \tag{26}
$$

$$
a_k \mid \lambda, \eta \sim \mathrm{Gamma}(\lambda, \eta) \tag{27}
$$

where $\mathrm{Gamma}(a_k \mid \lambda, \eta) = \eta^{\lambda} a_k^{\lambda - 1} \exp(-\eta a_k) / \mathrm{Ga}(\lambda)$, with the Gamma function $\mathrm{Ga}(x) = \int_0^{\infty} y^{x-1} \exp(-y)\, dy$, and $\{\lambda, \eta\}$ are predetermined hyperparameters.

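To obtain a full Bayesian inference framework for the SPGM, the priors for the other parameters are given by

$$
\mathbf{x}^{*}_{t|s} \sim \mathcal{N}\left( \boldsymbol{\mu} + \mathbf{F} \mathbf{h}_s + \mathbf{G}_s \mathbf{w}^{*}_{t|s},\ \nu \mathbf{I} \right) \tag{28}
$$

$$
\nu \mid \rho, \kappa \sim \mathrm{Gamma}(\rho, \kappa) \tag{29}
$$

$$
\mathbf{w}^{*}_{t|s} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{30}
$$

$$
\boldsymbol{\varepsilon}_{t|s} \mid \gamma \sim \mathcal{N}(\mathbf{0}, \gamma \mathbf{I}) \tag{31}
$$

$$
\gamma \mid \alpha, \beta \sim \mathrm{Gamma}(\alpha, \beta) \tag{32}
$$

where $\nu$ and $\gamma$ are the variances of $\mathbf{x}^{*}_{t|s}$ and $\boldsymbol{\varepsilon}_{t|s}$, respectively. As suggested by Yang et al. [32], these noise variances can be estimated by placing different Gamma hyperpriors over $\nu$ and $\gamma$. For these Gamma hyperpriors, $\rho$, $\kappa$, $\alpha$, and $\beta$ are predetermined hyperparameters.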

If a faulty sample causes a violation in the SPE statistic, a similar probabilistic generative model can be considered as

$$
\mathbf{x}_{t|s} = \mathbf{x}^{*}_{t|s} + \mathbf{r}_{t|s} + \boldsymbol{\varepsilon}_{t|s} \tag{33}
$$

$$
\mathbf{r}_{t|s} \mid \tau \sim \mathcal{N}(\mathbf{0}, \tau \mathbf{I}) \tag{34}
$$

$$
\tau \mid b \sim \frac{\sqrt{b}}{2} \exp\left( -b\tau/2 \right). \tag{35}
$$

In this case, the faulty components are incorporated in the vector $\mathbf{r}_{t|s}$. For the purpose of fault isolation, a univariate Laplace prior can this time be imposed on each element of $\mathbf{r}_{t|s}$. Similarly, the Laplace prior can be decomposed hierarchically into a Gaussian distribution in (34) and a Gamma distribution in (35), with $b$ being a predetermined parameter. Faulty variables now correspond to the nonzero elements of $\mathbf{r}_{t|s}$, and normal variables correspond to zero elements.
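The hierarchical decomposition of the univariate Laplace prior can be checked by simulation. The sketch below assumes that the mixing density over $\tau$ is the exponential density with rate $b/2$, i.e., the normalized form of the $\exp(-b\tau/2)$ kernel in (35), and treats each element of $\mathbf{r}_{t|s}$ as having its own scale. Under that assumption the marginal of each element is Laplace with density $(\sqrt{b}/2)\exp(-\sqrt{b}\,|r|)$, so $E|r| = 1/\sqrt{b}$ and $\operatorname{var}(r) = 2/b$. The function name and the value of $b$ are illustrative.

```python
import numpy as np

def sample_laplace_via_mixture(b, size, rng):
    """Draw r by first sampling the scale tau, then r | tau ~ N(0, tau)."""
    tau = rng.exponential(scale=2.0 / b, size=size)   # density (b/2) exp(-b tau / 2)
    return rng.normal(loc=0.0, scale=np.sqrt(tau))

rng = np.random.default_rng(0)
b = 4.0
r = sample_laplace_via_mixture(b, 200_000, rng)
mean_abs = np.abs(r).mean()     # Laplace with rate sqrt(b): E|r| = 1/sqrt(b)
variance = r.var()              # and var(r) = 2/b
```

The heavy concentration of the marginal near zero is what drives the elements associated with normal variables toward zero during inference.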

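2) Moving Window Technique for Online Application: For robust fault isolation, as well as to alleviate the effect of noise/disturbance, a moving window approach is commonly applied in fault isolation. Assume a window of $l$ faulty samples causing violations in the monitoring statistics has been collected as $\mathbf{x}_{t|\cdot} = [\mathbf{x}_{t-l+1|\cdot}, \ldots, \mathbf{x}_{t|\cdot}]$, with "$\cdot$" representing the identified mode for a sample. Let $\mathbf{x}^{*}_{t|\cdot} = [\mathbf{x}^{*}_{t-l+1|\cdot}, \ldots, \mathbf{x}^{*}_{t|\cdot}]$ denote the expected normal replicas of $\mathbf{x}_{t|\cdot}$. The two probabilistic generative models described in (23) and (33) can be rewritten as follows:

$$
\mathbf{x}_{t|\cdot} = \mathbf{x}^{*}_{t|\cdot} + \Delta\mathbf{G} \mathbf{w}^{*}_{t|\cdot} + \boldsymbol{\varepsilon}_{t|\cdot} \tag{36}
$$

$$
\mathbf{x}_{t|\cdot} = \mathbf{x}^{*}_{t|\cdot} + \mathbf{R} + \boldsymbol{\varepsilon}_{t|\cdot} \tag{37}
$$

where $\mathbf{w}^{*}_{t|\cdot} = [\mathbf{w}^{*}_{t-l+1|\cdot}, \ldots, \mathbf{w}^{*}_{t|\cdot}]$, $\mathbf{R} = [\mathbf{r}_{t-l+1|\cdot}, \ldots, \mathbf{r}_{t|\cdot}]$, and $\boldsymbol{\varepsilon}_{t|\cdot} = [\boldsymbol{\varepsilon}_{t-l+1|\cdot}, \ldots, \boldsymbol{\varepsilon}_{t|\cdot}]$. Again, the rows of $\Delta\mathbf{G}$ and the elements of $\mathbf{R}$ admit multivariate and univariate Laplace priors, respectively. By including the latest sample and discarding the oldest, faulty variables can be isolated online by identifying the nonzero rows of $\Delta\mathbf{G}$ and $\mathbf{R}$ sequentially.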
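The window bookkeeping described above can be sketched with a fixed-length queue. The class name `FaultWindow` and its methods are hypothetical, and the statistic values in the usage lines are made up for illustration; only samples whose statistic exceeds the control limit enter the window.

```python
from collections import deque
import numpy as np

class FaultWindow:
    """Keep the l most recent samples that violated a control limit."""

    def __init__(self, length):
        self.buffer = deque(maxlen=length)   # oldest sample is dropped automatically

    def push(self, x_t, statistic, limit):
        """Add x_t if it violates the limit; return True once the window is full."""
        if statistic > limit:
            self.buffer.append(np.asarray(x_t, dtype=float))
        return self.ready()

    def ready(self):
        return len(self.buffer) == self.buffer.maxlen

    def matrix(self):
        """Window as an m x l matrix [x_{t-l+1}, ..., x_t]."""
        return np.column_stack(list(self.buffer))

# Toy usage: six time steps, four of which violate a limit of 10.
window = FaultWindow(length=3)
limit = 10.0
for t, stat in enumerate([2.0, 12.0, 15.0, 3.0, 11.0, 20.0]):
    window.push(np.full(2, float(t)), stat, limit)
X_win = window.matrix()
```

Once `ready()` returns true, `matrix()` supplies the data term for (36) or (37), and each subsequent violating sample shifts the window by one.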

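To estimate the model parameters, a variational Bayesian (VB) inference method is applied, which considers the complete log-likelihood of (36)

$$
\begin{aligned}
&\ln p\left( \mathbf{x}_{t|\cdot}, \mathbf{x}^{*}_{t|\cdot}, \Delta\mathbf{G}, \mathbf{w}^{*}_{t|\cdot}, \gamma, \nu, a_k, z_k, k = 1, \ldots, m \right) \\
&\quad \equiv -\frac{1}{2} \sum_{j=t-l+1}^{t} \left\{ \left\| \Delta\mathbf{x}_{j|\cdot} - \Delta\mathbf{G} \mathbf{w}^{*}_{j|\cdot} \right\|_2^2 + \left\| \tilde{\mathbf{x}}^{*}_{j|\cdot} - \mathbf{G}_{\cdot} \mathbf{w}^{*}_{j|\cdot} \right\|_2^2 + \left\| \mathbf{w}^{*}_{j|\cdot} \right\|_2^2 \right\} \\
&\qquad + \sum_{j=t-l+1}^{t} \sum_{k=1}^{m} \left\{ -\frac{1}{2} \left\| \Delta\mathbf{g}_k \right\|_2^2 / z_k + \phi(z_k, a_k) \right\} + \varphi(\gamma, \nu)
\end{aligned}
\tag{38}
$$

where $\Delta\mathbf{x}_{j|\cdot} = \mathbf{x}_{j|\cdot} - \mathbf{x}^{*}_{j|\cdot}$, $\tilde{\mathbf{x}}^{*}_{j|\cdot} = \mathbf{x}^{*}_{j|\cdot} - \boldsymbol{\mu} - \mathbf{F}\mathbf{h}_{\cdot}$, $\phi(z_k, a_k) = (\lambda + D_G/4 - 3/4) \ln a_k - (z_k/2 + \eta) a_k + (D_G/2 - 1/2) \ln z_k$, and $\varphi(\gamma, \nu) = (\alpha - l/2 - 1) \ln \gamma + (\rho - l/2 - 1) \ln \nu - \beta\gamma - \kappa\nu$. The posterior expectation of each latent variable can be obtained by taking the expectation of (38) with respect to an approximate joint posterior distribution

$$
\begin{aligned}
&p\left( \mathbf{x}^{*}_{t|\cdot}, \mathbf{w}^{*}_{t|\cdot}, \Delta\mathbf{G}, \gamma, \nu, a_k, z_k, k = 1, \ldots, m \mid \mathbf{x}_{t|\cdot} \right) \\
&\quad \approx q(\gamma)\, q(\nu) \prod_{j=t-l+1}^{t} q\left( \mathbf{x}^{*}_{j|\cdot} \right) q\left( \mathbf{w}^{*}_{j|\cdot} \right) \times \prod_{k=1}^{m} q(\Delta\mathbf{g}_k)\, q(a_k)\, q(z_k).
\end{aligned}
$$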

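Using Bayes' rule, the posterior expectation of $\Delta\mathbf{g}_k^{T}$ is updated as

$$
\begin{aligned}
\langle \Delta\mathbf{g}_k^{T} \rangle &= \boldsymbol{\Phi}_k \sum_{j=t-l+1}^{t} \langle \gamma^{-1} \rangle \langle \Delta x_{jk|\cdot} \rangle \langle \mathbf{w}^{*}_{j|\cdot} \rangle \\
\boldsymbol{\Phi}_k &= \langle z_k \rangle \left[ \mathbf{I} + \sum_{j=t-l+1}^{t} \langle \gamma^{-1} \rangle \langle z_k \rangle \langle \mathbf{w}^{*}_{j|\cdot} \mathbf{w}^{*T}_{j|\cdot} \rangle \right]^{-1}
\end{aligned}
\tag{39}
$$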

where⟨Δxjk|·

⟩is the kth element of

⟨Δxj|·

⟩= xj|· − 〈x∗

j|·〉.Similarly, the posterior expectations of x∗

j|· and w∗j|· can be

estimated by

⟨x∗j|·⟩=

[⟨γ−1

⟩ ⟨xj|·

⟩+⟨ν−1

⟩ (μ· +G·

⟨w∗

j|·⟩)]

(40)⟨w∗

j|·⟩= P·

[⟨γ−1

⟩ ⟨ΔGT

⟩ ⟨Δxj|·

⟩+⟨ν−1

⟩GT

·⟨x∗j|·⟩]

(41)⟨w∗

j|·w∗Tj|·⟩=⟨w∗

j|·⟩⟨

w∗Tj|·⟩+P· (42)

⟨x∗j|·x

∗Tj|·⟩=⟨x∗j|·⟩⟨

x∗Tj|·⟩+ ϑI (43)


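where $\vartheta = \langle \gamma^{-1} \rangle + \langle \nu^{-1} \rangle$, $\mu_{\cdot} = \mu + F\langle h_{\cdot} \rangle$, $\langle \tilde{x}_{j|\cdot} \rangle = x_{j|\cdot} - \langle \Delta G \rangle \langle w^*_{j|\cdot} \rangle$, $\langle \bar{x}^*_{j|\cdot} \rangle = \langle x^*_{j|\cdot} \rangle - \mu_{\cdot}$, and

$$
\left\langle \Delta G^T \Delta G \right\rangle = \sum_{k=1}^{m} \left( \left\langle \Delta g_k^T \right\rangle \langle \Delta g_k \rangle + \Phi_k \right), \qquad
P_{\cdot} = \left( I + \left\langle \gamma^{-1} \right\rangle \left\langle \Delta G^T \Delta G \right\rangle + \left\langle \nu^{-1} \right\rangle G_{\cdot}^T G_{\cdot} \right)^{-1}.
$$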
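As a reading aid, the update of the within-class latent variable in (41) and (42), together with the matrix $P_{\cdot}$ defined above, can be written as a few lines of linear algebra. The sketch below treats all other posterior expectations as given inputs; the function and argument names are hypothetical and the code is not taken from the authors' implementation.

```python
import numpy as np

def update_w(gamma_inv, nu_inv, dG_mean, dGtdG_mean, G_mode, dx_mean, xbar_star_mean):
    """One pass of (41)-(42) for a single sample j in the window.

    gamma_inv, nu_inv : scalars <gamma^-1>, <nu^-1>
    dG_mean           : (m, D_G) posterior mean of Delta G
    dGtdG_mean        : (D_G, D_G) posterior mean of Delta G^T Delta G
    G_mode            : (m, D_G) within-class loading matrix of the identified mode
    dx_mean           : (m,) <Delta x_j>
    xbar_star_mean    : (m,) centered normal replica <x*_j> - mu
    """
    d = G_mode.shape[1]
    # Matrix P shared by all samples in the window.
    P = np.linalg.inv(np.eye(d) + gamma_inv * dGtdG_mean + nu_inv * G_mode.T @ G_mode)
    # Posterior mean of w*_j, equation (41).
    w_mean = P @ (gamma_inv * dG_mean.T @ dx_mean + nu_inv * G_mode.T @ xbar_star_mean)
    # Second moment of w*_j, equation (42).
    ww_mean = np.outer(w_mean, w_mean) + P
    return w_mean, ww_mean, P

# Tiny check with no fault loading: P = (I + I)^-1 = 0.5 I.
w_mean, ww_mean, P = update_w(
    gamma_inv=1.0, nu_inv=1.0,
    dG_mean=np.zeros((2, 2)), dGtdG_mean=np.zeros((2, 2)),
    G_mode=np.eye(2),
    dx_mean=np.array([1.0, 1.0]), xbar_star_mean=np.array([2.0, 4.0]),
)
```

In the toy call, the fault loading is zero, so the posterior mean is simply the centered replica shrunk by one half.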

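Finally, the posterior expectations of scaled factors and other parameters are obtained as follows:

$$\langle z_k \rangle = \sqrt{\langle a_k \rangle / \mathrm{tr}\left( \left\langle \Delta g_k^T \right\rangle \langle \Delta g_k \rangle + \Phi_k \right)} \tag{44}$$

$$\langle a_k \rangle = \left( \lambda + (D_G + 1)/2 \right) / \left( \eta + \langle z_k \rangle / 2 \right) \tag{45}$$

$$\left\langle \gamma^{-1} \right\rangle = \frac{ml/2 + \alpha}{\beta + \left\langle \left\| x_{t|\cdot} - x^*_{t|\cdot} - \Delta G\, w^*_{t|\cdot} \right\|_F^2 \right\rangle} \tag{46}$$

$$\left\langle \nu^{-1} \right\rangle = \frac{ml/2 + \rho}{\kappa + \left\langle \left\| x^*_{t|\cdot} - \mu_{\cdot} - b_{t|\cdot} \right\|_F^2 \right\rangle} \tag{47}$$

where $b_{t|\cdot} = [G_{\cdot} w^*_{t-l+1|\cdot}, \ldots, G_{\cdot} w^*_{t|\cdot}]$, $\mu_{\cdot} = [\mu_{\cdot}, \ldots, \mu_{\cdot}]$, and

$$
\begin{aligned}
\left\langle \left\| x_{t|\cdot} - x^*_{t|\cdot} - \Delta G\, w^*_{t|\cdot} \right\|_F^2 \right\rangle
= \left\| x_{t|\cdot} \right\|_F^2 + \sum_{j=t-l+1}^{t} \Big\{ & \mathrm{tr}\left[ \left\langle x^*_{j|\cdot} x^{*T}_{j|\cdot} \right\rangle \right] + \mathrm{tr}\left[ \left\langle \Delta G^T \Delta G \right\rangle \left\langle w^*_{j|\cdot} w^{*T}_{j|\cdot} \right\rangle \right] \\
& - 2\left[ x^T_{j|\cdot} \left( \left\langle x^*_{j|\cdot} \right\rangle + \langle \Delta G \rangle \left\langle w^*_{j|\cdot} \right\rangle \right) - \left\langle x^{*T}_{j|\cdot} \right\rangle \langle \Delta G \rangle \left\langle w^*_{j|\cdot} \right\rangle \right] \Big\},
\end{aligned}
$$

$$
\begin{aligned}
\left\langle \left\| x^*_{t|\cdot} - \mu_{\cdot} - b_{t|\cdot} \right\|_F^2 \right\rangle
= \left\| \mu_{\cdot} \right\|_F^2 + \sum_{j=t-l+1}^{t} \Big\{ & \mathrm{tr}\left[ \left\langle x^*_{j|\cdot} x^{*T}_{j|\cdot} \right\rangle + G_{\cdot} \left\langle w^*_{j|\cdot} w^{*T}_{j|\cdot} \right\rangle G_{\cdot}^T \right] \\
& - 2\left[ \left\langle x^{*T}_{j|\cdot} \right\rangle \left( \mu_{\cdot} + G_{\cdot} \left\langle w^*_{j|\cdot} \right\rangle \right) - \mu_{\cdot}^T G_{\cdot} \left\langle w^*_{j|\cdot} \right\rangle \right] \Big\}.
\end{aligned}
$$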
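The expanded form of the expected residual norm used in (46) can be sanity-checked numerically. The sketch below assumes, purely for the check, that every posterior is a point mass, so that second moments reduce to outer products and the expansion must equal the directly computed squared Frobenius norm. All variable names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, l = 5, 3, 4                      # variables, latent dimension, window length
X = rng.normal(size=(m, l))            # observed window
X_star = rng.normal(size=(m, l))       # normal replicas (point estimates)
dG = rng.normal(size=(m, d))           # fault loading (point estimate)
W = rng.normal(size=(d, l))            # latent variables (point estimates)

# Direct evaluation of || X - X* - dG W ||_F^2.
direct = np.linalg.norm(X - X_star - dG @ W, "fro") ** 2

# Term-by-term expansion, summed over the samples in the window.
expanded = np.linalg.norm(X, "fro") ** 2
for j in range(l):
    x, xs, w = X[:, j], X_star[:, j], W[:, j]
    expanded += np.trace(np.outer(xs, xs))
    expanded += np.trace(dG.T @ dG @ np.outer(w, w))
    expanded -= 2.0 * (x @ (xs + dG @ w) - xs @ dG @ w)

matches = bool(np.isclose(direct, expanded))
```

With genuine posterior distributions the second-moment terms carry additional covariance contributions, which is why the expansion is written with expectations of outer products rather than products of means.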

In summary, the alternating updates of all listed posterior expectations constitute the VB algorithm. At each update step, the posterior distribution of each variable is inferred conditioned on the fixed distributions for the other variables. As for the estimation of $R$, a similar VB inference algorithm can be developed.

3) Fault Scoring: To facilitate the characterization of different fault levels, two fault scores for each variable are defined as

$$\delta_k = \left\| \langle \Delta g_k \rangle \right\|_1 \tag{48}$$

$$\tilde{\delta}_k = \left\| \langle r_k \rangle \right\|_1 \tag{49}$$

where $r_k$ is the $k$th row of $R$, $\| \cdot \|_1$ denotes the $l_1$-norm of a vector, $\delta_k$ corresponds to the fault score obtained from faulty samples violating the $T^2$ statistic, and $\tilde{\delta}_k$ corresponds to those causing violations in the SPE. For faulty samples violating the $T^2$ statistic, a fault score vector $\delta = [\delta_1, \ldots, \delta_m]^T$ is obtained via (48), which can be normalized by its maximum value. If faulty samples violate the SPE statistic, another fault score vector can be obtained via (49).
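The scoring rule in (48) and (49) amounts to a row-wise $l_1$-norm followed by an optional normalization. A minimal sketch, assuming the posterior mean of $\Delta G$ (or of $R$) is available as a matrix with one row per process variable; the function name is hypothetical.

```python
import numpy as np

def fault_scores(posterior_mean):
    """Row-wise l1-norm fault scores, normalized by the largest score."""
    # One score per process variable: sum of absolute values along the row.
    scores = np.abs(np.asarray(posterior_mean, dtype=float)).sum(axis=1)
    peak = scores.max()
    # Avoid dividing by zero when no row is active (no fault isolated).
    return scores / peak if peak > 0 else scores

# Three variables; the second row dominates, the first is exactly zero.
example = np.array([[0.0, 0.0],
                    [1.0, -1.0],
                    [0.5, 0.0]])
delta = fault_scores(example)
```

Here `delta` evaluates to `[0.0, 1.0, 0.25]`, so the second variable would be flagged as the dominant faulty variable.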

V. NUMERICAL SIMULATION EXAMPLE

This section illustrates the proposed process monitoring strategy based on I-PLDA. A process with six variables driven by two hidden variables is generated as follows:

$$
\begin{aligned}
x &= \Omega \upsilon + e \\
\Omega &= [\kappa_1, \kappa_2] \\
\kappa_1 &= [0, 0, 0.9835, 0.8979, 0, 0.7482]^T \\
\kappa_2 &= [0.8921, 0.5856, 0, 0, 0.9154, 0.0581]^T.
\end{aligned}
\tag{50}
$$

Here $x = [x_1, x_2, x_3, x_4, x_5, x_6]^T$ are the process variables to be monitored, and the two hidden variables are denoted as $\upsilon = [\upsilon_1, \upsilon_2]^T$. The observation noise $e = [e_1, e_2, e_3, e_4, e_5, e_6]^T$ follows a Gaussian distribution $e \sim \mathcal{N}(0, \mathrm{diag}(0.01 \times I))$. To simulate a process with multiple operation modes, three different Gaussian distributions are imposed on the hidden variables as

Mode 1: $\upsilon_1 \sim \mathcal{N}(10, 0.8^2)$, $\upsilon_2 \sim \mathcal{N}(12, 1.3^2)$;
Mode 2: $\upsilon_1 \sim \mathcal{N}(5, 1.4^2)$, $\upsilon_2 \sim \mathcal{N}(20, 1.5^2)$;
Mode 3: $\upsilon_1 \sim \mathcal{N}(16, 2.0^2)$, $\upsilon_2 \sim \mathcal{N}(30, 2.5^2)$.

For the purpose of model training, 400 samples are generated from each mode and a total of 1200 training samples have been collected and stored in $X_0 \in \mathbb{R}^{1200 \times 6}$. The first 400 samples are generated from Mode 1, followed by the 800 samples generated from Mode 2 and Mode 3. In addition, a test dataset $X_{f1} \in \mathbb{R}^{1200 \times 6}$ is generated from the three modes in a similar way and a sensor fault is introduced in the hidden variable for samples 101–400 as

$$\text{Samples } 101\text{–}400: \quad X_{f1} = \Omega\left( \upsilon + [2, 0]^T \right) + e$$
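The data-generating process in (50) is simple enough to reproduce. The sketch below is one possible reading of the setup: it assumes that 0.01 is the noise variance, that the second argument of each $\mathcal{N}(\cdot, \cdot)$ is a variance written as a squared standard deviation, and that the modes appear in the order 1, 2, 3 with 400 samples each. The function and constant names are hypothetical.

```python
import numpy as np

# Loading matrix Omega = [kappa_1, kappa_2] from (50), shape (6, 2).
OMEGA = np.array([
    [0.0,    0.8921],
    [0.0,    0.5856],
    [0.9835, 0.0],
    [0.8979, 0.0],
    [0.0,    0.9154],
    [0.7482, 0.0581],
])

# (mean, standard deviation) of each hidden variable in each mode.
MODES = {
    1: ((10.0, 0.8), (12.0, 1.3)),
    2: ((5.0, 1.4), (20.0, 1.5)),
    3: ((16.0, 2.0), (30.0, 2.5)),
}

def simulate_mode(mode, n, rng, shift=(0.0, 0.0)):
    """Generate n samples of x = Omega (v + shift) + e for one mode."""
    (m1, s1), (m2, s2) = MODES[mode]
    v = np.column_stack([rng.normal(m1, s1, n), rng.normal(m2, s2, n)])
    v = v + np.asarray(shift)
    # Assumption: noise variance 0.01, i.e. standard deviation 0.1.
    e = rng.normal(0.0, np.sqrt(0.01), size=(n, 6))
    return v @ OMEGA.T + e

rng = np.random.default_rng(42)
# Training data: 400 normal samples per mode.
X0 = np.vstack([simulate_mode(k, 400, rng) for k in (1, 2, 3)])
# Test data: the bias [2, 0]^T is added to the hidden variables for samples 101-400.
Xf1 = np.vstack([
    simulate_mode(1, 100, rng),
    simulate_mode(1, 300, rng, shift=(2.0, 0.0)),
    simulate_mode(2, 400, rng),
    simulate_mode(3, 400, rng),
])
```

Because the bias enters through $\upsilon_1$ only, it propagates to the variables with nonzero entries in $\kappa_1$, namely $x_3$, $x_4$, and $x_6$, which is the pattern a fault isolation method should recover.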

Based on the training data, Algorithm 1 is used to estimate the model parameters of the I-PLDA model. For parameter settings, the dimensions of the loading matrices are set as $D_F = 2$ and $D_G = 6$. For other parameters, according to Yang et al. [32], large $\alpha$ and $\rho$ encourage small noise variances and the hyperparameter settings include $\alpha = \rho = 0.5$ and $\beta = \kappa = 10^{-6}$. Also, Yang et al. [32] demonstrate that a larger $\lambda$ results in a more sparsity-encouraging prior and the similar settings for the other hyperparameters are $\lambda = 0.5$ and $\eta = 10^{-6}$.

The process monitoring strategy proposed in Section IV is now tested on the faulty samples. For fault detection, the monitoring statistics $T^2$ and SPE of both I-PLDA/PLDA and PPCA are provided. For fault isolation, based on the analysis by Liu et al. [31], the interval of $5 \le l \le 20$ for window length is suggested. Here it is set as $l = 10$ through experiments. For comparison, the standard PLDA, reconstruction-based contribution (RBC) [33], and PPCA are considered.

Fig. 1. Mode identification results based on I-PLDA for $X_{f1}$.

Fig. 2. Monitoring results using I-PLDA, PLDA, and PPCA for $X_{f1}$.

For each of the test samples, the mode identification method proposed in Section IV-A is applied and the results are shown in Fig. 1. The upper plot of Fig. 1 shows the cosine similarities, with the black line, red line, and blue line corresponding to those of Mode 1, Mode 2, and Mode 3, respectively. The lower plot shows the assigned mode for each sample. It can be seen from the upper plot of Fig. 1 that the maximal cosine similarity always corresponds to the correct process mode. A closer inspection shows that the cosine similarity $\cos(h_1, h_{t|1})$ fluctuates more significantly from the 101st to the 400th sample; this is due to the introduction of the sensor bias in the hidden variable. Despite the fluctuations, faulty samples can still be correctly assigned to Mode 1, which shows the good performance of the proposed method in mode assignment.
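The assignment rule discussed above (pick the mode whose cosine similarity is largest) can be sketched as follows. How the latent vectors are inferred is specified in Section IV-A and is not reproduced here; the sketch simply assumes that, for each candidate mode $s$, a reference vector $h_s$ and a sample-specific estimate $h_{t|s}$ are already available. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_mode(mode_latents, sample_latents):
    """Return the 1-based index of the mode with maximal cos(h_s, h_{t|s})."""
    sims = [cosine(h_s, h_ts) for h_s, h_ts in zip(mode_latents, sample_latents)]
    return int(np.argmax(sims)) + 1, sims

# Toy example with three candidate modes and two-dimensional latent vectors.
mode_latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sample_latents = [[0.9, 0.5], [0.2, 1.0], [1.0, -1.0]]
mode, sims = assign_mode(mode_latents, sample_latents)
```

In this toy case the second candidate gives the largest similarity, so the sample is assigned to Mode 2.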

After the mode identification step, the monitoring results based on I-PLDA, PLDA, and PPCA are presented in Fig. 2. The fault only affects the $T^2$ statistics and not the SPE statistics of all methods. Hence only the SPE statistics of I-PLDA are shown, and those of PLDA and PPCA are not shown as they do not produce significant alarms. As can be seen from Fig. 2, a significant number of violations are observed in the $T^2$ statistics of all three models after the 101st sample, indicating the occurrence of a process fault in Mode 1. A closer inspection of Fig. 2, however, reveals that higher sensitivity can be observed for I-PLDA. This is verified by the fault detection rate (FDR) of I-PLDA being 53.3%, compared with 29.3% for PLDA and 43.3% for PPCA. This is expected, as the flexibility is enhanced by introducing a specific within-class loading matrix for each mode.
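For reference, a fault detection rate of this kind is usually computed as the fraction of truly faulty samples whose monitoring statistic exceeds its control limit. The sketch below follows that common definition, which is assumed here rather than quoted from the paper; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def fault_detection_rate(statistic, limit, fault_mask):
    """Fraction of faulty samples whose statistic exceeds the control limit."""
    statistic = np.asarray(statistic, dtype=float)
    fault_mask = np.asarray(fault_mask, dtype=bool)
    # Only samples known to be faulty enter the rate.
    return float((statistic[fault_mask] > limit).mean())

# Five samples, the last four are faulty; three of them exceed the limit of 4.
t2 = [1.0, 5.0, 6.0, 2.0, 7.0]
is_faulty = [False, True, True, True, True]
fdr = fault_detection_rate(t2, limit=4.0, fault_mask=is_faulty)
```

The toy call gives a rate of 0.75, that is, three alarms out of four faulty samples.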

After the fault is successfully detected, it is essential to localize the faulty variables. The moving window-based fault isolation strategy proposed in Section IV-C is now applied to the 300 faulty samples and the results are shown in Fig. 3, where deeper color indicates greater fault score. This time, comparison is made between I-PLDA, PLDA, PPCA, and RBC. In Fig. 3, the sample-by-sample fault scores obtained from I-PLDA are recorded in the upper-left plot and those for PLDA, PPCA, and RBC are shown in the lower-left, upper-right, and lower-right plots, respectively. Fig. 4 presents the aggregated fault scores for the 101st–400th samples. Comparing the plots in Fig. 3, one can observe that I-PLDA/PLDA shows better isolation capability. This can be confirmed by the plots of Fig. 4, as the aggregated fault scores for I-PLDA/PLDA clearly indicate that $x_3$, $x_4$, and $x_6$ are faulty variables. In contrast, the aggregated contributions of PPCA and RBC also identify $x_3$, $x_4$, and $x_6$ as faulty; however, the contributions of $x_1$, $x_2$, and $x_5$ cannot be neglected.

Fig. 3. Isolation results using I-PLDA, PLDA, PPCA, and RBC for $X_{f1}$.

Fig. 4. Aggregated fault scores using I-PLDA, PLDA, PPCA, and RBC for $X_{f1}$.

VI. APPLICATION TO INDUSTRIAL PROCESSES

This section demonstrates the performance of the proposed method through its application to the Tennessee Eastman (TE) process and a blast furnace ironmaking process, in comparison with PLDA/PPCA. Comparisons with other methods showed similar results and hence are not included.

A. TE Process Application

The TE process involves five units: an exothermic reactor, a stripper, a flash separator, a recycle compressor, and a condenser. In the TE process, 41 measured variables and 12 manipulated variables are collected. In this article, a total of 16 continuous variables are selected and listed in Table I. According to [6], there are six normal operation conditions, each corresponding to a different process mode. To obtain the training dataset, a normal dataset $X_0$ consisting of 3000 data points is sampled every 3 min under the six modes, so that each mode involves 500 samples. In addition, a faulty dataset $X_f$ undergoing two random variations is introduced. For the sake of simplicity, the first three modes are included in $X_f$, with each mode containing 1000 samples. For Mode 1, a Gaussian variation with zero mean and variance of 10 is added to the reactor temperature from the 201st time instance. For Mode 2, another Gaussian variation with zero mean and variance of 5 is added to both the purge rate and stripper pressure from the 1161st sample. The parameters of I-PLDA are set as $D_F = 4$ and $D_G = 20$ via trial and error. The other parameters remain the same as before. For PPCA, the number of retained PCs is set as 4.

TABLE I: PROCESS VARIABLES AND DESCRIPTIONS

Fig. 5. Mode identification results based on I-PLDA for $X_f$.

Fig. 6. Monitoring results using I-PLDA and PPCA for $X_f$.

The I-PLDA parameters are then determined using Algorithm 1 based on $X_0$ and the results of mode identification for $X_f$ are shown in Fig. 5. It can be seen from Fig. 5 that the samples in $X_f$ are successfully identified into the correct modes using the cosine similarity, as the mode assignments in the lower plot of Fig. 5 are exactly the same as the initial mode settings.

Next, the monitoring procedures proposed in Section IV-B are used to detect the faults and the results are shown in Fig. 6. It can be seen from Fig. 6 that the fault of random variation in Mode 1 is successfully detected by the SPE statistics of both I-PLDA and PPCA. The random variation in Mode 2, on the other hand, is also successfully detected by both the $T^2$ and SPE statistics. For fault isolation, the procedures proposed in Section IV-C are used. This time, for simplicity, only the aggregated fault scores are presented in Fig. 7. The fault isolation results in Fig. 7 demonstrate that $x_7$ is the dominant faulty variable in Mode 1, and $x_8$ and $x_{12}$ in Mode 2. A closer inspection, however, shows that the contributions of other variables are not negligible for PPCA. This shows the better fault isolation performance of the proposed method.

Fig. 7. Aggregated fault scores using I-PLDA and PPCA for $X_f$.

TABLE II: MONITORED BLAST FURNACE PROCESS VARIABLES

B. Application to a Blast Furnace Ironmaking Process

A blast furnace is used to produce hot metal for steelmaking. During the operation of a blast furnace, iron ore and coke dropping from the top meet hot air and coal powder blowing from the bottom, resulting in a series of chemical reactions and gradually forming the product, liquid hot metal. Meanwhile, slag exits from the bottom and the flue gas escapes from the top. In this example, practical data collected from a blast furnace with a volume of 2500 m$^3$ in China are considered, and a total of 10 process variables related to gas flow are selected and listed in Table II. The dataset consists of 1500 samples covering three different operation conditions and an abnormal dataset with 1000 samples. This fault involves a fluctuation in the CO concentration in the flue gas after the 500th sample due to excessive blast of coal powder.

The sampling period for these samples is 2 min. Based on the normal dataset, the model parameters of I-PLDA are obtained with $D_F = 3$ and $D_G = 10$ via trial and error. The other parameters are the same as for the TE process, and the number of retained PCs for PPCA is set as 3. After model training, the mode identification for the faulty dataset is carried out. For PPCA, the modes of test samples are identified using the maximum likelihood criterion.


Fig. 8. Mode identification using I-PLDA, PLDA, and PPCA for blast furnace data.

Fig. 9. Monitoring results of blast furnace fault using I-PLDA, PLDA, and PPCA.

The results of mode identification using I-PLDA/PLDA and PPCA are shown in Fig. 8, from which it can be seen that the mode identification results of all three methods are very similar, indicating the identifications are appropriate. A closer inspection shows that the identified modes of the 500th–1000th samples show higher fluctuation than those of the first 500 samples. This is expected, as the fault may affect the process characteristics and hence the mode identification. With the mode identified, fault detection is performed and the corresponding results are presented in Fig. 9.

From the monitoring results in Fig. 9, it can be seen that the fault is successfully detected by all three methods. The fault detection rates of the $T^2$ and SPE statistics for I-PLDA are 69.2% and 100%. In contrast, the FDRs of the $T^2$ and SPE statistics for PLDA are 35.2% and 96.2% and those for PPCA are 84% and 11.8%, respectively. This can be explained, as the introduction of separate within-class loading matrices enhances the fault detection capability. After the fault is detected, fault isolation is performed. The sample-by-sample fault scores are shown in Fig. 10 and the aggregated fault scores are shown in Fig. 11. It is clearly shown in Fig. 10 that I-PLDA identifies $x_4$ and $x_9$ as the dominant faulty variables in both $T^2$ and SPE statistics. This is in accordance with the later finding that excessive coal injection caused increased CO concentration in the flue gas after the 500th sample. In contrast, PLDA and PPCA also identify $x_4$ and $x_9$ as the dominant faulty variables. However, the contribution of other variables cannot be neglected. This is further confirmed by the aggregated fault scores shown in Fig. 11. The application to blast furnace data demonstrates the advantages of the proposed method in monitoring of multimode processes over PLDA and PPCA.

Fig. 10. Isolation results using I-PLDA, PLDA, and PPCA for blast furnace fault.

Fig. 11. Aggregated fault scores using I-PLDA, PLDA, and PPCA for blast furnace fault.

VII. CONCLUSION

This article proposed a unified probabilistic monitoring framework for multimode processes based on probabilistic linear discriminant analysis. To better handle large within-class variance, an improved PLDA was developed by introducing a separate within-class loading matrix for each mode. For mode identification, the cosine similarity was applied and a fault detection and isolation strategy was proposed. The fault isolation procedures involved the solution of two SPGMs. Application studies on simulation examples and industrial processes showed that the proposed method has better fault detection and isolation performance than competitive methods. Future work can focus on extending the developed framework to the monitoring of nonlinear processes using the kernel trick.

REFERENCES

[1] S. Yin, S. X. Ding, X. Xie, and H. Luo, “A review on basic data-driven approaches for industrial process monitoring,” IEEE Trans. Ind. Electron., vol. 61, no. 11, pp. 6418–6428, Nov. 2014.

[2] Z. Ge, “Review on data-driven modeling and monitoring for plant-wide industrial processes,” Chemometrics Intell. Lab. Syst., vol. 171, pp. 16–25, 2017.

[3] M. Quinones-Grueiro, A. Prieto-Moreno, C. Verde, and O. Llanes-Santiago, “Data-driven monitoring of multimode continuous processes: a review,” Chemometrics Intell. Lab. Syst., vol. 189, pp. 56–71, 2019.

[4] C. Tong, A. Palazoglu, and X. Yan, “An adaptive multimode process monitoring strategy based on mode clustering and mode unfolding,” J. Process Control, vol. 23, no. 10, pp. 1497–1507, 2013.


[5] R. Grbic, D. Sliskovic, and P. Kadlec, “Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models,” Computer Chem. Eng., vol. 58, pp. 84–97, 2013.

[6] S. Tan, F. Wang, J. Peng, Y. Chang, and S. Wang, “Multimode process monitoring based on mode identification,” Ind. Eng. Chem. Res., vol. 51, no. 1, pp. 374–388, 2011.

[7] Y. Zhang, Y. Fan, and N. Yang, “Fault diagnosis of multimode processes based on similarities,” IEEE Trans. Ind. Electron., vol. 63, pp. 2606–2614, Apr. 2016.

[8] Y. Zhang and S. Li, “Modeling and monitoring between-mode transition of multimodes processes,” IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2248–2255, Nov. 2012.

[9] K. Zhang, K. Peng, and J. Dong, “A common and individual feature extraction-based multimode process monitoring method with application to the finishing mill process,” IEEE Trans. Ind. Informat., vol. 14, no. 11, pp. 4841–4850, Nov. 2018.

[10] X. Peng, Y. Tang, W. Du, and F. Qian, “Multimode process monitoring and fault detection: A sparse modeling and dictionary learning method,” IEEE Trans. Ind. Electron., vol. 64, no. 6, pp. 4866–4875, Jun. 2017.

[11] W. Shao, Z. Ge, Z. Song, and J. Wang, “Semi-supervised robust modeling of multimode industrial processes for quality variable prediction based on student’s t mixture model,” IEEE Trans. Ind. Informat., to be published, doi: 10.1109/TII.2019.2902560.

[12] X. Deng, X. Tian, S. Chen, and C. J. Harris, “Nonlinear process fault diagnosis based on serial principal component analysis,” IEEE Trans. Neural Networks and Learn. Syst., vol. 29, no. 3, pp. 560–572, Mar. 2018.

[13] X. Deng and J. Deng, “Incipient fault detection for chemical processes using two-dimensional weighted SLKPCA,” Ind. Eng. Chemistry Res., vol. 58, no. 6, pp. 2280–2295, 2019.

[14] Z. Ge, “Process data analytics via probabilistic latent variable models: A tutorial review,” Ind. Eng. Chemistry Res., vol. 57, pp. 12 646–12 661, 2018.

[15] N. Sammaknejad, B. Huang, W. Xiong, A. Fatehi, F. Xu, and A. Espejo, “Operating condition diagnosis based on HMM with adaptive transition probabilities in presence of missing observations,” AIChE J., vol. 61, no. 2, pp. 477–493, 2015.

[16] R. Raveendran, H. Kodamana, and H. Biao, “Process monitoring using a generalized probabilistic linear variable model,” Automatica, vol. 96, pp. 73–83, 2018.

[17] J. Yu and S. J. Qin, “Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models,” AIChE J., vol. 54, no. 7, pp. 1811–1829, 2008.

[18] Z. Ge and Z. Song, “Mixture Bayesian regularization method of PPCA for multimode process monitoring,” AIChE J., vol. 56, no. 11, pp. 2838–2849, 2010.

[19] M. Fang, H. Kodamana, B. Huang, and N. Sammaknejad, “A novel approach to process operating mode diagnosis using conditional random fields in the presence of missing data,” Comput. Chem. Eng., vol. 111, no. 3, pp. 149–163, 2018.

[20] J. Yu, “A new fault diagnosis method of multimode processes using Bayesian inference based Gaussian mixture contribution decomposition,” Eng. Appl. Artif. Intell., vol. 26, no. 1, pp. 456–466, 2013.

[21] H. Gharahbagheri, S. A. Imtiaz, and F. Khan, “Root cause diagnosis of process fault using KPCA and Bayesian network,” Ind. Eng. Chemistry Res., vol. 56, pp. 2054–2070, 2017.

[22] B. Rashidi, D. S. Singh, and Q. Zhao, “Data-driven root-cause fault diagnosis for multivariate non-linear processes,” Control Eng. Pract., vol. 70, pp. 134–147, 2018.

[23] T. Chen and Y. Sun, “Probabilistic contribution analysis for statistical process monitoring: A missing variable approach,” Control Eng. Practice, vol. 17, pp. 469–477, 2009.

[24] A. Haghani, T. Jeinsch, S. X. Ding, P. Koschorrek, and B. Kolewe, “A probabilistic approach for data-driven fault isolation in multimode processes,” IFAC Proc. Volumes, vol. 47, no. 3, pp. 8909–8914, 2014.

[25] S. J. Prince and J. H. Elder, “Probabilistic linear discriminant analysis for inferences about identity,” in Proc. 11th Int. Conf. Comput. Vision., 2007, pp. 1–8.

[26] K. Grauman and B. Leibe, Visual Object Recognition. San Rafael, CA, USA: Morgan & Claypool, 2011.

[27] T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Process. Mag., vol. 13, pp. 47–60, Nov. 1996.

[28] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.

[29] S. Prince, P. Li, Y. Fu, U. Mohammed, and J. Elder, “Probabilistic models for inference about identity,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 144–157, Jan. 2012.

[30] S. D. Babacan, S. Nakajima, and M. N. Do, “Bayesian group-sparse modeling and variational inference,” IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2906–2921, Jun. 2014.

[31] Y. Liu, J. Zeng, L. Xie, S. Luo, and H. Su, “Structured joint sparse principal component analysis for fault detection and isolation,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2721–2731, May 2018.

[32] L. Yang, J. Fang, H. Cheng, and H. Li, “Sparse Bayesian dictionary learning with a Gaussian hierarchical model,” Signal Process., vol. 130, pp. 93–104, 2017.

[33] C. F. Alcala and S. J. Qin, “Reconstruction-based contribution for process monitoring,” Automatica, vol. 45, pp. 1593–1600, 2009.

Yi Liu received the B.S. degree in aviation weapon engineering from The First Air Academy of People’s Liberation Army (PLA) Air Force, Beijing, China, in 2006, and the M.S. degree in military equipment from Air Force Engineering University of PLA, Xi’an, China, in 2011. He is currently working toward the Ph.D. degree in control science and engineering with the College of Control Science and Engineering, Zhejiang University, China.

His research interests include industrial process monitoring and fault diagnosis, process data analysis and machine learning.

Jiusun Zeng received the B.S. and Ph.D. degrees in applied mathematics from Zhejiang University, Hangzhou, China, in 2004 and 2009, respectively.

He served as a Postdoctoral Research Associate with the Institute of Cyber Systems and Control of Zhejiang University, from 2009 to 2011. After that he became a Faculty Member with China Jiliang University, Hangzhou, China, and is currently a Professor in Instrumentation and Control with the College of Metrology and Measurement Engineering, Hangzhou. His research interests focus on multivariate statistical process monitoring and fault diagnosis and their applications in industrial processes.

Jie Bao received the B.Sc. and M.Sc. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 1990 and 1993, respectively. In 1998, he received the Ph.D. degree in chemical engineering (process control) from The University of Queensland, Brisbane, QLD, Australia.

He spent one year at the University of Alberta as a Postdoctoral Fellow and then joined the faculty at The University of New South Wales, Sydney, NSW, Australia, in 1999. He is currently a Full Professor with the School of Chemical Engineering, UNSW. His research interests include distributed control, robust control, fault detection and fault-tolerant control, dissipativity-based process control, behavioral systems and control of industrial processes including aluminium smelting, mineral processing, membrane separation and flow batteries.

Dr. Bao is an Associate Editor for Journal of Process Control.

Lei Xie received the B.S. and Ph.D. degrees from Zhejiang University, Hangzhou, China, in 2000 and 2005, respectively.

Between 2005 and 2006, he was a Postdoctoral Researcher with the Berlin University of Technology, Berlin, Germany, and an Assistant Professor of Control Theory and Engineering, between 2005 and 2008, with Zhejiang University, where he is currently a Professor in Control Science and Engineering with the College of Control Science and Engineering. He has authored or coauthored over 50 articles for internationally renowned journals and conferences, three book chapters, and one research text in the area of applied multivariate statistics and modeling. His research interests focus on the interdisciplinary area of applied statistics and system control theory.
