LGSPP-Bayes for Fault Detection and Diagnosis
Qin Liu 1, Chunmei Yu *2
College of Information Engineering, Southwest University of Science and Technology, Mianyang, China
1 1229318267@qq.com; *2 yyycm70@hotmail.com
Abstract
Both global and local structure have been shown to be important for process monitoring, but principal component analysis (PCA) and locality preserving projections (LPP) cannot take both into account simultaneously during dimension reduction. This article proposes a novel method named local and global structure preserving projections with Bayes classification (LGSPP-Bayes). The original data are projected onto a low-dimensional feature space, which yields the projection matrix from the high-dimensional space to the low-dimensional space. A Bayesian classifier is then designed to detect and diagnose faults. Case studies on the Tennessee Eastman process (TEP) illustrate the effectiveness of the proposed method.
Keywords
Principal Component Analysis (PCA); Locality Preserving Projections (LPP); Bayesian Classifier; Fault Detection; Fault Diagnosis
International Journal of Engineering Practical Research, Vol. 4 No. 1, April 2015. 2326-5914/15/01 089-05 © 2015 DEStech Publications, Inc. doi: 10.12783/ijepr.2015.0401.18
Introduction
To ensure industrial safety and economic profit, timely detection and diagnosis of faults is increasingly important in industrial processes. Multivariate statistical process control, especially principal component analysis (PCA), has been widely studied and applied to on-line process monitoring. PCA extracts principal components from highly correlated process data to eliminate the correlation. Although the principal components extracted by PCA retain most of the data variation, they capture only the global structure of the process data. In contrast to PCA, locality preserving projections (LPP) can find the inner structure of the original high-dimensional data [1]. Yu [2] proposed a method called local and global PCA (LGPCA), which takes advantage of both PCA and LPP. This method projects the original data onto a low-dimensional space whose local structure is similar to that of the original space, while also maximizing the variance so as to retain the global information. The experimental results show that its fault detection performance is better than that of PCA and LPP. Classifiers based on Bayesian theory have been successfully applied to fault detection and diagnosis [3, 4].
In this paper, a novel method called local and global structure preserving projections with Bayes classification (LGSPP-Bayes) is proposed. First, the original data are projected onto a low-dimensional feature space using local and global structure preserving projections (LGSPP). Then faults are detected and diagnosed using a Bayesian classifier.
Fault Detection Based on PCA and LPP
Let $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$ denote a data matrix. We seek a transformation matrix $A \in \mathbb{R}^{m \times l}$ and the projected points $y_i = A^T x_i$.
Fault Detection Based on PCA
The basic idea of PCA is to project the high-dimensional space onto a low-dimensional space in which the data variance is maximal. The objective function of PCA is as follows:
$$J(A)_{PCA} = \max_A \sum_{i=1}^{n} (y_i - \bar{y})^2 = \max_A \sum_{i=1}^{n} A^T (x_i - \bar{x})(x_i - \bar{x})^T A = \max_A A^T C A \tag{1}$$
where $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$, $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$, and $C = (1/n)\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$.
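As an illustration of equation (1), the following minimal sketch computes the covariance matrix $C$ and takes the $l$ leading eigenvectors as the columns of $A$. The function name `pca_projection` and the synthetic data are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def pca_projection(X, l):
    """Return the m x l PCA projection matrix A and the covariance C.

    X is an n x m data matrix whose rows are the samples x_i.
    """
    X_centered = X - X.mean(axis=0)
    n = X.shape[0]
    # Covariance with the 1/n normalisation used in Eq. (1).
    C = X_centered.T @ X_centered / n
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    A = eigvecs[:, order[:l]]
    return A, C

rng = np.random.default_rng(0)
# Synthetic correlated data, for illustration only (not TEP data).
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
A, C = pca_projection(X, l=2)
Y = (X - X.mean(axis=0)) @ A                   # row i holds (A^T (x_i - mean))^T
```

The variance of each column of `Y` equals the corresponding eigenvalue of `C`, which is the quantity maximized in equation (1).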
The $T^2$ and SPE statistics are then defined to detect faults. The low-dimensional space obtained by PCA has the same outer shape as the original space, because the projection directions are those that maximize the data variance [5, 6]. However, PCA ignores the inner local structure among the samples.
Fault Detection Based on LPP
In contrast to PCA, locality preserving projections preserve the local information of the original sample points. The transformation matrix $A$ is obtained as follows [7]:
$$J(A)_{LPP} = \min_A \sum_{ij} (y_i - y_j)^2 W_{ij} = \min_A \left( 2\sum_{i=1}^{n} y_i D_{ii} y_i^T - 2\sum_{ij} y_i W_{ij} y_j^T \right) = \min_A A^T X (D - W) X^T A = \min_A A^T X L X^T A = \min_A A^T U A \tag{2}$$
where $U = X L X^T$ is called the local matrix, $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$, $L = D - W$ is the Laplacian matrix, and $W$ is the similarity matrix, whose element $W_{ij}$ defines the neighbourhood relationship between the sample points $x_i$ and $x_j$: when $x_i$ is among the nearest neighbours of $x_j$, $W_{ij} = w(x_i, x_j)$, and $W_{ij} = 0$ otherwise, where $w(x_i, x_j)$ can be measured by a Gaussian kernel function [1]. Fault detection with LPP uses monitoring statistics in the same way as PCA [8].
Fault Detection and Identification Based on LGSPP-Bayes
Local and Global Structure Preserving Projections
To combine the merits of PCA and LPP in a single projection, the following objective function is minimized [9]:
$$\min_A J(A) = \min_A \frac{J(A)_{LPP}}{J(A)_{PCA}} = \min_A \frac{A^T U A}{A^T C A} \tag{3}$$
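The matrices $W$, $D$, $L$ and $U$ that enter equations (2) and (3) can be built as in the sketch below. It is a sketch under stated assumptions rather than the authors' implementation: the kernel width `t`, the symmetrisation of the neighbourhood graph, and the name `lpp_matrices` are all choices made for this example. Because the samples are stored as rows of `X` here, the local matrix is computed as `X.T @ L @ X`.

```python
import numpy as np

def lpp_matrices(X, k=10, t=1.0):
    """Build W, D, L and U for an n x m data matrix X (rows are samples)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X**2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    dist2 = np.maximum(dist2, 0.0)             # guard against round-off
    np.fill_diagonal(dist2, np.inf)            # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(dist2[i])[:k]  # k nearest neighbours of x_i
        # Gaussian kernel weight; the width t is an assumed parameter.
        W[i, neighbours] = np.exp(-dist2[i, neighbours] / t)
    W = np.maximum(W, W.T)                     # assumed: symmetric neighbourhood graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    U = X.T @ L @ X                            # m x m local matrix
    return W, D, L, U

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                   # synthetic samples, illustrative only
W, D, L, U = lpp_matrices(X, k=5, t=4.0)
```

For any projection vector `a`, the weighted sum of squared differences in equation (2) equals `2 * a @ U @ a` when `W` is symmetric, which is a convenient numerical check of the construction.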
The problem in equation (3) can be converted to the generalized eigenvalue problem:
$$U A = \lambda C A \tag{4}$$
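A generalized eigenvalue problem of the form in equation (4) can be solved with NumPy alone by whitening with the Cholesky factor of $C$. The sketch below assumes that $U$ is symmetric, that $C$ is positive definite, and that the eigenvectors of the smallest eigenvalues are wanted because the ratio in equation (3) is minimized; the paper only says the eigenvalues are sorted, so that ordering is an assumption. The function name and the test matrices are hypothetical.

```python
import numpy as np

def generalized_eig_smallest(U, C, l):
    """Solve U a = lambda C a for symmetric U and positive-definite C.

    Returns the l smallest eigenvalues and the matching eigenvectors (columns).
    """
    # With C = R R^T (Cholesky), the problem becomes the ordinary symmetric
    # eigenproblem (R^-1 U R^-T) v = lambda v with v = R^T a.
    R = np.linalg.cholesky(C)
    R_inv = np.linalg.inv(R)
    M = R_inv @ U @ R_inv.T
    M = (M + M.T) / 2.0                        # remove round-off asymmetry
    eigvals, V = np.linalg.eigh(M)             # ascending eigenvalues
    A = R_inv.T @ V[:, :l]                     # back-transform: a = R^-T v
    return eigvals[:l], A

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
U = B @ B.T                                    # synthetic symmetric matrix
G = rng.normal(size=(50, 6))
C = G.T @ G / 50                               # synthetic positive-definite matrix
eigvals, A = generalized_eig_smallest(U, C, l=3)
```

The returned columns satisfy `U @ A == C @ A * eigvals` up to round-off and are normalized so that `A.T @ C @ A` is the identity.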
If only the first $l$ eigenvectors $A = \{a_1, a_2, \ldots, a_l\}$ are kept according to the sorted eigenvalues of equation (4), the high-dimensional data $x$ are converted to the low-dimensional data $y = A^T x$, and the projected data take both the local and the global information of the original space into account.
Fault Detection and Identification Based on Bayes Classifier
Suppose the data follow a normal distribution. Let $S_j$ and $m_j$ denote the covariance matrix and mean vector of the training samples, $n_j$ the number of observations, and $\delta_j$ the data set of the $j$th fault $\omega_j$. They are related as follows [3, 4]:
$$p(x \mid \omega_j) \sim N(m_j, S_j), \quad m_j = \frac{1}{n_j}\sum_{x \in \delta_j} x, \quad S_j = \frac{1}{n_j - 1}\sum_{x \in \delta_j} (x - m_j)(x - m_j)^T \tag{5}$$
The conditional probability density function of an observation $x$, given that the fault is $\omega_j$, is:
$$p(x \mid \omega_j) = \frac{1}{(2\pi)^{n/2} (\det(S_j))^{1/2}} \exp\left[-\frac{1}{2}(x - m_j)^T S_j^{-1} (x - m_j)\right] \tag{6}$$
where $n$ here denotes the dimension of $x$. The decision function is given by:
$$g_j(x) = \ln p(x \mid \omega_j) + \ln P(\omega_j) = -\frac{1}{2}(x - m_j)^T S_j^{-1} (x - m_j) - \frac{n}{2}\ln 2\pi - \frac{1}{2}\ln[\det(S_j)] + \ln P(\omega_j) \tag{7}$$
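Equations (5) and (7) translate almost line by line into code. The sketch below is illustrative only: the helper names `fit_class` and `decision_value`, the equal priors, and the two synthetic classes are assumptions, not part of the paper.

```python
import numpy as np

def fit_class(samples):
    """Mean vector m_j and covariance matrix S_j of one class, as in Eq. (5)."""
    m = samples.mean(axis=0)
    S = np.cov(samples, rowvar=False)          # uses the 1/(n_j - 1) factor
    return m, S

def decision_value(x, m, S, prior):
    """Gaussian decision function g_j(x) of Eq. (7)."""
    d = x - m
    dim = x.shape[0]                           # the n of Eq. (7): dimension of x
    # slogdet is numerically safer than log(det(S)).
    _, logdet = np.linalg.slogdet(S)
    maha = d @ np.linalg.solve(S, d)           # (x - m)^T S^-1 (x - m)
    return -0.5 * maha - 0.5 * dim * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, size=(200, 3))    # synthetic "normal" class
fault = rng.normal(loc=2.0, size=(200, 3))     # synthetic "fault" class
m0, S0 = fit_class(normal)
m1, S1 = fit_class(fault)

x = np.array([2.1, 1.9, 2.0])
g0 = decision_value(x, m0, S0, prior=0.5)
g1 = decision_value(x, m1, S1, prior=0.5)
predicted = int(g1 > g0)                       # 1 means the fault class wins
```

The observation is assigned to the class with the larger decision value, so a point near the synthetic fault mean is labelled as the fault class.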
Let the data projected by LGSPP be $y = A^T x$. The mean vector and covariance matrix after dimension reduction are:
$$m_{fj} = \frac{1}{n_j}\sum_{x \in \delta_j} A^T x = A^T m_j \tag{8}$$
$$S_{fj} = \frac{1}{n_j - 1}\sum_{x \in \delta_j} (A^T x - m_{fj})(A^T x - m_{fj})^T = A^T S_j A \tag{9}$$
The Bayes decision function can then be calculated by:
$$\begin{aligned} g_{fj}(x) &= -\frac{1}{2}(A^T x - m_{fj})^T S_{fj}^{-1} (A^T x - m_{fj}) - \frac{1}{2}\ln[\det(S_{fj})] + \ln P(\omega_j) \\ &= -\frac{1}{2}(x - m_j)^T A (A^T S_j A)^{-1} A^T (x - m_j) - \frac{1}{2}\ln[\det(A^T S_j A)] + \ln P(\omega_j) \end{aligned} \tag{10}$$
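The identities in equations (8) and (9), and the agreement of the two lines of equation (10), can be checked numerically. In the sketch below the projection matrix `A` is a random orthonormal placeholder rather than an LGSPP solution, and the data, the prior of 0.5, and the helper name `g_projected` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic class data (rows are samples) and a placeholder 6 x 3 projection.
samples = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))
A = np.linalg.qr(rng.normal(size=(6, 3)))[0]

m_j = samples.mean(axis=0)
S_j = np.cov(samples, rowvar=False)

projected = samples @ A                        # row i holds (A^T x_i)^T
m_fj = projected.mean(axis=0)                  # left-hand side of Eq. (8)
S_fj = np.cov(projected, rowvar=False)         # left-hand side of Eq. (9)

def g_projected(x, A, m_j, S_j, prior):
    """Second line of Eq. (10), written with the original-space m_j and S_j."""
    d = A.T @ (x - m_j)
    S_f = A.T @ S_j @ A
    logdet = np.linalg.slogdet(S_f)[1]
    return -0.5 * d @ np.linalg.solve(S_f, d) - 0.5 * logdet + np.log(prior)

x = samples[0]
d_f = A.T @ x - m_fj
# First line of Eq. (10), written with the projected statistics.
g_first = (-0.5 * d_f @ np.linalg.solve(S_fj, d_f)
           - 0.5 * np.linalg.slogdet(S_fj)[1] + np.log(0.5))
g_second = g_projected(x, A, m_j, S_j, prior=0.5)
```

In practice this means the class statistics only need to be estimated once in the original space; the projected statistics follow from `A`.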
Suppose the training data set contains both normal-condition data and fault data. Let $g_{f0}(x)$ be the decision function obtained when the training samples are normal data, and $g_{fi}(x)$ $(i = 1, \ldots, f)$ the decision function obtained when the training samples are data of the $i$th fault. When $g_{fi}(x) > g_{f0}(x)$, a fault is detected. Once a fault is detected, the Bayes classifier can be used to identify its type. Suppose there are $C$ classes of training samples; the decision function value is calculated for each class. If $\max_i(g_i(x)) = g_k(x)$, the test sample is assigned to the $k$th fault class.
The procedure of fault detection and identification based on LGSPP-Bayes is:
1) Normalize the training samples to zero mean and unit variance;
2) Calculate the covariance matrix $C$ of the training samples $X$;
3) Compute the similarity matrix $W$, the diagonal matrix $D$, and the Laplacian matrix $L$;
4) Use equation (4) to obtain the eigenvectors and select the first $l$ eigenvectors $A = \{a_1, a_2, \ldots, a_l\}$ ($l \le m$) based on the information quantified by the eigenvalues;
5) Taking $A$ as the dimension reduction matrix, compute the mean vector $m_{fj}$ and covariance matrix $S_{fj}$ after dimension reduction;
6) Compute the Bayes decision function $g_i(x)$ to detect and identify faults.
Case Studies
The Tennessee Eastman process (TEP), developed by the Eastman Chemical Company (USA), is used to evaluate process monitoring methods. The process simulation has 52 variables and 21 programmed known faults. The flow sheet and the detailed simulation of TEP are given in [10]. The data in this paper are obtained under closed-loop control. The training data set has 500 samples, with the fault introduced from sample 20. The testing data set has 960 samples, with the fault introduced from sample 160. Before modelling, all training and testing data are normalized. The number of retained components is $l = 9$ and the number of nearest neighbours (kNN) is 10. The 99% confidence limit is used in this study. Faults 20 and 21 of TEP are chosen to check the effect of the proposed method. The fault detection results of LGSPP-Bayes and LGSPP-$T^2$ for faults 20 and 21 are presented in Figures 1-2.
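The six-step procedure, with the settings $l = 9$ and 10 nearest neighbours, can be sketched end to end as follows. This is a hedged illustration on synthetic data with 12 variables and a mean-shift fault, not a reproduction of the TEP experiment. Several choices are assumptions: the projection is fitted on the pooled training set, the kernel width is the median squared distance, the eigenvectors of the smallest eigenvalues are kept, the priors are equal, and all function names are hypothetical.

```python
import numpy as np

def lgspp_fit(X, l, k):
    """Steps 2-4: return the m x l projection matrix A (X already normalised)."""
    n = X.shape[0]
    C = X.T @ X / n                                  # step 2: covariance matrix
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)                  # no self-neighbours
    t = np.median(dist2[np.isfinite(dist2)])         # assumed kernel width
    W = np.zeros((n, n))
    for i in range(n):                               # step 3: kNN graph
        nb = np.argsort(dist2[i])[:k]
        W[i, nb] = np.exp(-dist2[i, nb] / t)
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    U = X.T @ L @ X
    R_inv = np.linalg.inv(np.linalg.cholesky(C))     # step 4: U a = lambda C a
    M = R_inv @ U @ R_inv.T
    _, V = np.linalg.eigh((M + M.T) / 2.0)
    return R_inv.T @ V[:, :l]                        # smallest eigenvalues (assumed)

def fit_gaussian(Y):
    """Step 5: mean vector and covariance matrix of projected samples."""
    return Y.mean(axis=0), np.cov(Y, rowvar=False)

def decision(Y, m, S, prior):
    """Step 6: decision value for each row of Y, as in Eq. (10)."""
    d = Y - m
    maha = np.einsum("ij,ij->i", d, np.linalg.solve(S, d.T).T)
    return -0.5 * maha - 0.5 * np.linalg.slogdet(S)[1] + np.log(prior)

rng = np.random.default_rng(0)
m_vars, l, k = 12, 9, 10
shift = np.zeros(m_vars)
shift[:4] = 5.0                                      # synthetic fault: mean shift
train_normal = rng.normal(size=(300, m_vars))
train_fault = rng.normal(size=(300, m_vars)) + shift
test_normal = rng.normal(size=(200, m_vars))
test_fault = rng.normal(size=(200, m_vars)) + shift

# Step 1: normalise with the statistics of the pooled training set (assumed).
train = np.vstack([train_normal, train_fault])
mu, sd = train.mean(axis=0), train.std(axis=0)

def scale(Z):
    return (Z - mu) / sd

A = lgspp_fit(scale(train), l, k)
m0, S0 = fit_gaussian(scale(train_normal) @ A)
m1, S1 = fit_gaussian(scale(train_fault) @ A)

def is_fault(Z):
    """Detection rule: a fault is flagged when g_f1(x) > g_f0(x)."""
    Y = scale(Z) @ A
    return decision(Y, m1, S1, 0.5) > decision(Y, m0, S0, 0.5)

detection_rate = is_fault(test_fault).mean()
false_alarm_rate = is_fault(test_normal).mean()
```

With more than one fault class, the same `decision` values would be computed for every class and the sample assigned to the class with the largest value.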
Figure 1 shows that for fault 20 LGSPP-Bayes gives 2 more false positives than LGSPP-$T^2$ but 13 fewer false negatives. Figure 2 shows that for fault 21 LGSPP-Bayes detects all faulty samples from sample 160 onward, whereas LGSPP-$T^2$ identifies only 261 faulty samples. The fault detection rates of PCA-$T^2$, LGSPP-$T^2$, PCA-Bayes and LGSPP-Bayes are presented in Table 1. For fault 20, the detection rate of LGSPP-Bayes is 33, 9 and 38 percentage points higher than those of PCA-$T^2$, LGSPP-$T^2$ and PCA-Bayes, respectively. For fault 21, the detection rate of LGSPP-Bayes is 53, 56 and 49 percentage points higher than those of PCA-$T^2$, LGSPP-$T^2$ and PCA-Bayes, respectively. In particular, for fault 21 the detection rate of LGSPP-Bayes is nearly 100%.
[Figure 1: (a) LGSPP-$T^2$, plotting $T^2$ versus sample number; (b) LGSPP-Bayes, plotting $g_0(x)$ and $g_1(x)$ versus sample number.]
FIGURE 1. MONITORING RESULTS ON FAULT 20.
[Figure 2: (a) LGSPP-$T^2$, plotting $T^2$ versus sample number; (b) LGSPP-Bayes, plotting $g_0(x)$ and $g_1(x)$ versus sample number.]
FIGURE 2. MONITORING RESULTS ON FAULT 21.
[Figure 3: (a) diagnosis figure for fault 20; (b) diagnosis figure for fault 21; each panel plots $g_0(x)$ and $g_1(x)$ versus sample number.]
FIGURE 3. IDENTIFICATION RESULTS BASED ON LGSPP-BAYES.
TABLE 1. FAULT DETECTION RATES (%) OF THE FOUR METHODS IN FAULTS 20, 21
            PCA-$T^2$   LGSPP-$T^2$   PCA-Bayes   LGSPP-Bayes
Fault 20    47          71            42          80
Fault 21    47          44            51          100
Average     47          57.5          46.5        90
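The averages in Table 1 and the margins of LGSPP-Bayes over the other three methods follow from simple arithmetic on the tabulated rates. The short check below uses only the values in Table 1; the dictionary layout and variable names are illustrative.

```python
# Detection rates (%) copied from Table 1: (fault 20, fault 21).
rates = {
    "PCA-T2": (47, 47),
    "LGSPP-T2": (71, 44),
    "PCA-Bayes": (42, 51),
    "LGSPP-Bayes": (80, 100),
}

# Average detection rate of each method over the two faults.
averages = {name: sum(r) / len(r) for name, r in rates.items()}

# Margin of LGSPP-Bayes over each other method, in percentage points.
gains = {
    name: averages["LGSPP-Bayes"] - avg
    for name, avg in averages.items()
    if name != "LGSPP-Bayes"
}
# averages -> PCA-T2 47.0, LGSPP-T2 57.5, PCA-Bayes 46.5, LGSPP-Bayes 90.0
# gains    -> PCA-T2 43.0, LGSPP-T2 32.5, PCA-Bayes 43.5
```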
When a fault has been detected, its type needs to be identified. The identification results of LGSPP-Bayes for faults 20 and 21 are presented in Figure 3, where $g_i(x)$ $(i = 0, 1)$ are the Bayes decision functions of faults 20 and 21. Figure 3(a) shows that when the fault is introduced from sample 160, LGSPP-Bayes easily identifies the detected fault as fault 20. In Figure 3(b), however, the identification result for fault 21 is not ideal, so the method still needs to be improved.
Conclusions
In this paper, a new method called LGSPP-Bayes has been proposed and used for fault detection and diagnosis. Unlike PCA and LPP, this method takes both global and local information into account, so the low-dimensional feature space retains more of the information in the original space. Moreover, unlike traditional methods that use monitoring statistics to detect faults, this method uses a Bayes classifier for fault detection and diagnosis. According to the simulation results for faults 20 and 21, the average detection rate of LGSPP-Bayes is 43, 32.5 and 43.5 percentage points higher than those of PCA-$T^2$, LGSPP-$T^2$ and PCA-Bayes, respectively. When fault 20 occurs, LGSPP-Bayes identifies it quickly, but for fault 21 the identification results are not as good, so the method still needs to be improved.
Acknowledgments
This paper was supported by the Postgraduate Innovation Fund Project of Southwest University of Science and Technology (15ycx118).
REFERENCES
[1] He X F, Niyogi P. Locality preserving projections. In: Proceedings of the 17th Annual Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2003. 1-8.
[2] Yu J B. Local and global principal component analysis for process monitoring. Journal of Process Control, 2012, 22(7): 1358-1373.
[3] Wu B, Yu C M, Li Q. The fault detection of process industry. Beijing: Science Press, 2011.
[4] Yu C M. A novel feature selection method for process fault diagnosis. Applied Mechanics and Materials, vol. 247-249, 2013: 2045-2049.
[5] Zhang M G, Ge Z Q, Song Z H, Fu R W. Global-local structure analysis model and its application for fault detection and identification. Industrial & Engineering Chemistry Research, 2011, 50(11): 6387-6848.
[6] Fu R W, Zhang M G, Song Z H, Ge Z Q. Global-local structure analysis for fault detection. In: Proceedings of the 49th IEEE Conference on Decision and Control. Hilton Atlanta Hotel, Atlanta, GA, USA, 2010. 15-17.
[7] He X F, Cai D, Min W L. Statistical and computational analysis of locality preserving projection. In: Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany: ACM, 2005. 281-288.
[8] Zhang M G, Song Z H. A fault detection method based on DLPP for dynamic processes. Journal of Huazhong University of Science and Technology, 2009, 37(I): 62-65.
[9] Zhang M G, Song Z H. LPMVP algorithm and its application to fault detection. Acta Automatica Sinica, 2009, 35(6): 766-772.
[10] Chiang L H, Russell E L, Braatz R D. Fault detection and diagnosis in industrial systems. London: Springer-Verlag, 2001.
Qin Liu was born in Shanxi, China in 1991. She received her B.Eng. degree in automation from the Central South University of Forestry and Technology, Hunan, China in 2013, and is currently a postgraduate student in the field of fault detection and diagnosis at the Southwest University of Science and Technology, Sichuan, China.
ChunMei Yu is with Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China (corresponding author, phone number: 13778082737; e-mail: yyycm@sohu.com).