An Intelligent Process Fault Diagnosis System Based on Neural Networks and Andrews Plot

Abstract: This paper proposes a neural network-based process fault diagnosis system that uses the Andrews plot for information pre-processing to enhance the performance of online process fault diagnosis. By using features extracted from the Andrews plot as the inputs to a neural network classifier, the diagnosis speed and reliability are improved. A method for determining the important features in the Andrews function is proposed. The proposed fault diagnosis system is applied to a simulated continuous stirred tank reactor (CSTR) process and is compared with two conventional neural network-based fault diagnosis systems: scheme B, where the monitored measurements are fed directly to a neural network after scaling, and scheme C, where the monitored measurements are converted to qualitative trend data before being fed to a neural network. Across all the considered faults, the proposed fault diagnosis system diagnosed the abrupt faults on average 5.45 s and 2.66 s earlier than schemes B and C, respectively, and diagnosed the incipient faults on average 13.82 s and 5.09 s earlier than schemes B and C, respectively. The results indicate that the Andrews plot method has great potential in online industrial process monitoring.


Introduction
Advances in industrial technology have made modern production processes more automated and productive, with increasingly complex operational functionality. These improvements have enhanced product quality and expanded production scale over the past decades. However, the risk of component failures increases with the complexity and functionality of modern industrial processes. Even a seemingly insignificant fault can lead to severe damage if it remains undetected, and faults are hard to eliminate completely in the chemical industry. Undetected faults in an industrial production process carry large hidden risks, which can cause serious direct consequences and subsequent indirect impacts. Degradation in product quality, a typical consequence, can ruin brand reputation and corporate identity. For small and medium-sized enterprises in particular, this can break the capital chain and disrupt the development of the enterprise. Environmental contamination is another typical effect of chemical process accidents. The credibility of local authorities, which are responsible for environmental conservation, may be challenged, and enterprises may face heavy penalties. Disastrous consequences, such as casualties, bring inestimable losses and should be avoided completely. With the advancement of industrial technology, growing public environmental awareness and increased demand for high-quality products, industrial process monitoring is becoming increasingly important. This paper proposes a novel fault diagnosis system to improve industrial process monitoring.
The various proposed fault detection and diagnosis approaches can broadly be classified into the following three main approaches: model-based approaches, knowledge-based
In prior works, several variations of the Andrews plot function were proposed, such as the following functions suggested in [27,29,30], respectively:
$$f_x(t) = x_1 \sin(t) + x_2 \cos(t) + x_3 \sin(2t) + \cdots \qquad (4)$$
$$f_x(t) = x_1 \sin(\omega_1 t) + x_2 \cos(\omega_2 t) + x_3 \sin(\omega_3 t) + \cdots \qquad (5)$$
$$f_x(t) = \frac{1}{\sqrt{2}} \left\{ x_1 + x_2 (\sin(t) + \cos(t)) + x_3 (\sin(t) - \cos(t)) + x_4 (\sin(2t) + \cos(2t)) + \cdots \right\} \qquad (6)$$
where t is in the range [−π, π] and the values of $\omega_i$ are mutually irrational and scaled between 0.5 and 1. When the Andrews function is used to process information, the dimension of the data can be changed to an appropriate value by selecting the number of features in the Andrews function, i.e., the number of t-values. In chemical process monitoring applications, multivariable measurements are converted into an information-containing feature matrix via an Andrews function with a certain number of t-values.
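As an illustration of how an Andrews-type function turns one multivariable sample into a fixed number of features, the following minimal sketch evaluates the form of Equation (4) at a chosen set of t-values. The function name `andrews_eq4` is hypothetical, and the continuation of the series after the third term (cos(2t), sin(3t), ...) is an assumption, since the equation is only given up to the ellipsis.

```python
import numpy as np

def andrews_eq4(x, t):
    """Evaluate f_x(t) = x1 sin(t) + x2 cos(t) + x3 sin(2t) + x4 cos(2t) + ...

    x : (n_samples, n_vars) array of measurements.
    t : (n_t,) array of t-values in [-pi, pi].
    Returns an (n_samples, n_t) feature matrix, one row per sample.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    basis = np.empty((x.shape[1], t.size))
    for j in range(x.shape[1]):
        k = j // 2 + 1  # harmonic index: 1, 1, 2, 2, 3, ... (assumed pattern)
        basis[j] = np.sin(k * t) if j % 2 == 0 else np.cos(k * t)
    return x @ basis

# One 3-variable sample mapped to 5 features (5 t-values).
t_values = np.linspace(-np.pi, np.pi, 5)
sample = np.array([[1.0, 2.0, 3.0]])
features = andrews_eq4(sample, t_values)
```

The number of columns of `features` equals the number of t-values, which is how the data dimension is controlled regardless of the number of measured variables.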

The Proposed Fault Diagnosis System
In this paper, the proposed fault diagnosis system (referred to as scheme A) includes two subsystems: a data pre-processing subsystem and a process state analysis subsystem. In the data pre-processing subsystem, the Andrews function is used to pre-process the online measurements. The processed features are fed into the process state analysis subsystem, which is a neural network-based classifier. Figure 1 shows the framework of the proposed fault diagnosis system. Through the data pre-processing subsystem, features are extracted from the online monitoring information. The dimension of the extracted features is given by the number of t-values in the Andrews function. The system uses the principal components instead of the original variables to eliminate the effects of variable ordering in the Andrews function. Then the Andrews function given by Equation (1) with the selected t-values is used to obtain the information-containing feature dataset. Finally, the pre-processed features are used to train a neural network as a fault classifier.
Processes 2021, 9, 1659
The reason for using principal components in place of the original variables is as follows. Previous studies have shown that the multivariable information processed by the Andrews curve is highly sensitive to the ordering of variables in the data set, which can result in uncertainties [31,32].
To reduce the effect of variable ordering on the outputs of the Andrews function, principal components of the original data set are used as its inputs instead of the original variables [31,32]. An example illustrating the effect of variable ordering is given in Section 3. Figure 2 shows the procedure of Andrews function processing of the original data. In the proposed diagnosis scheme, an a-dimensional process data set, $X = (x_1, x_2, x_3, \cdots, x_a)$, is first scaled to zero mean and unit variance and then principal component analysis (PCA) is applied to the scaled data. All the principal components are used as the inputs to the Andrews function with n t-values. A method for selecting the t-values (features) is discussed later in this section. After Andrews function pre-processing, the dimension of the data becomes n, the number of t-values used. Figure 3 shows an example of an Andrews plot using Equation (1) with 63 values of t uniformly distributed in the range [−π, π]. In Figure 3, the red solid curves represent the normal data, the blue dashed curves represent the data with fault No. 3 and the black dotted curves represent the data with fault No. 5. Figure 3 contains 50 samples from each class and each sample is shown as a curve calculated using Equation (1). Note that in practice the number of t-values should be properly selected; the 63 t-values here are used only for clear visualization of the separations. It can be seen from Figure 3 that the monitoring information pre-processed by the Andrews function shows larger separations between normal data and fault data at certain values of t. This can ease the classification task, leading to improved fault diagnosis performance.
It can be seen from Figure 3 that some values of t give better separations among the classes than other values.
A method for determining the t-values is proposed in this paper as follows. First, the Andrews function is used to process the data with a relatively large number (e.g., 100) of t-values uniformly distributed in the range [−π, π]; here 100 is considered a sufficiently large number of t-values. Second, the resulting Andrews function values are averaged at each t-value for each category (i.e., normal, fault No. 1, fault No. 2, ...). Third, the minimal distance among these class-averaged Andrews function values is calculated at each t-value. Finally, the first few t-values with the largest minimal distance values, which indicate good separations among the classes, are selected.
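The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the classic Andrews form $x_1/\sqrt{2} + x_2 \sin(t) + x_3 \cos(t) + \cdots$ is assumed for Equation (1), the "distance" between class averages is taken as the absolute difference, and the names `andrews` and `select_t_values` as well as the synthetic three-class data are hypothetical.

```python
import numpy as np

def andrews(x, t):
    """Classic Andrews function (assumed form of Equation (1))."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    basis = np.empty((x.shape[1], t.size))
    basis[0] = 1.0 / np.sqrt(2.0)
    for j in range(1, x.shape[1]):
        k = (j + 1) // 2  # harmonic index: 1, 1, 2, 2, ...
        basis[j] = np.sin(k * t) if j % 2 == 1 else np.cos(k * t)
    return x @ basis

def select_t_values(x, labels, n_candidates=100, n_select=11):
    # Step 1: evaluate the function on a dense, uniform grid of t-values.
    t = np.linspace(-np.pi, np.pi, n_candidates)
    f = andrews(x, t)
    # Step 2: average the function values per class at each t-value.
    classes = np.unique(labels)
    means = np.stack([f[labels == c].mean(axis=0) for c in classes])
    # Step 3: minimal pairwise distance between class averages at each t-value.
    diff = np.abs(means[:, None, :] - means[None, :, :])
    pairs = np.triu_indices(len(classes), k=1)
    min_dist = diff[pairs].min(axis=0)
    # Step 4: keep the t-values with the largest minimal distance.
    order = np.argsort(min_dist)[::-1][:n_select]
    return t[order], min_dist[order]

# Synthetic data: three classes of 50 samples each (illustrative only).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
data = centers[labels] + 0.1 * rng.standard_normal((150, 3))
t_sel, d_sel = select_t_values(data, labels)
```

Using the minimum rather than the mean pairwise distance favors t-values at which even the two closest classes remain apart, which matches the stated goal of good separation among all classes.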

A Simulated Continuous Stirred Tank Reactor System
In order to demonstrate the advantages of the proposed fault diagnosis strategy, a simulated CSTR system, which is taken from [33], is used to demonstrate and compare diagnosis performance. Figure 4 shows the diagram of the CSTR system. An irreversible heterogeneous catalytic exothermic reaction takes place in the CSTR. The product concentration is indirectly maintained at a desired level by controlling temperature, residence time and mixing conditions in the CSTR. Part of the reactor outlet stream is recycled to the reactor through a heat exchanger to provide temperature control by manipulating the flow rate of the cold water fed to the heat exchanger via a cascade control system. The residence time is controlled through the reactor level controller and the mixing condition is controlled by maintaining the recycle flow rate. Constant physical properties and constant boundary pressures of all input and output streams are assumed. Perfect mixing in the reactor is assumed. Under these assumptions, a mechanistic model is developed from mass balance and energy balance and is used to simulate the process. The simulated process measurements are generated under the normal condition and various abnormal conditions with typical measurement noise. The simulated online measurements are composed of 10 online measured process variables and 3 controller outputs. The sampling time of controllers is 4 s. Table 1 shows the 11 considered faults which are typical faults in industrial processes [33]. Table 2 shows the measurement noise ranges. 
Table 1. The considered faults.
No.   Fault
...   ...
4     Blockage of pipe 10 or 11 or control valve 1 fails low
5     External feed-reactant temperature abnormal
6     Control valve 2 stuck high
7     Blockage of pipe 7, 8 or 9 or control valve 2 stuck low
8     Control valve 1 stuck high
9     Blockage of pipe 4, 5 or 6 or control valve 3 stuck low
10    Control valve 3 stuck high
11    External feed-reactant concentration too low

Table 2. Measurement noise ranges (columns: Measurements, Noise Range).

An example is given here to illustrate how the feature extraction is affected by variable ordering. Figure 5 shows Andrews plots using data under fault No. 7 with three different process variable orderings. The 13 online information sources, $[x_1, x_2, \ldots, x_{13}]$, represent the 10 measured variables in the simulated CSTR system (temperature of input reactant, temperature in reactor, tank level, flow rate of input reactant, flow rate of recycled reactant, flow rate of product, flow rate of cold water entering the heat exchanger, concentration of product in the reactor, concentration of the reactant in the input stream and pressure of liquid leaving the pump) and the 3 controller outputs. Variable ordering No. 1 is $[x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13}]$, variable ordering No. 2 is $[x_1, x_2, x_3, x_{11}, x_8, x_9, x_{10}, x_4, x_5, x_6, x_7, x_{12}, x_{13}]$ and variable ordering No. 3 is $[x_1, x_2, x_{12}, x_{13}, x_9, x_{10}, x_{11}, x_3, x_4, x_5, x_6, x_7, x_8]$.
The left column of Figure 5 shows the results using the principal components instead of the scaled original monitoring information and the right column shows the results using the scaled original monitoring information. Note that the original monitoring information is scaled to zero mean and unit variance before applying PCA and the Andrews function. The top plots (red curves) represent variable ordering No. 1, the middle plots (blue curves) represent variable ordering No. 2, and the bottom plots (black curves) represent variable ordering No. 3. The curves are produced using 100 t-values uniformly distributed in the range [−π, π] and 50 samples of the original data. From Figure 5, it can be seen that Andrews plots from the original measurements give different results for different variable orderings, whereas Andrews plots from the principal components eliminate the impact of variable ordering. Note that the principal components are always arranged in the order of the data variance that they explain.
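The ordering effect described above can be checked numerically with a small sketch. It assumes the classic Andrews form for Equation (1), computes PCA through a singular value decomposition, and adds a sign convention for the components (largest-magnitude score made positive), which is an assumption needed to make the comparison deterministic and is not stated in the text. The data are random and purely illustrative; `andrews` and `pca_scores` are hypothetical names.

```python
import numpy as np

def andrews(x, t):
    """Classic Andrews function (assumed form of Equation (1))."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    basis = np.empty((x.shape[1], t.size))
    basis[0] = 1.0 / np.sqrt(2.0)
    for j in range(1, x.shape[1]):
        k = (j + 1) // 2
        basis[j] = np.sin(k * t) if j % 2 == 1 else np.cos(k * t)
    return x @ basis

def scale(x):
    """Scale each variable to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def pca_scores(x):
    """Return all principal component scores, ordered by explained variance."""
    u, s, _ = np.linalg.svd(scale(x), full_matrices=False)
    scores = u * s
    # Assumed sign convention: largest-magnitude score of each component is positive.
    idx = np.abs(scores).argmax(axis=0)
    return scores * np.sign(scores[idx, np.arange(scores.shape[1])])

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 5))
perm = np.array([0, 3, 4, 1, 2])          # a different variable ordering
t = np.linspace(-np.pi, np.pi, 100)

raw_a, raw_b = andrews(scale(x), t), andrews(scale(x[:, perm]), t)
pca_a, pca_b = andrews(pca_scores(x), t), andrews(pca_scores(x[:, perm]), t)

same_with_pca = np.allclose(pca_a, pca_b)      # curves agree across orderings
same_without_pca = np.allclose(raw_a, raw_b)   # curves differ across orderings
```

Reordering the variables only permutes the columns of the scaled data, which leaves the principal component scores unchanged up to sign; this is why the curves built from the scores coincide.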

Baseline Fault Diagnosis Scheme in CSTR System
In this paper, a conventional neural network-based fault diagnosis system is developed for comparison. In this baseline diagnosis scheme (referred to as scheme B), the original measurements are scaled and then fed directly into a neural network. To generate the data used for neural network training and testing, normal operation data and faulty operation data for each fault, in the form of abrupt faults, are generated from simulation. Table 3 gives the variable values corresponding to the 11 faults. These values are used in the simulated CSTR system to generate the fault dataset for neural network training. Figure 6 shows the baseline scheme. The scaling equation in this case is given by the following:
$$X_{i,p} = \frac{X_i - \bar{X}_{i,\mathrm{normal}}}{X_{\mathrm{std}}}$$
where $X_i$ is the actual value and $X_{i,p}$ is the pre-processed value of the ith online measurement, $\bar{X}_{i,\mathrm{normal}}$ is the mean value of the normal data and $X_{\mathrm{std}}$ is the standard deviation of the normal data.
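A minimal sketch of this scaling step is given below: each measurement is centred on the mean of the normal operation data and divided by the standard deviation of the normal operation data. The function name `scale_with_normal_stats`, the use of the sample standard deviation (`ddof=1`) and the synthetic data are assumptions for illustration.

```python
import numpy as np

def scale_with_normal_stats(x, x_normal):
    """Scale measurements using statistics of normal operation data only."""
    mean = x_normal.mean(axis=0)
    std = x_normal.std(axis=0, ddof=1)  # sample standard deviation (assumed)
    return (x - mean) / std

rng = np.random.default_rng(0)
# Hypothetical normal operation data: 200 samples of 3 measurements.
x_normal = rng.normal(loc=[50.0, 2.0, 300.0], scale=[1.0, 0.1, 5.0], size=(200, 3))
x_online = np.array([[53.0, 2.0, 290.0]])     # a new online sample
z_normal = scale_with_normal_stats(x_normal, x_normal)
z_online = scale_with_normal_stats(x_online, x_normal)
```

Because the statistics come from normal data only, a scaled value far from zero directly expresses how many normal standard deviations a measurement has moved.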

Table 3 (recovered entry): Concentration of the reactant in the input stream, 50%.
In addition to this baseline scheme, another scheme (referred to as scheme C) is also compared with the proposed scheme. In scheme C, the monitored process information ($x_i$) is converted into qualitative trend values ($x_{i,p}$): increase (1), steady (0) and decrease (−1). The conversion rule is expressed in terms of $\sigma_i$, the standard deviation of the ith measurement under a normal operating condition.

Abrupt Faults and Incipient Faults in the CSTR System
In this paper, both abrupt faults and incipient faults are considered. An abrupt fault means that the process parameter related to the fault changes suddenly when the fault occurs; it is simulated as a step change in the associated process parameter. An incipient fault means that the fault magnitude grows gradually. At the early stage of an incipient fault, its effect is barely noticeable, but it can accumulate into significant damage if it remains undetected. The growth of an incipient fault is described by an equation in the following quantities: $M_n$ is the normal value of a process parameter, $M_f(t)$ is the faulty value of the same process parameter at time t, γ is the fault developing speed of the incipient fault and t is the time from the initiation of the incipient fault. This equation can be used to represent common incipient faults in industrial processes [33].
In this study, three groups of abrupt faults and three groups of incipient faults under different fault conditions were generated as unseen validation data for evaluating the developed fault diagnosis systems. Table 4 gives three groups of relative fault magnitudes (Mag. 1, Mag. 2 and Mag. 3) corresponding to the 11 abrupt faults. Table 5 gives three groups of fault developing speeds ($\gamma_1$, $\gamma_2$ and $\gamma_3$) corresponding to the 11 incipient faults.

Fault Diagnosis System Development
The performance of the proposed diagnosis scheme A is compared with that of the conventional schemes B and C under abrupt faults and incipient faults. In the proposed scheme A, the selection of t-values (the features) in the Andrews function is important and can affect the final diagnosis performance. In practical applications, because the Andrews function admits an infinite number of t-values and a chemical engineering process has a large number of variables, the selection of appropriate t-values may be time consuming.
The method for determining the important features (t-values) presented in Section 2 is used here. Andrews function values for all the data at 100 t-values uniformly distributed in the range [−π, π] are calculated. The minimum distance between all the classes (normal, fault No. 1, ..., fault No. 11) is calculated at each of these t-values. Figure 7 shows the top 30 minimal distance values in descending order. It can be seen that the minimal distance values drop quickly after about the first 11 t-values. Therefore, the first 11 t-values are selected.
The training and testing data were produced using simulation based on the mechanistic model of the process. Simulated process operational data under the normal operation and the 11 faults with the fault conditions given in Table 3 were generated. For each fault, 80 samples were collected once one or more of the process variables exceeded 3 times their normal standard deviations. The normal operation data also contain 80 samples. For the 80 samples corresponding to each category, 50 samples were randomly selected as training data while the remaining 30 samples were used as the testing data. Thus, with 12 categories, the training data set contains 600 samples and the testing data set contains 360 samples. The neural networks were trained on the training data set and the testing data set was used for network structure determination and for implementing the "early stopping" mechanism during network training [15]. In all three schemes, the neural networks are single hidden layer feedforward networks with the sigmoid function in the hidden layer and output layer. All networks have 11 output layer neurons corresponding to the 11 faults.
The networks were trained using the backpropagation training method with the learning rate, momentum constant and maximum training steps selected as 0.01, 0.9 and 1000, respectively. The training objective is to minimize the mean squared errors of the network. During network training, a target output of 1 is assigned to the network output corresponding to the fault while the targets for other network outputs are 0. During diagnosis, the neural network outputs should be lower than 0.2 when the samples are under the normal operating condition. A diagnosis result is issued when the neural network output corresponding to the fault is higher than 0.8 while other network outputs are below 0.2. An advance warning is issued when the neural network output corresponding to the fault is higher than 0.4. Evaluation of fault diagnosis systems is mainly based on accuracy, robustness and speed of diagnosis. The diagnosis time is measured as the time between a fault being initiated and being successfully diagnosed. The advance warning time is defined as the time between the fault being initiated and a correct advance warning being issued.
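The decision thresholds and the diagnosis time described above can be expressed as a short sketch. The thresholds 0.8, 0.2 and 0.4 are taken from the text; the function names, the 4 s spacing between consecutive network outputs (assumed equal to the stated controller sampling time) and the example output history are assumptions for illustration.

```python
import numpy as np

DIAGNOSIS_THRESHOLD = 0.8   # output of the diagnosed fault must exceed this
OTHERS_THRESHOLD = 0.2      # all other outputs must stay below this
WARNING_THRESHOLD = 0.4     # advance warning level

def diagnose(outputs):
    """Return the 1-based fault number if the diagnosis rule is met, else None."""
    outputs = np.asarray(outputs, dtype=float)
    k = int(outputs.argmax())
    others = np.delete(outputs, k)
    if outputs[k] > DIAGNOSIS_THRESHOLD and np.all(others < OTHERS_THRESHOLD):
        return k + 1
    return None

def advance_warning(outputs):
    """Return the fault numbers whose output exceeds the warning threshold."""
    return [i + 1 for i, v in enumerate(outputs) if v > WARNING_THRESHOLD]

def diagnosis_time(history, true_fault, fault_start, sample_time=4.0):
    """Time from fault initiation to the first correct diagnosis (None if never)."""
    for i, row in enumerate(history):
        t = i * sample_time
        if t >= fault_start and diagnose(row) == true_fault:
            return t - fault_start
    return None

# Hypothetical output history for 3 faults, one row every 4 s; fault 2 starts at 40 s.
history = np.full((16, 3), 0.05)
history[11:, 1] = [0.3, 0.5, 0.7, 0.85, 0.9]
elapsed = diagnosis_time(history, true_fault=2, fault_start=40.0)
```

In this example the second output first exceeds 0.8 at 56 s, so `elapsed` is 16 s after the fault was initiated; the advance warning level is crossed two samples earlier.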
The number of hidden neurons was determined through cross validation on the testing data. A number of neural networks with different numbers of hidden neurons were trained on the training data and tested on the testing data. The network that gives the overall best performance on the testing data is considered to have the appropriate number of hidden neurons. Tables 6-8 give the accuracy for different numbers of hidden neurons in diagnosis schemes A, B and C on the testing data, respectively. In Tables 6-8, the best performance is marked in bold font. It can be seen that the best numbers of hidden neurons for schemes A, B and C are 15, 17 and 13, respectively. Table 9 summarizes the numbers of neurons in different layers of the neural networks.
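The selection procedure can be sketched with a small single hidden layer sigmoid network trained by batch backpropagation with momentum, using the stated learning rate (0.01), momentum constant (0.9) and 1000 training steps. This is a simplified illustration, not the implementation used in the paper: early stopping is omitted, the gradient is averaged over the batch, the classification rule (largest output above 0.5, otherwise normal) is an assumption, and the synthetic data and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(x, y, n_hidden, lr=0.01, momentum=0.9, steps=1000, seed=0):
    """Train a single hidden layer sigmoid network by minimizing the MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, (n_hidden, y.shape[1])); b2 = np.zeros(y.shape[1])
    params = [w1, b1, w2, b2]
    velocity = [np.zeros_like(p) for p in params]
    for _ in range(steps):
        h = sigmoid(x @ w1 + b1)
        out = sigmoid(h @ w2 + b2)
        d_out = (out - y) * out * (1.0 - out)        # error signal at the output layer
        d_h = (d_out @ w2.T) * h * (1.0 - h)         # backpropagated to the hidden layer
        grads = [x.T @ d_h, d_h.sum(axis=0), h.T @ d_out, d_out.sum(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g / len(x)
            p += v                                   # in-place parameter update
    return params

def predict_class(params, x, threshold=0.5):
    w1, b1, w2, b2 = params
    out = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    # Class 0 is "normal": no output is high enough (assumed rule).
    return np.where(out.max(axis=1) > threshold, out.argmax(axis=1) + 1, 0)

def select_hidden(x_tr, c_tr, x_te, c_te, candidates, n_faults):
    y_tr = np.eye(n_faults + 1)[c_tr][:, 1:]         # normal samples get all-zero targets
    results = {}
    for n_hidden in candidates:
        params = train_mlp(x_tr, y_tr, n_hidden)
        results[n_hidden] = float(np.mean(predict_class(params, x_te) == c_te))
    return max(results, key=results.get), results

# Synthetic data: normal + 2 faults, 80 samples each, split 50/30 per class.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [3, 0, -3, 0], [0, 3, 0, -3]], dtype=float)
c_all = np.repeat(np.arange(3), 80)
x_all = centers[c_all] + 0.3 * rng.standard_normal((240, 4))
train_idx, test_idx = [], []
for c in range(3):
    members = rng.permutation(np.flatnonzero(c_all == c))
    train_idx.extend(members[:50]); test_idx.extend(members[50:])
candidates = [3, 5, 7]
best, results = select_hidden(x_all[train_idx], c_all[train_idx],
                              x_all[test_idx], c_all[test_idx], candidates, n_faults=2)
```

The all-zero target for normal samples mirrors the text, where the networks have one output per fault and every output is expected to stay low under normal operation.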

Performance under Abrupt Faults
The proposed scheme A and the conventional schemes B and C are tested on three groups of abrupt faults given in Table 4 to demonstrate the advantages of the proposed method. All of the abrupt faults were initiated at 40 s. Table 10 indicates that all diagnosis systems successfully diagnosed all the considered abrupt faults. In terms of diagnosis speed, the proposed diagnosis scheme A diagnosed the abrupt faults 5.45 s and 2.66 s earlier on average than schemes B and C, respectively. Note that, as the sampling time is given, traditional metrics such as the fault detection rate (FDR) and the missed detection rate (MDR) can be easily obtained from the fault diagnosis times given in Table 10. Table 11 gives the FDR for abrupt faults. For all the considered faults here, there are no incorrect diagnosis cases and the undetected samples occur during the early stages of the faults. Thus, MDR can be obtained simply as 1 − FDR. It can be seen from Table 11 that the proposed scheme A gives overall higher FDR than schemes B and C. Table 11 also indicates that the FDR is higher when the fault magnitude is higher, as a fault with higher magnitude is generally easier to detect and diagnose than the same fault with lower magnitude. Figures 8 and 9 show two sets of diagnosis outputs under abrupt faults as examples to show the robustness of the proposed scheme.

Table 10. Fault diagnosis times (s) for abrupt faults.
Fault No.  Scheme A                Scheme B                Scheme C
           Mag. 1  Mag. 2  Mag. 3  Mag. 1  Mag. 2  Mag. 3  Mag. 1  Mag. 2  Mag. 3
1          24      12      4       48      24      12      16      12      4
2          20      8       4       32      12      12      12      12      4
3          36      28      12      44      32      28      64      56      52
4          32      12      4       32      28      8       16      16      12
5          32      16      8       36      24      8       36      32      28
6          60      28      12      12      8       8       16      16      16
7          36      16      8       52      36      16      16      16      16
8          24      8       8       52      20      4       20      16      16
9          20      8       8       40      8       8       24      12      8
10         20      8       4       36      20      4       20      12      8
11         16      8       8       12      8       8       12      12      12
Average    16.73                   22.18                   19.39

Figure 8 shows the performance of schemes A and B in diagnosing abrupt fault No. 2, with the fault relative magnitude being 1.67%. In Figure 8, [...] and Table 1, respectively.
The upper dash-dotted lines indicate the diagnosis threshold (0.8), while the lower dash-dotted lines indicate the advance warning threshold (0.4). Figure 8a shows that scheme A successfully diagnosed the fault 20 s after it was initiated. Figure 8b shows that the outputs from scheme B responded quickly when the abnormal condition occurred, but with fluctuations. The fault was eventually diagnosed 32 s after it was initiated, and the output curve then continued to oscillate above the 0.8 threshold. The proposed scheme A thus diagnosed the fault 12 s earlier than scheme B, with a steadier output curve. Figure 9 shows the performance of schemes A and B in diagnosing abrupt fault No. 7, with a relative fault magnitude of 27.86%. As in the previous case, the outputs of scheme A have isolated points exceeding the diagnosis threshold (0.8) before the oscillation stabilizes. Figure 9a shows that scheme A successfully diagnosed the fault 16 s after it was initiated, whereas Figure 9b shows that scheme B did so 36 s after initiation. The diagnosis time of scheme A for this fault is therefore 20 s shorter than that of scheme B.
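The averages quoted for the abrupt faults can be checked directly against the diagnosis times in Table 10. The following Python sketch recomputes the per-scheme averages and the resulting lead times. The arrangement of the values (three fault groups per scheme, eleven faults per group) is inferred from the table layout. The sketch also illustrates one plausible way to convert a diagnosis time into an FDR; the 4 s sampling time and the 200 s fault window used there are hypothetical values chosen for illustration, not figures taken from this section.

```python
import numpy as np

# Diagnosis times in seconds after fault initiation (Table 10).
# Rows are fault No. 1-11; columns are the three abrupt-fault groups.
scheme_a = np.array([
    [24, 12, 4], [20, 8, 4], [36, 28, 12], [32, 12, 4],
    [32, 16, 8], [60, 28, 12], [36, 16, 8], [24, 8, 8],
    [20, 8, 8], [20, 8, 4], [16, 8, 8],
])
scheme_b = np.array([
    [48, 24, 12], [32, 12, 12], [44, 32, 28], [32, 28, 8],
    [36, 24, 8], [12, 8, 8], [52, 36, 16], [52, 20, 4],
    [40, 8, 8], [36, 20, 4], [12, 8, 8],
])
scheme_c = np.array([
    [16, 12, 4], [12, 12, 4], [64, 56, 52], [16, 16, 12],
    [36, 32, 28], [16, 16, 16], [16, 16, 16], [20, 16, 16],
    [24, 12, 8], [20, 12, 8], [12, 12, 12],
])

# Per-scheme averages over all 33 cases, rounded to two decimals as in the table.
avg_a, avg_b, avg_c = (
    round(float(x.mean()), 2) for x in (scheme_a, scheme_b, scheme_c)
)

# Lead of scheme A, computed from the rounded averages as in the text.
lead_over_b = round(avg_b - avg_a, 2)
lead_over_c = round(avg_c - avg_a, 2)


def fdr_from_diagnosis_time(t_diag, fault_window, sampling_time):
    """Fraction of faulty samples flagged, assuming every sample before
    t_diag is missed and every sample from t_diag onwards is detected."""
    n_total = round(fault_window / sampling_time)
    n_missed = round(t_diag / sampling_time)
    return (n_total - n_missed) / n_total


# Hypothetical example: 20 s diagnosis time, 200 s fault window, 4 s sampling.
fdr = fdr_from_diagnosis_time(t_diag=20.0, fault_window=200.0, sampling_time=4.0)
mdr = 1.0 - fdr  # valid only when there are no incorrect diagnoses

print(avg_a, avg_b, avg_c, lead_over_b, lead_over_c, fdr, mdr)
```

The quoted 2.66 s lead over scheme C is the difference of the rounded averages (19.39 − 16.73); the unrounded difference is 88/33 ≈ 2.67 s.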

Performance under Incipient Faults
The proposed scheme A and the conventional schemes B and C are applied to three groups of incipient faults given in Table 5 to compare fault diagnosis results. All of the incipient faults were initiated at 0 s. Table 12 indicates that all three diagnosis systems successfully diagnosed all the considered incipient faults. In terms of diagnosis speed, the proposed scheme A diagnosed the incipient faults on average 13.82 s and 5.09 s earlier than schemes B and C, respectively. Table 13 gives the FDR values of the three schemes for the considered incipient faults. It can be seen that the proposed scheme A gives an overall higher FDR than schemes B and C. Table 13 also indicates that the FDR is higher when the fault developing speed is higher, as a faster developing fault is generally easier to detect and diagnose than a slower developing fault. As with the abrupt fault cases, there are no incorrect diagnoses and all the undetected samples occur during the early stages of the faults, when their magnitudes are low. Thus, the MDR can be worked out simply as 1 − FDR and is not shown here. The following figures show three sets of diagnosis outputs under incipient faults as examples of the good performance of the proposed scheme.

Table 12. Diagnosis times (s) of schemes A, B and C for the three groups of incipient faults.

| Fault No. | A, group 1 | A, group 2 | A, group 3 | B, group 1 | B, group 2 | B, group 3 | C, group 1 | C, group 2 | C, group 3 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | … | 52 | 40 | 92 | 64 | 48 | 96 | 52 | 44 |
| 2 | 80 | 48 | 44 | 88 | 56 | 44 | 108 | 44 | 48 |
| 3 | 104 | 48 | 40 | 152 | 68 | 52 | 136 | 64 | 60 |
| 4 | 76 | 48 | 40 | 84 | 76 | 44 | 72 | 44 | 56 |
| 5 | 116 | 64 | 44 | 152 | 72 | 60 | 132 | 92 | 76 |
| 6 | 132 | 100 | 80 | 156 | 136 | 88 | 104 | 96 | 56 |
| 7 | 144 | 96 | 60 | 160 | 100 | 64 | 144 | 104 | 80 |
| 8 | 88 | 76 | 32 | 96 | 80 | 40 | 76 | 68 | 40 |
| 9 | 88 | 44 | 44 | 136 | 72 | 48 | 68 | 60 | 44 |
| 10 | 96 | 68 | 44 | 124 | 80 | 40 | 80 | 68 | 52 |
| 11 | 128 | 88 | 52 | 144 | 96 | 44 | 148 | 88 | 68 |
| Average | | 72.73 | | | 86.55 | | | 77.82 | |

Figure 10 shows the performance of schemes A and B in diagnosing incipient fault No. 3, with a fault developing speed of γ = −1.29 × 10⁻⁴ s⁻¹. Figure 10a shows that, under scheme A, the network output corresponding to fault No. 3 rises above 0.8 with some slight oscillations after a period of fault development, while all other network outputs remain close to 0. As shown in Figure 10b, under scheme B, the network output corresponding to fault No. 3 rises gradually with some large oscillations until it crosses the diagnosis threshold and then becomes steady, while all other network outputs remain close to 0. Hence, both schemes successfully diagnosed fault No. 3. Scheme A diagnosed the fault at 104 s (Figure 10a) and scheme B at 152 s (Figure 10b). For this particular incipient fault, scheme A diagnosed the fault 48 s earlier than scheme B, and the output curves of scheme A also stabilize earlier than those of scheme B.
Figure 11 shows the performance of schemes A and B in diagnosing incipient fault No. 5, with a fault developing speed of γ = 1.12 × 10⁻⁴ s⁻¹. As shown in Figure 11a, after a period of fault development, the network output corresponding to fault No. 5 rises rapidly and exceeds 0.8, while all other network outputs remain close to 0. Hence, scheme A successfully diagnosed the fault at 116 s. As shown in Figure 11b, the network output corresponding to fault No. 5 increases gradually until it crosses the diagnosis threshold of 0.8, while all other network outputs remain close to 0. Hence, scheme B successfully diagnosed the fault at 152 s. For this particular incipient fault, the proposed scheme A diagnosed the fault 36 s earlier than scheme B, and the output curves of scheme A also stabilize earlier than those of scheme B.
Figure 11. Diagnosis performance of scheme A (a) and scheme B (b) under incipient fault No. 5 with a developing speed of 1.12 × 10⁻⁴ s⁻¹.

Figure 12 shows the performance of schemes A and B in diagnosing incipient fault No. 10, with a fault developing speed of γ = 6.71 × 10⁻⁵ s⁻¹. After an initial period of fault development, the network output corresponding to fault No. 10 gradually increases to close to 1 with some slight oscillations, while all other network outputs remain below 0.2. Figure 12a shows that scheme A successfully diagnosed the fault at 96 s, and Figure 12b shows that scheme B successfully diagnosed it at 124 s. The output corresponding to fault No. 10 from scheme B is slower to reach the diagnosis threshold after the abnormal condition occurs. In this case, the proposed scheme A diagnosed the fault 28 s earlier than scheme B.
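The diagnosis times discussed in this section correspond to the instants at which a network output crosses the 0.8 diagnosis threshold. The sketch below shows one way such a time could be extracted from an output trace while ignoring isolated points above the threshold. Only the 0.8 and 0.4 thresholds come from the text. The persistence rule (`hold` consecutive samples), the 4 s sampling time and the synthetic trace are illustrative assumptions rather than the procedure used in the paper.

```python
import numpy as np

DIAGNOSIS_THRESHOLD = 0.8  # diagnosis threshold given in the text
WARNING_THRESHOLD = 0.4    # advance warning threshold given in the text


def first_sustained_crossing(t, y, threshold, hold=3):
    """Return the time at which y has been >= threshold for `hold`
    consecutive samples (the earliest instant an online monitor could
    confirm the crossing), or None if that never happens.
    The `hold` rule is an assumption made for this sketch."""
    run = 0
    for time, value in zip(t, y):
        run = run + 1 if value >= threshold else 0
        if run >= hold:
            return float(time)
    return None


# Hypothetical network output: a slow rise, as for an incipient fault,
# plus one isolated point above the diagnosis threshold.
t = np.arange(0.0, 200.0, 4.0)                 # assumed 4 s sampling time
y = 1.0 / (1.0 + np.exp(-(t - 100.0) / 10.0))  # synthetic sigmoid trace
y[10] = 0.9                                    # isolated spike at t = 40 s

warning_time = first_sustained_crossing(t, y, WARNING_THRESHOLD)
diagnosis_time = first_sustained_crossing(t, y, DIAGNOSIS_THRESHOLD)
print(warning_time, diagnosis_time)
```

With `hold=1`, the isolated spike at 40 s would be reported as the diagnosis time, which is why some form of persistence check is useful when the network outputs oscillate around the threshold.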

Conclusions
This paper proposes an enhanced neural network-based online process fault diagnosis system that integrates Andrews plot and neural network techniques. By using features extracted from the Andrews plot as the inputs to a neural network, the diagnosis speed and reliability are improved. A method for determining the important features in the Andrews function is also proposed. Applications to a simulated CSTR process show encouraging results in terms of both diagnosis speed and accuracy: the proposed system diagnosed the abrupt faults on average 5.45 s and 2.66 s earlier than schemes B and C, respectively, and the incipient faults on average 13.82 s and 5.09 s earlier. In addition, the proposed data pre-processing method is effective in adjusting high-dimensional data to an appropriate size. As with other neural network-based fault diagnosis systems, one limitation of the proposed method is that it requires process data covering the various faults. Coping with imbalanced data sets and the unavailability of certain fault data deserves future investigation. Future work could also integrate the Andrews plot with other machine learning methods, such as support vector machines or random forests, combine the Andrews plot with other feature extraction approaches to reduce uncertainty, and consider applications to real-world systems.