Abstract
The safety of chemical processes is of critical importance. However, traditional fault monitoring methods have paid insufficient attention to the monitoring accuracy of multi-channel data and have not adequately considered the impact of noise on industrial processes. To address these issues, this paper proposes a neural network-based model, DSCBAM-DenseNet, which integrates depthwise separable convolution and attention modules to fuse multi-channel data features and enhance the model’s noise resistance. We simulated a real environment by adding Gaussian noise with different signal-to-noise ratios to the Tennessee Eastman process dataset and trained the model using multi-channel data. The experimental results show that this model outperforms traditional models in both fault diagnosis accuracy and noise resistance. Further research on a compressor unit engineering instance validated the superiority of the model.
Keywords:
densely connected convolutional network; depthwise separable convolution; attention mechanism; Tennessee Eastman process
MSC:
68T07
1. Introduction
In modern industrial production, as the level of automation and intelligence in process industry systems continues to advance and develop rapidly, safety risks are also increasing. The complexity, dynamic nature, and high-risk characteristics of process industry systems can lead to production interruptions, especially in chemical enterprises where toxic, flammable, and explosive materials are often involved. If production equipment fails, it can result in significant loss of life and property, as well as other potential hazards. Therefore, diagnosing and predicting equipment failures before they occur and ensuring the normal operation of equipment is crucial. This area of research has clearly become a focal point of interest in both industry and academia [,,].
In the field of fault diagnosis research, the methods can be broadly categorized into analytical-model-based methods, expert-knowledge-based methods, and data-driven methods []. Analytical-model-based methods become challenging to compute when dealing with high-dimensional complex data due to the increased complexity of modern process industry systems. Expert-knowledge-based methods include approaches such as artificial neural networks, expert systems, and fault trees []. These methods rely on experienced operators or experts as the core, establishing corresponding fault relationships based on the representation of fault states. However, these methods are relatively difficult to implement and thus have not been widely adopted. Data-driven methods can be divided into three categories: statistical-analysis-based methods, signal-processing-based methods, and artificial intelligence-based methods. Statistical-analysis-based methods characterize and utilize correlations between variables and are suitable for fault detection in high-dimensional systems. Common methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Partial Least Squares (PLS) [,,,,,]. Signal-processing-based methods are traditional approaches for condition monitoring and fault diagnosis, using techniques like correlation functions, time-frequency spectra, and autoregressive moving averages to obtain and analyze signal features such as mean, kurtosis, amplitude, and phase to determine system status and fault modes. Common methods include Fast Fourier Transform (FFT), Wavelet Transform (WT), Hilbert–Huang Transform (HHT), and Spectral Analysis (SA) [,,,,]. Artificial intelligence-based methods directly utilize process data to establish monitoring models, extracting data features from historical and real-time data for fault monitoring. The main methods include Bayesian Networks (BNs), Support Vector Machine (SVM), and neural networks [,,]. Although there have been significant advancements in these fault diagnosis methods, for complex process industry systems, the extraction of fault features still requires domain-specific knowledge, and the recognition accuracy of these methods needs further improvement.
With the rapid development of artificial intelligence and machine learning, traditional fault diagnosis methods are gradually revealing their limitations when dealing with complex industrial systems and high-dimensional data. Deep learning-based methods have demonstrated significant advantages in the field of fault diagnosis. By constructing multi-layer neural networks, deep learning can automatically extract effective features from massive raw data and capture hidden patterns within the data, leading to accurate diagnosis. In chemical equipment fault diagnosis, the advancement of sensors and computing technologies has led to an explosion of data, and deep learning methods can analyze this data, automatically identify potential faults, improve fault recognition accuracy, and predict faults before they occur, thereby avoiding the impact of equipment failures []. Consequently, deep learning has found widespread application in fault diagnosis. Researchers are continuously exploring the use of more deep learning methods for fault diagnosis. Xie et al. [] proposed a Hierarchical Deep Neural Network (HDNN) for fault diagnosis in the Tennessee Eastman process (TEP), where a supervised deep neural network was trained to effectively classify faults into multiple groups. He et al. [] enhanced the model’s feature learning ability by introducing a residual learning framework and skip connections. Huang et al. [] proposed DenseNet, where each layer’s feature data is used as input for subsequent layers, effectively mitigating the vanishing gradient problem. Shi et al. [] developed a semi-supervised dense ladder network based on DenseNet and ladder networks, enhancing feature transfer and reuse, and validated its effectiveness in the TEP process. Additionally, current fault diagnosis research increasingly combines neural networks with attention mechanisms. Yao et al. [] addressed the issue of long sequence data input by introducing an attention mechanism (AM) into LSTM, improving the accuracy of fault monitoring and identification in complex industrial processes. Yuan et al. [] proposed an STA-ConvBiLSTM model, integrating Spatiotemporal Attention (STA) mechanisms with CNN and BiLSTM, significantly enhancing the accuracy and stability of furnace tube temperature prediction in delayed coking units by improving spatiotemporal feature extraction and weight allocation. The aforementioned fault monitoring models still present the following issues:
- In actual industrial processes, there are various types of monitoring variables. The models mentioned above primarily consider time series correlations while overlooking the data correlations between sensors. Sensor data contain fault feature information to varying degrees, reflecting the interaction between different components’ faults. To ensure accurate monitoring, it is crucial to capture the correlations between data from different sensors and emphasize information most relevant to system safety.
- The majority of the research applying these models to industrial process fault monitoring relies on simulation data, which does not accurately reflect the noise conditions present in actual production environments. This limitation leads to inadequate feature adaptation and noise resistance, which can negatively impact fault diagnosis accuracy.
To further enhance the performance of deep learning models in fault monitoring within process industry systems, this paper proposes a novel fault monitoring model, DSCBAM-DenseNet. This model builds upon the DenseNet architecture and uses multi-sensor data as network input while testing its robustness by adding Gaussian noise in a simulated industrial environment. The model’s structure is based on depthwise separable convolution, and the integration of attention mechanisms enhances the model’s noise resistance []. The attention mechanism is a significant advancement in deep learning, capable of evaluating the importance of feature information by calculating its similarity and enhancing the model’s focus on key features through weight distribution. DSCBAM-DenseNet adopts this mechanism and outputs fault monitoring results through global average pooling and classification layers.
The main contributions of this paper are summarized as follows:
- Introduction of depthwise separable convolution: By applying separable convolution to the model, it effectively simulates the interactions between different sensor data, significantly reducing the network’s size and improving computational efficiency.
- Construction of the CBAM module: The model adaptively recalibrates feature weights derived from the separable convolution layers. Through dynamic calibration of feature responses, the model can highlight effective information features and suppress irrelevant ones, thereby enhancing the network’s ability to identify critical monitoring information.
- Experimental validation: To verify the performance of the proposed model, experiments were conducted using the publicly available Tennessee Eastman process (TEP) dataset and actual data collected from a chemical plant. The experimental results demonstrate that the proposed model exhibits excellent monitoring and noise resistance performance.
2. Preliminary Knowledge
2.1. DenseNet
DenseNet (Densely Connected Convolutional Networks) is an improvement upon ResNet, proposed by Gao Huang et al. in 2017. ResNet enables the training of deeper CNN models to achieve higher accuracy. The core of the ResNet model is the establishment of a “shortcut connection” between earlier and later layers, which facilitates the backpropagation of gradients during the training process, thereby enabling the training of deeper CNN networks. However, as the number of layers in a CNN increases, the path from output to input becomes longer, leading to a problem where gradients are likely to vanish as they backpropagate through such a long path. DenseNet addresses this issue by ensuring that the network maintains strong gradients that do not disappear.
Its basic idea is consistent with ResNet, but it establishes a dense connection between all previous layers and the subsequent layer, from which it derives its name. Equation (1) shows the connection method for ResNet (on the left of Figure 1), and Equation (2) shows the connection method for DenseNet (on the right of Figure 1):
$$x_N = H_N(x_{N-1}) + x_{N-1} \quad (1)$$
$$x_N = H_N([x_0, x_1, \ldots, x_{N-1}]) \quad (2)$$
Another main feature of DenseNet is the reuse of features through the connection of features across channels, which directly connects all layers while ensuring maximum information transmission between layers in the network. DenseNet employs a form of implicit deep supervision, where each layer establishes a connection with the previous layers. Error signals can easily propagate to earlier layers, allowing them to receive direct feedback from the final classification layer. A minimal code sketch of the dense connection rule is given after the list below. DenseNet has the following advantages:
Figure 1.
Differences in connections between ResNet (left) and DenseNet (right).
- It reduces gradient disappearance;
- It enhances feature transmission and more effective utilization of features;
- It reduces parameters and has some inhibition on overfitting.
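As a minimal sketch of the dense connection rule in Equation (2) (PyTorch is assumed; the channel counts and layer depth are illustrative, not the paper's exact configuration), each layer consumes the channel-wise concatenation of all preceding outputs and applies the BN → GeLU → 1 × 3 composite function described later in this section:

```python
import torch
import torch.nn as nn

class DenseBlock1D(nn.Module):
    """Each layer consumes the concatenation of all preceding outputs (Eq. (2))."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # H_N: batch normalization -> GeLU -> 1x3 convolution
            self.layers.append(nn.Sequential(
                nn.BatchNorm1d(in_channels + i * growth_rate),
                nn.GELU(),
                nn.Conv1d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # x_N = H_N([x_0, x_1, ..., x_{N-1}])
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)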
DenseNet Network Architecture
(1) DenseBlock
In the DenseNet architecture, as shown in Figure 2, the output of each layer is fed into all subsequent layers. The DenseNet architecture uses a concatenation approach to address potential deficiencies in ResNet, such as selective discarding of certain layers and information blocking. Dense connectivity blocks are a crucial component of the network, with each layer directly connected to every other layer. This dense connectivity helps alleviate the problem of vanishing gradients, efficiently utilizes features, and reduces the number of parameters, thereby accelerating network operation speed.
Figure 2.
DenseBlock structural diagram.
(2) Improved composition function
As shown in Figure 2, the key difference between ResNet and DenseNet is that DenseNet concatenates the outputs rather than simply adding them as ResNet does. Therefore, after applying an increasingly complex sequence of functions, the input of the N-th layer aggregates all preceding layers, and the output of the N-th layer is:
$$x_N = H_N([x_0, x_1, \ldots, x_{N-1}])$$
where $H_N(\cdot)$ represents the composite effect of a batch normalization layer, a GeLU layer, and a 1 × 3 convolution, and $[x_0, x_1, \ldots, x_{N-1}]$ is the channel-wise concatenation of the outputs from Layer 0 to Layer N − 1.
① Batch normalization
The BN layer accelerates model convergence and, more importantly, alleviates the “gradient vanishing” problem in deep networks to some extent, making deep network models easier and more stable to train []. The principle of the BN layer is as follows:
Input: mini-batch $\mathcal{B} = \{x_1, \ldots, x_m\}$
Output: normalized network responses $\{y_i\}$
- Calculate the mean of the batch: $\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i$
- Calculate the variance of the batch: $\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2$
- Normalize: $\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}$
- Scale transformation and offset: $y_i = \gamma \hat{x}_i + \beta$
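The four steps above translate directly into code. A minimal sketch (PyTorch assumed; equivalent in spirit to nn.BatchNorm1d in training mode, with γ and β as learnable parameters):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """The four BN steps for a mini-batch x of shape (m, features)."""
    mu = x.mean(dim=0)                        # batch mean
    var = x.var(dim=0, unbiased=False)        # batch variance
    x_hat = (x - mu) / torch.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta               # scale and shift
```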
② GeLU activation layer
GELU [] (Gaussian Error Linear Unit) is an activation function based on the Gaussian error function, as shown in Figure 3. Compared with ReLU [] and other activation functions, GELU is smoother, which helps improve the convergence rate and performance of the training process.
Figure 3.
Comparison between ReLU and GeLU.
The following is the mathematical expression for the GELU activation layer:
$$\mathrm{GELU}(x) = x\,\Phi(x)$$
where $\Phi(x)$ represents the cumulative distribution function of the standard normal distribution:
$$\Phi(x) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$
where $\mathrm{erf}(\cdot)$ represents the Gaussian error function, which can be further expressed as:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$
GeLU performs better than ReLU in addressing the gradient vanishing problem in neural networks and can better support the training and optimization of deep neural networks. One issue with the ReLU function is that when the input is negative, the output remains constant at 0. This problem can lead to neuron inactivity, thereby undermining the model’s expressive power. The GeLU function is a continuous S-shaped curve, situated between the Sigmoid and ReLU functions, which can effectively alleviate this issue.
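The contrast between the two activations is easy to verify numerically. A small sketch (PyTorch assumed) computes the exact GELU from the error function and checks it against the library implementation:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, 7)
relu = torch.relu(x)                              # exactly 0 for negative inputs
gelu = x * 0.5 * (1.0 + torch.erf(x / 2 ** 0.5))  # GELU(x) = x * Phi(x)

# GELU stays non-zero and differentiable for small negative inputs,
# which avoids the "dead neuron" issue of ReLU described above.
print(torch.allclose(gelu, F.gelu(x), atol=1e-6))  # True
```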
③ A 1 × 3 convolutional layer
Suppose the input data have a length of 3, with padding 1, stride 1, and a convolutional kernel of size 1 × 3. The working principle of the convolutional kernel is shown in Figure 4:
Figure 4.
A 1 × 3 convolutional kernel diagram.
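Under these settings the output length equals the input length, as a quick sketch confirms (PyTorch assumed; the input values are arbitrary):

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=1,
                 kernel_size=3, padding=1, stride=1, bias=False)
x = torch.randn(1, 1, 3)   # a length-3 sequence, as in Figure 4
y = conv(x)
print(y.shape)             # torch.Size([1, 1, 3]) -- length preserved
```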
(3) Transition layer
The transition layer is mainly used to connect two dense layers while reducing the size of the feature map. The transition layer comprises a bottleneck layer (i.e., 1 × 1 convolutional layer) and a 1 × 2 average pooling.
① Bottleneck layer
For a 1 × 1 convolution, as shown in Figure 5, each value in the feature map appears to be merely multiplied by a scalar. However, once the result passes through the activation layer, a non-linear mapping is actually performed, and the number of channels in the feature maps can be changed []. The 1 × 1 convolution keeps the size of the feature map unchanged while allowing the number of channels to be adjusted, and it increases the depth of the network while enabling it to learn more complex functions (feature information). The advantages of 1 × 1 convolution are listed below; a minimal sketch of the transition layer follows the list:
Figure 5.
A 1 × 1 convolutional diagram.
- Increasing/reducing the dimensionality of the channel;
- Adding nonlinear features;
- Realizing the cross-channel information exchange and integration.
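As a minimal sketch of the transition layer described above (PyTorch assumed; the channel counts are left as parameters), the bottleneck 1 × 1 convolution is paired with a 1 × 2 average pooling:

```python
import torch.nn as nn

def transition(in_channels: int, out_channels: int) -> nn.Sequential:
    """Bottleneck 1x1 convolution followed by 1x2 average pooling."""
    return nn.Sequential(
        nn.Conv1d(in_channels, out_channels, kernel_size=1),  # channel reduction
        nn.AvgPool1d(kernel_size=2, stride=2),                # halves the length
    )
```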
② A 1D average pooling layer
Due to the use of the dense connectivity architecture, it is not feasible to directly add pooling layers between each layer. Therefore, a combination of dense blocks is used to incorporate convolutional and pooling layers between each dense block. The core role of the pooling layer is to compress the data size and reduce dimensionality while preserving the main features of the original data without significant loss, and to enhance the sensitivity to differences between features []. The pooling layer is generally placed after the convolutional layer. The two commonly used pooling operations are max pooling and average pooling [].
2.2. Improved DenseNet Network Model
2.2.1. Depthwise Separable Convolution
The data collection process often involves multiple sensors, and multi-channel time series data are frequently used as input for fault monitoring models, where each channel represents a sensor sequence. For these multi-sensor time series input data, the data in each channel exhibit time correlation because they are collected by the same sensor and reflect the state of a monitored component over time. Meanwhile, due to the propagation and interaction of faults between different components, data in different channels also display cross-channel correlation. However, most existing fault monitoring models are based on data from the same sensor, which prevents them from effectively capturing and modeling the dependency relationships between different sensor data. This limitation, in turn, affects the accuracy and applicability of fault monitoring models. Therefore, to integrate information from different sensors, this study introduces separable convolution operations into fault monitoring to replace standard convolution operations, with the aim of effectively modeling the correlation between different sensor data by separating time correlation from cross-channel correlation.
Separable convolution decomposes time correlation and cross-channel correlation by breaking down standard convolutions []. As shown in Figure 6, standard convolution is decomposed into two parts: channel convolution and pointwise convolution. First, channel convolution applies a single convolutional kernel to each input channel to map the time series correlation of each sensor separately. Then, pointwise 1 × 1 convolution is performed to create a linear combination of the convolutional outputs channel by channel, mapping the cross-channel correlations of different sensors. Through these two independent steps, time correlation and cross-channel correlation can be fully decoupled. Specifically, separable convolution uses a specific convolution kernel, namely the pointwise convolution kernel, to map cross-channel correlation, thereby effectively capturing the correlation between different sensor data. Additionally, the introduction of separable convolutions greatly reduces the size of the network.
Figure 6.
Channel-wise convolution and point-wise convolution.
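A minimal sketch of the separable convolution described above (PyTorch assumed; the kernel size and channel counts are illustrative), with the depthwise step mapping per-sensor temporal correlation and the pointwise step mapping cross-channel correlation:

```python
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depthwise (per-sensor) convolution followed by pointwise 1x1 mixing."""
    def __init__(self, channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # groups=channels: one kernel per input channel -> temporal correlation
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # 1x1 convolution mixes channels -> cross-sensor correlation
        self.pointwise = nn.Conv1d(channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```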
2.2.2. CBAM Attention Mechanism
The Convolutional Block Attention Module (CBAM) is a lightweight convolutional attention module, as shown in Figure 7, which combines two sub-modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The CAM focuses on channel-wise attention, while SAM focuses on spatial attention. Together, they enhance the model’s capability to attend to both channel and spatial information.
Figure 7.
CBAM structure diagram.
Among them, CAM is responsible for maintaining the channel unchanged while compressing dimensions, focusing on the meaningful parts of the data []. SAM, on the other hand, maintains the dimensions unchanged while compressing the number of channels, concentrating on target location information []. The CBAM module enables the model to simultaneously perform both channel calibration and spatial calibration. It can not only strengthen or weaken features on specific channels but also enhance or suppress features at specific locations. It adaptively recalibrates feature information from separable convolutional layers, highlighting effective feature mappings, inhibiting useless feature mappings, and improving the information discrimination ability and noise resistance of the fault monitoring network.
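A minimal 1D adaptation of CBAM along these lines (PyTorch assumed; the reduction ratio and spatial kernel size follow common CBAM defaults rather than the paper's exact settings):

```python
import torch
import torch.nn as nn

class CBAM1d(nn.Module):
    """Channel attention (CAM) followed by spatial attention (SAM) for 1D maps."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (batch, channels, length)
        # CAM: compress the temporal dimension, weight each channel
        avg = self.mlp(x.mean(dim=2))
        mx = self.mlp(x.amax(dim=2))
        x = x * torch.sigmoid(avg + mx).unsqueeze(2)
        # SAM: compress the channel dimension, weight each position
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```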
2.2.3. 1D Global Average Pooling
Traditional DenseNet uses two linear layers for classification, which consumes a large amount of computing resources, increases model complexity, and is prone to overfitting []. In this study, we employed one-dimensional global average pooling (1D-GAP) before Softmax for classification. The 1D-GAP layer is a specially designed pooling architecture with dimension adaptation, aimed at reducing the number of model parameters and improving the operational speed of the algorithm. It also adaptively matches the dimension transformation between the last DenseBlock layer and the Softmax layer. In this study, we designed a 1D-GAP layer to replace the two fully connected (FC) layers in the traditional model, as shown in Figure 8.
Figure 8.
The 1D-GAP process flow.
An average value was calculated over all elements in the feature map of each output channel, and after global average pooling, a feature vector whose dimension equals the number of categories was obtained. It was then directly input into the Softmax layer for classification. A minimal sketch follows the list of advantages below.
The advantages of 1D-GAP are:
- Inhibiting overfitting. Although the FC layer can retain all features, its excessive parameters can lead to overfitting; by comparison, GAP has far fewer parameters;
- More flexible input size. The feature map parameters processed by GAP are no longer related to the input data size.
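A minimal sketch of the 1D-GAP classification head (PyTorch assumed; the batch size, sequence length, and the convention that the last DenseBlock outputs one channel per fault class are illustrative):

```python
import torch
import torch.nn as nn

num_classes = 21                              # e.g., the 21 TEP fault conditions
features = torch.randn(8, num_classes, 128)   # (batch, channels, length)

gap = nn.AdaptiveAvgPool1d(1)                 # average over the full temporal axis
logits = gap(features).squeeze(-1)            # (8, 21): one value per class channel
probs = torch.softmax(logits, dim=1)          # fed directly to Softmax, no FC layers
print(probs.shape)                            # torch.Size([8, 21])
```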
2.2.4. Softmax Classification Layer
The output obtained through the processing of the GAP layer is an N-dimensional array that contains inconsistent values and does not conform to the probability distribution. In order to better train the model, a Softmax classifier is placed after the GAP layer for model accuracy evaluation, to facilitate error backpropagation. The schematic diagram of Softmax is shown in Figure 9.
Figure 9.
Softmax calculation process.
Suppose $[z_1, z_2, z_3] = [2, 1, -2]$. The Softmax transformation gives $[e^{z_1}, e^{z_2}, e^{z_3}] \approx [7.39, 2.72, 0.14]$, and the probabilities of the three inputs ultimately obtained are approximately $[0.72, 0.27, 0.01]$, which add up to 1. The objective of neural network model optimization is defined by the loss function. In fault classification problems, the commonly used loss is the cross entropy [], whose value represents the distance between two probability distributions. The mathematical expression of the cross-entropy loss function is:
$$H(P, Q) = -\sum_i P(i)\log Q(i)$$
where P is the expected output, Q is the actual output, and i represents the i-th training sample. The training process continuously adjusts the network parameters to minimize $H(P, Q)$, so that the probability distribution obtained by training becomes closer to the true distribution. As a result, the accuracy of prediction is increased.
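The worked example and the loss can be reproduced in a few lines (PyTorch assumed; the one-hot target corresponds to the first class):

```python
import torch
import torch.nn.functional as F

z = torch.tensor([2.0, 1.0, -2.0])
q = torch.softmax(z, dim=0)          # ~[0.72, 0.27, 0.01], sums to 1
p = torch.tensor([1.0, 0.0, 0.0])    # one-hot expected output P
loss = -(p * torch.log(q)).sum()     # H(P, Q) = -sum_i P(i) log Q(i)

ref = F.cross_entropy(z.unsqueeze(0), torch.tensor([0]))
print(torch.isclose(loss, ref))      # tensor(True)
```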
The overall structure diagram of the improved DSCBAM-DenseNet is shown in Figure 10.
Figure 10.
Improved DSCBAM-DenseNet structure diagram.
3. Experimentation
3.1. Experiment 1: Tennessee Eastman Process Experiment
In this section, MATLAB/Simulink (version 6.2 or higher) is used to simulate the Tennessee Eastman process (TEP). Based on the simulated data, experiments are conducted to compare the proposed method with other fault diagnosis methods.
3.1.1. Introduction to the Experiment
The Tennessee Eastman process (TEP) [] was established by the Eastman Chemical Company, Kingsport, TN, USA, to provide a practical industrial process for evaluating process monitoring, process control, and other related areas. The TEP test process simulates a real industrial process and generates data that exhibit time-varying, strongly coupled, and nonlinear characteristics. This process is widely used as a significant data source for chemical fault detection. The TEP consists of five major operating units and involves eight components, and its specific process flow is illustrated in Figure 11.
Figure 11.
Tennessee Eastman process.
There are a total of 21 faults in the simulation, as shown in Table 1, which are divided into six categories: step fault, random variable fault, slow offset fault, valve sticking fault, unknown fault, and valve position constant fault. Among them, Faults 1–7 are related to step changes in process variables; Faults 8–12 are related to increased variability of process variables; Fault 13 involves a slow offset in reaction kinetics; Faults 14, 15, and 21 are related to valve sticking; and the remaining faults are of unknown type. Fault 21 is related to a constant valve position and, therefore, is not considered in the analysis. The sampling interval of the simulation dataset was 3 min. Normal operating conditions and 20 fault operating conditions were simulated for 63 h, resulting in a total of 26,460 recorded data samples.
Table 1.
Table of 21 faults for the TE process.
3.1.2. Selection of Experimental Data
The entire TE dataset consisted of a training set and a testing set. The TE centralized data were made up of data from 22 different simulation runs. The TE process had 53 variables, including 41 measured variables (XMEAS), as shown in Table 2, and 12 manipulated variables (XMV), as shown in Table 3. As XMV12 was the constant stirring speed, as shown in Figure 12, 52 variables were selected for this experiment. The monitoring data recorded at time t can be represented as in Equation (14):
$$x_t = [x_{t,1}, x_{t,2}, \ldots, x_{t,52}]^{\mathrm{T}} \quad (14)$$
Table 2.
Process measurement.
Table 3.
Twelve manipulated variables.
Figure 12.
Control variables in Fault 18 data.
From the tables above, it can be seen that the TE dataset contains a variety of data types: flow rate, temperature, pressure, power, liquid level, etc. A change in one variable can affect the readings of other variables. Cross-channel correlation is present between multiple sensors, and analyzing individual variables in isolation cannot identify the fault type.
3.1.3. The DSCBAM-DenseNet Monitoring Model Established
To address the two problems raised in the introduction, the fault monitoring method proposed in this study fully considers the relationships between multiple variables, and the proposed DSCBAM-DenseNet directly uses monitoring data from different sensors as input. The fault monitoring process based on DSCBAM-DenseNet is shown in Figure 13, and the steps of the process industry system fault monitoring model are as follows:
Figure 13.
Fault monitoring process based on DSCBAM-DenseNet.
- Step 1: Obtain operating data for the 21 operating conditions of the TE process through the MATLAB simulation program;
- Step 2: Perform one-hot encoding on all data and tag them with the corresponding fault types;
- Step 3: Design the architecture of the DSCBAM-DenseNet model and implement it in code;
- Step 4: Preprocess the historical data: first, perform normalization to eliminate the adverse effects of singular samples; next, use a random seed to shuffle the data and corresponding labels to avoid model overfitting; finally, divide the data into training and validation sets at a ratio of 8:2 (a minimal sketch of this preprocessing is given after this list);
- Step 5: Use historical data for model training and validation. If the accuracy on the test set reaches the predetermined standard, proceed to online monitoring; otherwise, modify the model parameters and retrain the model.
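A minimal sketch of the preprocessing in Step 4 (NumPy assumed; z-score normalization is an assumption here, as the paper does not specify the exact normalization formula):

```python
import numpy as np

def preprocess(X: np.ndarray, y: np.ndarray, seed: int = 42):
    """Normalize, shuffle with a fixed random seed, and split 8:2."""
    # Z-score normalization per variable to suppress singular samples
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    rng = np.random.default_rng(seed)          # random seed for shuffling
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    split = int(0.8 * len(X))                  # 8:2 train/validation split
    return (X[:split], y[:split]), (X[split:], y[split:])
```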
3.1.4. Experimental Results and Comparative Analysis
To verify the effectiveness of the proposed DSCBAM-DenseNet, the method is compared with DSCBAM-ResNet, DenseNet, ResNet, KNN, Naive Bayes (NB), SVM, and MLP. The model parameters used in this study are shown in Table 4, and the location of the CBAM module is illustrated in Figure 10. In this study, the batch size was set to 36, the number of training epochs was 50, the SGD optimizer was used, the learning rate was 0.1, and the momentum was 0.9 (a minimal training-loop sketch with these settings is given after Table 4). To account for variations due to different initialization parameters, 10 parallel experiments were conducted.
Table 4.
DSCBAM-DenseNet detailed network parameters.
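A minimal training-loop sketch with the reported hyperparameters (PyTorch assumed; the stand-in linear model and synthetic tensors are placeholders for the real DSCBAM-DenseNet and TEP data):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and synthetic data: 52 variables, 21 fault classes
model = nn.Sequential(nn.Flatten(), nn.Linear(52, 21))
data = TensorDataset(torch.randn(360, 52), torch.randint(0, 21, (360,)))
loader = DataLoader(data, batch_size=36, shuffle=True)   # batch size 36

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()
for epoch in range(50):                                  # 50 training epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
```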
Table 5 records the identification accuracy and standard deviation for the 21 types of faults across the 10 parallel experiments. It is evident from the table that this method performed well on the public dataset. Thanks to the DSC characteristics, which fully integrate the time series correlation of each sensor and the cross-channel correlation of different sensors, the fault monitoring model exhibited enhanced classification performance. The average accuracy for the 21 classifications reached 93.65%, with an average standard deviation of only 0.48%, indicating that the method proposed in this study is both accurate and stable.
Table 5.
Parallel test set results.
Table 6 shows the classification performance of different models on the test set. The method proposed in this paper achieves an accuracy of 93.62%, higher than traditional machine learning algorithms. Compared with ResNet, the predecessor of DenseNet, the model proposed in this study also shows clear advantages.
Table 6.
Accuracy of different models on test sets.
3.1.5. Supplementary Experiments
Since the TEP dataset consists of simulated data, it is idealized and does not replicate the complex environmental noise present in actual industrial processes. To address the issue mentioned in the introduction, Gaussian noise with different signal-to-noise ratios (SNR) is added to the TEP dataset to simulate various industrial environments. Gaussian noise with SNRs of −5, −1, 0, 1, and 5 is used to simulate different operating conditions. The network parameters are set as described in Section 3.1.4, and ten parallel experiments are conducted to obtain the average results. Figure 14 shows the results for the XMV(11) variable in Fault 18 after being contaminated with Gaussian noise at different SNR levels. As the SNR decreases, the original data are increasingly polluted.
Figure 14.
XMV (11) in Fault 18 under different SNR conditions.
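A minimal sketch of how Gaussian noise at a target SNR can be injected into a signal (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def add_gaussian_noise(signal: np.ndarray, snr_db: float,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Corrupt a signal with white Gaussian noise at a target SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```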
With the reduction in SNR, the oscillation amplitude of the original data increased, and the original features were increasingly obscured by noise. The results of the multi-variable operating condition experiments, shown in Table 7, demonstrate that the proposed model has a strong ability to extract abstract features. Even in the extreme case with an SNR of −5, the accuracy still reached 87.41%, representing only a 6.21% decrease in performance compared to the ideal state. Therefore, the proposed method has good noise immunity. To visualize the effect of fault classification on the test set, t-distributed stochastic neighbor embedding (t-SNE) was used to map the features of the network output layer to a two-dimensional plane, with the results shown in Figure 15. Since the classification accuracy is only 87.41%, some point distributions are not sufficiently concentrated. Figure 16 shows the classification confusion matrix of the DSCBAM-DenseNet model for TE data at an SNR of −5, and Table 8 presents three evaluation metrics for the DSCBAM-DenseNet model: precision, recall, and F1 score.
Table 7.
Performance of DSCBAM-DenseNet under different SNR conditions.
Figure 15.
t-SNE visualization.
Figure 16.
DSCBAM-DenseNet classification results when SNR = −5.
Table 8.
The precision, recall, and F1 score (%) of the DSCBAM-DenseNet model.
3.2. Experiment 2: Compressor Unit Data
3.2.1. Introduction to the Experiment
To further validate the effectiveness of the algorithm proposed in this study, experimental data from a compressor unit in a chemical process was collected for further research. The schematic diagram of the compressor unit is shown in Figure 17. This CO2 compressor pressurizes the carbon dioxide gas from the decarbonization unit in the ammonia synthesis workshop to 7.108 MPa and sends it to the stripping tower. Due to long-term full-load operation, scaling in the circulation section increases the pressure difference, leading to severe load fluctuations and sudden steam temperature rise, which in turn causes faults.
Figure 17.
Schematic diagram of the compressor unit.
A total of 1223 sensors were used for data collection, including measurements of parameters such as rotational speed, gas flow, vacuum degree, liquid level, pressure, and temperature, as well as correlation data between various mechanical components, pipelines, compressors, and other equipment. Clearly, the data from the experiment exhibit high dimensionality and strong coupling characteristics. Given that the data collected under actual operating conditions include complex environmental noise and anomalies recorded during manual machine operation, the noisy compressor unit data shown in Figure 18 is also suitable for evaluating the model’s noise resistance.
Figure 18.
Compressor unit data containing noise.
3.2.2. Experimental Model and Results
The recorded experimental data were analyzed, and 78 representative process variables were selected from all channels for multi-channel fusion fault monitoring. The experimental data length was 2880, and the training and validation sets were split at a ratio of 8:2 for model training. The network parameters used in Experiment 2 are identical to those in Experiment 1. Due to the random initial parameters, 10 parallel experiments were conducted to calculate the accuracy and standard deviation (SD) to verify the effectiveness of the model.
Historical data were used to train the network. The training process of DSCBAM-DenseNet is shown in Figure 19. The performance of the trained model on the test set is presented in Table 9. According to the experimental results, the average accuracy of ten parallel experiments reached 99.81%, with a standard deviation (SD) of 0.32%. Despite the high dimensionality and multiple types of input data, as well as severe environmental noise, this model still exhibited excellent performance.
Figure 19.
DSCBAM-DenseNet training process.
Table 9.
Parallel experiment results.
4. Conclusions
This study primarily addressed and resolved two issues: (1) existing models do not explicitly consider the connections between multi-channel data; (2) diagnostic models in industrial processes often overlook the impact of noise. We proposed a hybrid DSCBAM-DenseNet model that fully leverages the processing capabilities of DSC for multi-channel data input and performs well in extracting data features in noisy environments. Unlike existing methods, our approach tested the model’s robustness by adding Gaussian noise with different signal-to-noise ratios to the dataset to simulate real-world conditions. The model also used multi-channel data as training samples and integrated attention mechanisms to enhance noise resistance.
Author Contributions
Conceptualization, J.Y. and Z.L.; methodology, J.Y. and K.W.; software, K.W. and Z.L.; validation, C.H. and Y.Z.; formal analysis, L.L. and W.W.; writing—original draft preparation, K.W.; writing—review and editing, Z.L. and T.G.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China (52065065) and Xinjiang Uygur Autonomous Region Key Research and Development Program Project (2023B01027-2).
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Xia, M.; Li, T.; Xu, L.; Liu, L.; De Silva, C.W. Fault diagnosis for rotating machinery using multiple sensors and convolutional neural networks. IEEE/ASME Trans. Mechatron. 2017, 23, 101–110. [Google Scholar] [CrossRef]
- Gong, W.; He, L.; Zhang, Z.; Zhou, X.; Cui, S. A novel deep learning method for intelligent fault diagnosis of rotating machinery based on improved CNN-SVM and multichannel data fusion. Sensors 2019, 19, 1693. [Google Scholar] [CrossRef] [PubMed]
- Zhou, D.H.; Liu, Y.; He, X. Review on fault diagnosis techniques for closed-loop systems. Acta Autom. Sin. 2013, 39, 1933–1943. [Google Scholar] [CrossRef]
- Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587. [Google Scholar] [CrossRef]
- Rad, M.A.A.; Yazdanpanah, M.J. Designing supervised local neural network classifiers based on EM clustering for fault diagnosis of Tennessee Eastman process. Chemom. Intell. Lab. Syst. 2015, 146, 149–157. [Google Scholar]
- Wise, B.M.; Gallagher, N.B.; Butler, S.W.; White, D.D.; Barna, G.G. Principal components analysis for monitoring the west valley liquid fed ceramic melter. Waste Manag. 1988, 88, 811–818. [Google Scholar]
- Ge, Z.; Yang, C.; Song, Z. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 2009, 64, 2245–2255. [Google Scholar] [CrossRef]
- Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H.; Egawa, Y.; Shigemoto, T.; Hirochi, T. Monitoring independent components for fault detection. AIChE J. 2003, 49, 969–976. [Google Scholar] [CrossRef]
- Ge, Z.; Song, Z.; Gao, F.; Song, Z. Local ICA for multivariate statistical fault diagnosis in systems with unknown signal and error distributions. AIChE J. 2012, 58, 2357–2372. [Google Scholar] [CrossRef]
- Kresta, J.V.; Macgregor, J.F.; Marlin, T.E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35–47. [Google Scholar] [CrossRef]
- Wen, Q.; Ge, Z.; Song, Z. Nonlinear dynamic process monitoring based on kernel partial least squares. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; pp. 2117–2122. [Google Scholar]
- Feng, S.; Cui, J.; Zhang, Z. Fault diagnosis method for an aerospace generator rotating rectifier based on dynamic FFT technology. Metrol. Meas. Syst. 2021, 28, 269–288. [Google Scholar] [CrossRef]
- Tutivén, C.; Figueres, E.; Trilla, L.; Ferrada, P.; Pozo, F. Early fault diagnosis strategy for WT main bearings based on SCADA data and one-class SVM. Energies 2022, 15, 4381. [Google Scholar] [CrossRef]
- Sun, X.; Yu, L.; Zhou, Y.; Li, Y.; Peng, X.; Ma, H.; Wang, H. Research on Fault Diagnosis Method of Distributed Power Distribution Network Based on HHT and CNN. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 3682–3687. [Google Scholar]
- Han, H.; Jia, Q.; He, Q.; Li, Z.; Zhao, H. Novel chiller fault diagnosis using deep neural network (DNN) with simulated annealing (SA). Int. J. Refrig. 2021, 121, 269–278. [Google Scholar] [CrossRef]
- Li, S.; Shi, X.; Yang, X.; Ding, X.; Li, Y.; Zhang, W. A review on the signal processing methods of rotating machinery fault diagnosis. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 24–26 May 2019; pp. 194–198. [Google Scholar]
- Yasenjiang, J.; Zhao, Y.; Liu, Z.; Sun, H.; Liu, Y.; Han, Z. Fault Diagnosis and Prediction of Continuous Industrial Processes Based on Hidden Markov Model-Bayesian Network Hybrid Model. Int. J. Chem. Eng. 2022, 2022, 1–15. [Google Scholar] [CrossRef]
- Chen, J.X.; Shi, H.B. SVM-based Fault Diagnosis for Chemical Process. J. East China Univ. Sci. Technol. (Nat. Sci. Ed.) 2004, 30, 315–317. [Google Scholar]
- Yu, Y.; Cheng, J. A roller bearing fault diagnosis method based on EMD energy entropy and ANN. J. Sound Vib. 2006, 294, 269–277. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Xie, D.; Bai, L. A hierarchical deep neural network for fault diagnosis on Tennessee-Eastman process. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 745–748. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
- Shi, F.; Wang, Z.; Liang, J. Industrial Fault Identification Based on Semi-supervised Dense Ladder Network. Chem. Eng. J. 2018, 69, 3083–3091. [Google Scholar]
- Li, Y. A fault prediction and cause identification approach in complex industrial processes based on deep learning. Comput. Intell. Neurosci. 2021, 2021, 6612342. [Google Scholar] [CrossRef]
- Yuan, Z.; Yang, Z.; Ling, Y.; Wu, C.; Li, C. Spatiotemporal attention mechanism-based deep network for critical parameters prediction in chemical process. Process Saf. Environ. Prot. 2021, 155, 401–414. [Google Scholar] [CrossRef]
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
- Lee, M. GELU Activation Function in Deep Learning: A Comprehensive Mathematical Analysis and Performance. arXiv 2023, arXiv:2305.12073. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; pp. 1–775. [Google Scholar]
- Sifre, L.; Mallat, S. Rigid-motion scattering for texture classification. arXiv 2014, arXiv:1403.1687. [Google Scholar]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
- Dai, H.; Bai, X.; Chen, J.; Du, Z.; Wang, L.; Zheng, X. SAMAug: Point Prompt Augmentation for Segment Anything Model. arXiv 2023, arXiv:2307.01187. [Google Scholar]
- Downs, J.J.; Vogel, E.F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).