4.1. Comparative Analysis of Sample Balancing Methods
To verify the model’s capability to handle imbalanced datasets and to explore the importance of data balancing in the health status assessment process, this paper designs a comparative experiment from a data processing perspective. The experiment uses three different datasets: the first is the raw data without any preprocessing, the second is balanced using random over-sampling, and the third is balanced using the SMOTE. These three datasets, serving as the sole variable, are fed into the model for both training and testing.
Table 5 presents the detailed results of the health status assessment for the three different datasets.
The results clearly show that the SMOTE sampling improves the model’s accuracy. On the test set, the accuracy of the air compressor health assessment with the SMOTE sampling reaches 96.89%, while the accuracy with random over-sampling is 93.65%, and the accuracy on the imbalanced dataset is only 86.01%.
When the original data are normalized and directly divided for input into the health status assessment model, the model struggles to generalize to the “faulty” category because of the insufficient number of samples for that category in the original dataset, leading to worse evaluation performance. With an imbalanced dataset, the model tends to prioritize learning the more abundant classes (such as the “healthy” category) while neglecting the under-represented classes (such as the “faulty” category).
Random over-sampling balances the class distribution by simply duplicating minority class samples, enabling the model to learn and capture relevant features from the minority class. Compared to the original dataset, the model’s test accuracy on the over-sampled dataset improved by 8.88% (a relative improvement). However, simply replicating minority class samples may cause the model to repeatedly encounter the same samples, preventing it from learning diverse features. Additionally, random over-sampling can bias the sample distribution by excessively increasing the number of minority class samples, which not only fails to improve the model’s learning of the minority class but may also degrade the model’s performance and stability in real-world applications.
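As a concrete illustration, the duplication strategy described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation used in this study; libraries such as imbalanced-learn provide an equivalent `RandomOverSampler` off the shelf.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance classes by duplicating minority-class rows at random
    until every class matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # sample with replacement until this class reaches the majority count
        extra = rng.choice(idx, size=n_max - idx.size, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Because the extra rows are exact copies, the balanced set contains no new information about the minority class, which is precisely the limitation noted above.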
In contrast, the SMOTE offers distinct advantages. The SMOTE does not simply replicate existing minority class samples; instead, it creates new synthetic samples by interpolating between neighboring minority class samples in the feature space. Unlike simple replication, this approach enhances the diversity of the minority class data, providing samples that enable the model to learn the distinctive features of the minority class more effectively. After applying the SMOTE, the accuracy of the air compressor health status assessment improved by 12.65% (a relative improvement over the imbalanced dataset).
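The interpolation step can be sketched as follows. This is a minimal NumPy sketch assuming Euclidean nearest neighbours within the minority class and a small neighbourhood size k; the SMOTE configuration actually used in this study may differ.

```python
import numpy as np

def smote_minority(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    # pairwise squared Euclidean distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude each sample itself
    nn = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbours per sample
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)                          # a minority sample
        nb = nn[j, rng.integers(min(k, n - 1))]      # one of its neighbours
        lam = rng.random()                           # interpolation factor in [0, 1)
        synth[i] = X_min[j] + lam * (X_min[nb] - X_min[j])
    return synth
```

Each synthetic point lies on the line segment between two real minority samples, so the new data stay inside the minority region of the feature space while still differing from the originals.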
4.2. Analysis of the Results of the Air Compressor Health Assessment Model Based on the SMOTE-IVY-SE-CNN-BiLSTM
This section presents an in-depth analysis of the performance of the air compressor health assessment model based on the SMOTE-IVY-SE-CNN-BiLSTM. After training the model, we evaluate its performance by applying the test set features to multiple models. Along with the approach introduced in this study, seven other algorithms are selected as comparison models: the Backpropagation Neural Network (BP) [31], Particle Swarm Optimization–Bidirectional Long Short-Term Memory network (PSO-BiLSTM) [32], Random Forest (RF) [33], Support Vector Machine (SVM) [34], BiLSTM, LSTM, and IVY-BiLSTM.
The BP adjusts the weights and biases using the backpropagation algorithm to optimize classification performance. The RF enhances classification stability and accuracy by combining multiple decision trees and employing a voting mechanism. The SVM, in turn, identifies the optimal hyperplane to maximize the margin between classes and can address nonlinear classification problems through the use of kernel functions. The PSO-BiLSTM classification model combines the PSO algorithm with the BiLSTM network, improving classification performance by optimizing the hyperparameters of the BiLSTM. The PSO-BiLSTM model uses a maximum of 10 iterations and a particle swarm population size of 20. The search ranges of the learning rate, the number of hidden layer nodes, and the L2 regularization coefficient are , 10–30, and , respectively. The parameters of the IVY-BiLSTM model are the same as those of the SMOTE-IVY-SE-CNN-BiLSTM.
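To make the hyperparameter search concrete, the sketch below shows a plain random-search baseline over a search space of the same shape. Only the hidden-node range (10–30) is stated in the text; the learning-rate and L2 bounds below are placeholders, not values from this study, and PSO/IVY replace the blind sampling here with guided population updates.

```python
import math
import random

# Search space for the BiLSTM hyperparameters tuned by PSO/IVY in the text.
# Only the hidden-node range (10-30) is given; the other bounds are placeholders.
SPACE = {
    "hidden_nodes": (10, 30),          # integer range from the text
    "learning_rate": (1e-4, 1e-2),     # placeholder bounds
    "l2_coeff": (1e-5, 1e-2),          # placeholder bounds
}

def sample_candidate(rng):
    """Draw one configuration (log-uniform for the scale-sensitive parameters)."""
    lo, hi = SPACE["hidden_nodes"]
    lr_lo, lr_hi = SPACE["learning_rate"]
    l2_lo, l2_hi = SPACE["l2_coeff"]
    return {
        "hidden_nodes": rng.randint(lo, hi),
        "learning_rate": math.exp(rng.uniform(math.log(lr_lo), math.log(lr_hi))),
        "l2_coeff": math.exp(rng.uniform(math.log(l2_lo), math.log(l2_hi))),
    }

def random_search(objective, n_trials=20, seed=0):
    """Baseline tuner: keep the best of n_trials random configurations."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        cand = sample_candidate(rng)
        score = objective(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

In practice the `objective` would train a BiLSTM with the candidate configuration and return its validation score; population-based optimizers such as PSO and IVY explore the same space but bias later samples toward promising regions.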
The performance of these models is compared in Table 6.
In Table 6, the evaluation metrics for all models include the Kappa value, F1 score, and accuracy, which reflect the models’ performance in the air compressor health status assessment task.
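For reference, all three metrics can be computed directly from the confusion matrix. The self-contained NumPy sketch below assumes macro-averaged F1, since the text does not state which averaging is used; scikit-learn’s `accuracy_score`, `f1_score`, and `cohen_kappa_score` are equivalent off-the-shelf choices.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    return np.trace(cm) / cm.sum()

def macro_f1(cm):
    """Per-class F1 (harmonic mean of precision and recall), averaged over classes."""
    tp = np.diag(cm).astype(float)
    col, row = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

def cohen_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = cm.sum()
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2    # chance agreement
    return (po - pe) / (1 - pe)
```

Kappa corrects accuracy for agreement expected by chance, which is why it is a more demanding metric than raw accuracy on imbalanced health-status data.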
Among Models 1–5, the BiLSTM performs best. With its bidirectional structure, the BiLSTM can capture and exploit the contextual relationships in the data more effectively, allowing a deeper understanding of the multi-dimensional features of the input and thereby improving classification accuracy. In contrast, the BP, SVM, and RF, owing to their simpler architectures, lack deep feature extraction capabilities and are more prone to overfitting; as a result, they perform worse than the BiLSTM on complex health status classification tasks. The LSTM, with its unidirectional structure, performs slightly worse than the BiLSTM. In the comparison between the PSO-BiLSTM (Model 6) and the IVY-BiLSTM (Model 7), the IVY optimization algorithm demonstrates its advantages: it improves the performance of the BiLSTM model by making broader adjustments globally and performing detailed optimization locally. This is reflected in the fact that the IVY-BiLSTM outperforms the PSO-BiLSTM in terms of Kappa value, F1 score, and accuracy.
Against these baselines, the SMOTE-IVY-SE-CNN-BiLSTM model exhibits clear advantages in both the training and testing phases. On the test set, the Kappa value of the SMOTE-IVY-SE-CNN-BiLSTM is 0.9799, its F1 score is 0.9773, and its accuracy is 0.9722. The higher Kappa value indicates stronger consistency between the model’s classifications and the actual labels, suggesting greater stability and trustworthiness in real-world applications. The higher F1 score means the model not only maintains high precision but also identifies potentially overlooked anomalies. The advantage in accuracy demonstrates that the SMOTE-IVY-SE-CNN-BiLSTM has stronger discriminative ability in classification tasks, enabling more accurate health status assessments of air compressors and reducing misclassifications.
Following the foundational comparative analysis of the air compressor health assessment model, this study systematically designs noise-interference comparison experiments by injecting Gaussian white noise (standard deviation: 5% of the baseline signal amplitude) into the raw multi-source sensor time-series data. This controlled noise injection replicates potential interference scenarios in real-world industrial environments. The experiments not only quantify the impact of noise on diagnostic outcomes but also demonstrate the model’s anti-interference capability, thereby providing a foundation for future optimizations of health state classification models.
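The injection step can be sketched as below. This is a minimal NumPy sketch in which the “baseline signal amplitude” is taken to be the signal’s own standard deviation; the exact amplitude definition used in the experiments may differ.

```python
import numpy as np

def add_gaussian_noise(signal, ratio=0.05, seed=0):
    """Inject zero-mean Gaussian white noise whose standard deviation is
    `ratio` times the signal amplitude (here: the signal's std, an assumption)."""
    rng = np.random.default_rng(seed)
    sigma = ratio * np.std(signal)
    return signal + rng.normal(0.0, sigma, size=signal.shape)
```

Applying this to each sensor channel of the test set yields the 5%-noise condition evaluated in Table 7.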
Table 7 compares the performance of various models on the test set with 5% added noise. As shown in the table, under 5% noise conditions, our model demonstrates strong anti-interference capability, achieving accuracy, Kappa coefficient, and F1 score of 92.13%, 91.19%, and 95.06%, respectively. These metrics surpass those of other algorithms, including the BP, SVM, RF, LSTM, BiLSTM, and PSO-BiLSTM. Compared to its performance under noise-free conditions, the degradation magnitude is relatively small. This indicates that in the presence of noise interference, our model can effectively suppress noise-induced disturbances to classification decision boundaries through three key mechanisms: the SMOTE-enhanced sample balance, the SE-CNN feature selection mechanism for improving signal-to-noise ratio, and the dynamic feature perception capability of the IVY-BiLSTM. These findings provide reliability assurance for online diagnostic systems in industrial environments with complex noise conditions, while establishing an extendable technical pathway for future anti-interference optimization.
Overall, the SMOTE-IVY-SE-CNN-BiLSTM model demonstrates better performance in air compressor health status assessment, with outstanding results in the three key metrics: Kappa value, F1 score, and accuracy. We believe that this achievement is mainly attributed to the effective execution of each module in the model. The SMOTE module, as the foundation of the model, effectively increases the minority class samples, promoting data class balance and thereby improving the model’s capability to classify minority classes. The SE-CNN module is essential for extracting both local and global features. It not only effectively expands the model’s perceptual range to better facilitate global feature modeling but also significantly boosts the model’s capacity to capture intricate features. This is particularly important when handling high-dimensional and complex structured input data. Without this module, redundant information could interfere with the model’s accurate analysis of features, which would reduce its performance. The introduction of the IVY optimization algorithm further optimizes the hyperparameters of the BiLSTM model, enabling it to learn features from the data more efficiently. Compared to traditional hyperparameter selection methods, the IVY optimization algorithm explores the hyperparameter space more comprehensively, selecting the most suitable hyperparameter configuration for the current task.
Through the collaborative effort of these modules, our model not only performs excellently in data preprocessing, feature extraction, and model optimization but also achieves outstanding performance in the final health status evaluation task. This underscores the critical role each module plays in the model, collectively contributing to the overall enhancement of performance.
To substantiate this claim, we will present the ablation study in the following section.
4.3. Ablation Study
To validate the effectiveness of the proposed improvements in the air compressor health status assessment task, particularly in improving accuracy, an ablation study is conducted in which key modules are selectively removed. The four models—IVY-SE-CNN-BiLSTM, SE-CNN-BiLSTM, CNN-BiLSTM, and BiLSTM—allow for a detailed evaluation of the impact of each module on the model’s effectiveness. Specifically, the study tests the impact of the CNN feature extraction module, the IVY algorithm, and the SE attention mechanism on model performance.
As shown in Table 8, by comparing the training and testing results of the different models, the impact of each module on model performance can be clearly observed.
The IVY-SE-CNN-BiLSTM model demonstrates strong performance in the air compressor health status assessment task.
However, after removing the IVY optimization, the model’s performance deteriorates significantly. Although the BiLSTM can still process the data effectively, the lack of the precise hyperparameter tuning provided by the IVY algorithm reduces the model’s adaptability and generalization ability on complex tasks, resulting in less stable test results. The accuracy on the test set decreased by 3.35% after removing the IVY optimization, highlighting the significant impact of hyperparameter tuning on model performance.
In subsequent experiments, removing the SE module further degraded performance. The SE module plays a critical role in dynamically recalibrating feature weights; without it, the model lost its ability to prioritize the most relevant features.
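The recalibration performed by the SE module can be sketched as a generic squeeze-and-excitation step over a 1-D feature map. The bottleneck weights `W1`/`W2` and the reduction ratio here are illustrative assumptions; the actual SE-CNN layout in this study may differ.

```python
import numpy as np

def squeeze_excite(feature_map, W1, W2):
    """Minimal squeeze-and-excitation over a (channels, length) feature map:
    squeeze by global average pooling, excite through a two-layer bottleneck,
    then rescale each channel by its learned importance weight."""
    z = feature_map.mean(axis=1)                 # squeeze: (C,)
    h = np.maximum(0.0, W1 @ z)                  # reduction + ReLU: (C // r,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))          # expansion + sigmoid: (C,)
    return feature_map * s[:, None]              # channel-wise recalibration
```

Because the gates `s` lie in (0, 1), uninformative channels are attenuated while the most relevant ones pass through nearly unchanged, which is the prioritization the ablation shows is lost when the module is removed.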
Finally, when the CNN module was removed, the model lost its powerful feature extraction capability, which made it weaker when handling complex patterns in the data. Without the CNN module, the model struggled to capture critical local features from the input signals, causing the accuracy on the test set to decrease by 1.41%. This result further emphasizes the crucial role of feature extraction in enhancing the model’s recognition capability and handling complex data.
In conclusion, the ablation experiments clearly highlight the key role of each module in the model. Although the introduction of the SE module contributed to an improvement in overall performance, its effect on performance enhancement was relatively minor within the SE-CNN module combination. Therefore, future research could explore the incorporation of more efficient attention mechanisms to further improve the model’s ability in feature selection and representation. This direction provides important guidance for future work.