Taking the external corrosion data of the buried West–East Gas Pipeline as an example, and based on [28,29] and related materials, ten factors were compiled as indicators influencing the external corrosion rate Y: pH value (X1), redox potential (X2), soil resistivity (X3), water content (X4), bulk density (X5), dissolved chloride (X6), bicarbonate (X7), sulfate (X8), pipe-to-soil potential (X9), and service years (X10). A total of 110 data sets were selected for this study [30], with partial data presented in Table 2.
4.1. FA-Based Dimensionality Reduction
Pipeline corrosion results from the interaction of multiple influencing factors. To account for these interactions, Factor Analysis (FA) was performed in SPSS (version 25.0; IBM Corp., Armonk, NY, USA) for analysis and dimensionality reduction. First, the Kaiser–Meyer–Olkin (KMO) measure and Bartlett's test of sphericity were applied to the original corrosion data. The KMO value was 0.654 and the significance level of Bartlett's test was below 0.001, confirming that the data are suitable for FA. Next, eigenvalues were calculated to determine the number of common factors, which were extracted using the principal component method. Varimax rotation was then applied to make the factor loadings easier to interpret. Finally, the factor scores for each common factor were computed, yielding the FA-processed dataset. The total variance explained is presented in Table 3, the component matrix is shown in Table 4, and the coefficient matrix is provided in Table 5.
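For reference, the two suitability checks are usually defined as shown below. This is the textbook form and is assumed here rather than taken from the SPSS output: r_ij denotes the entries of the correlation matrix R, a_ij the corresponding partial correlations, n the sample size, and p the number of indicators.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}},
\qquad
\chi^{2} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln \lvert R \rvert,
\qquad
\mathrm{df} = \frac{p(p - 1)}{2}
```

With n = 110 and p = 10, Bartlett's statistic has 45 degrees of freedom.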
According to the principal component extraction criterion, the cumulative contribution rate should exceed 85% [18]. As shown in Table 3, extracting only the four principal components with eigenvalues greater than 1 yields a cumulative variance contribution rate of 69.428%, which does not meet this criterion. Therefore, a fixed number of factors was specified instead. With seven principal components, the cumulative contribution rate reaches 88.525%, which satisfies the criterion. Among the indicators, X2 and X10 are attributed to F1, X5 and X6 to F2, X1 to F3, X7 and X8 to F4, X9 to F5, X3 to F6, and X4 to F7.
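The choice between the two extraction rules can be illustrated with a minimal sketch. The function names and the eigenvalues below are hypothetical; the eigenvalues are chosen only to show the same kind of situation and are not the values in Table 3.

```python
from itertools import accumulate


def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue greater than 1 (Kaiser criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def count_for_cumulative_variance(eigenvalues, threshold=0.85):
    """Smallest number of leading factors whose cumulative variance share
    reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues for ten standardized indicators (they sum to 10).
# These are NOT the eigenvalues reported in Table 3.
example_eigenvalues = [2.4, 1.8, 1.5, 1.2, 0.9, 0.6, 0.5, 0.45, 0.4, 0.25]

n_kaiser = kaiser_count(example_eigenvalues)
n_threshold = count_for_cumulative_variance(example_eigenvalues, 0.85)
```

For these illustrative values the Kaiser rule keeps fewer factors than the 85% rule requires, which is the situation that motivates fixing the number of factors by hand.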
The factor scores produced by FA were calculated from the coefficient matrix in Table 5, with partial data presented in Table 6.
4.2. Model Training and Validation
F1 to F7 are used as model inputs, and the corrosion rate is used as the output. The dimensionality-reduced dataset of 110 samples was partitioned as described in Section 3.1, with 99 samples (90%) allocated for training and the remaining 11 samples (10%) reserved for testing. The IDBO parameters during training were set as follows: population size (pop) = 30, dimension (dim) = 2, maximum iterations (Tmax) = 50, lower bound (lb) = [1 × 10^−2, 1 × 10^−2], and upper bound (ub) = [50, 50]. The KELM model used the Gaussian radial basis function (RBF) kernel. After 50 optimization iterations, the optimal penalty coefficient (C) and kernel parameter (γ) of the KELM model were determined. The optimal parameters are listed in Table 7 and were subsequently substituted into the KELM model for corrosion rate prediction.
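To make the roles of C and γ concrete, the following is a minimal KELM sketch using only the Python standard library. It assumes the common closed form β = (Ω + I/C)^−1 y with the kernel written as exp(−γ‖a − b‖²); the paper may parameterize the kernel differently, and all function names and the toy data are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2) (assumed parameterization)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def kelm_fit(X, y, C, gamma):
    """Output weights beta = (Omega + I/C)^-1 y, with Omega the kernel matrix."""
    n = len(X)
    regularized = [
        [rbf_kernel(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(regularized, y)


def kelm_predict(X_train, beta, x_new, gamma):
    """Prediction f(x) = sum_j K(x, x_j) * beta_j."""
    return sum(rbf_kernel(x_new, xj, gamma) * bj for xj, bj in zip(X_train, beta))


# Toy one-dimensional data, purely illustrative (not pipeline data).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 1.0, 4.0, 9.0]
beta_toy = kelm_fit(X_toy, y_toy, C=50.0, gamma=1.0)
prediction = kelm_predict(X_toy, beta_toy, [1.5], gamma=1.0)
```

In this form a larger C weakens the ridge term I/C, so the model fits the training targets more closely, while γ controls how quickly the kernel similarity decays with distance. An optimizer such as IDBO would search over the pair (C, γ) within the stated bounds.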
To validate the prediction performance of the proposed model, it was compared with the FA-KELM, FA-SSA-KELM, and FA-DBO-KELM models. Apart from FA-KELM, which has no optimizer, each model uses an intelligent algorithm to optimize the key parameters of KELM. To ensure fairness, the same dataset was used for all models, and every optimizer was run with a population size of 30 and a maximum of 50 iterations. The prediction results and comparisons of the four models are presented in Table 8 and Figure 2.
As shown in Figure 2 and Table 8, when predicting the external corrosion rates of buried pipelines, the FA-IDBO-KELM model achieves the lowest relative prediction error in every case except the 10th data group. FA-KELM shows the poorest prediction performance, with a maximum error of 116.2789% and a minimum error of 5.6519%. The FA-SSA-KELM model follows, with a maximum error of 98.4154% and a minimum error of 1.0093%. The FA-DBO-KELM model ranks next, with a maximum error of 66.3875% and a minimum error of 12.5183%. By comparison, the FA-IDBO-KELM model delivers more stable predictions, with a maximum error of 7.3057% and a minimum error of 1.4092%.
As shown in Figure 2, adding an intelligent optimization algorithm significantly improves on the FA-KELM model. Compared with the FA-SSA-KELM and FA-DBO-KELM models, the FA-IDBO-KELM model gives predictions closer to the actual values, confirming that the improvements in IDBO raise prediction accuracy. The FA-IDBO-KELM model therefore performs best of the four models on the test data.
The prediction results and performance metrics are summarized in Table 8 and Table 9. As illustrated in Figure 2, the predictions of the FA-IDBO-KELM model align most closely with the actual values, with the smallest relative errors across most test samples. Statistical metrics were further employed to evaluate the accuracy of the FA-IDBO-KELM model, with the calculated results presented in Table 9. Compared with the FA-KELM, FA-SSA-KELM, and FA-DBO-KELM models, the FA-IDBO-KELM model reduces the root mean square error (E_RMSE) by 3.73%, 2.3%, and 1.63%, respectively; reduces the mean absolute error (E_MAE) by 3.34%, 2.1%, and 1.56%, respectively; and increases the coefficient of determination (R²) by 45.93%, 9.57%, and 6.84%, respectively. This demonstrates that, for predicting the external corrosion rates of buried pipelines, the combined FA-IDBO-KELM model outperforms the other three models, verifying that the algorithmic improvements enhance the model's performance.
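The error measures used in Table 8 and Table 9 can be sketched as follows. The definitions are the standard ones and are assumed to match those used in the paper; the sample values are hypothetical and are not taken from the test set.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_errors_percent(y_true, y_pred):
    """Per-sample relative error |t - p| / |t|, in percent."""
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]


# Hypothetical actual and predicted values (not from Table 8).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

summary = {
    "rmse": rmse(actual, predicted),
    "mae": mae(actual, predicted),
    "r2": r_squared(actual, predicted),
    "max_rel_err": max(relative_errors_percent(actual, predicted)),
}
```

Because the relative error divides by the actual value, samples with a small actual corrosion rate can dominate the maximum relative error even when the absolute error is modest.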
To provide a visual comparison of the prediction errors across the models, a box plot of the relative error distribution on the test set is presented in Figure 3. The plot shows that the FA-IDBO-KELM model has the smallest interquartile range and median error, indicating superior stability and accuracy.
The small interquartile range (IQR) and low median error of the FA-IDBO-KELM model indicate that its prediction errors are not only low on average but also tightly clustered around the median. This stability is important for practical engineering applications, where reliable and repeatable predictions are required for risk assessment and decision-making. In contrast, the FA-KELM model shows the largest IQR and the highest median error, reflecting unstable and unreliable performance. The FA-SSA-KELM and FA-DBO-KELM models show intermediate performance, but their wider boxes and higher median errors indicate more variable results, likely due to premature convergence or inadequate exploration of the search space by the standard optimizers. The compact error distribution of FA-IDBO-KELM also suggests fewer and smaller outliers than in the other models. This robustness suggests that the model is less sensitive to noise or anomalies in the dataset, a common weakness of models that overfit or rely on unstable optimization processes.
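The two box-plot quantities discussed above can be computed as in the sketch below. The quartile convention (the "inclusive" method of Python's statistics module) is an assumption, since plotting tools differ in how they interpolate quartiles, and the error values are hypothetical.

```python
import statistics


def box_summary(errors):
    """Median and interquartile range of a list of relative errors."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical relative errors (percent) for two models, for illustration only.
tight_errors = [1.0, 2.0, 3.0, 4.0, 5.0]
spread_errors = [5.0, 20.0, 45.0, 80.0, 115.0]

tight = box_summary(tight_errors)
spread = box_summary(spread_errors)
```

A model with both a lower median and a smaller IQR, like the first hypothetical list here, is the pattern that the text attributes to FA-IDBO-KELM.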
The proposed FA-IDBO-KELM model is clearly superior to the benchmark models. The FA-KELM model, which lacks intelligent hyperparameter optimization, has the highest prediction errors (e.g., a maximum error of 116.28%), underscoring the need to optimize the KELM parameters (C, γ). The FA-SSA-KELM and FA-DBO-KELM models perform better, yet their susceptibility to local optima and their uneven exploration-exploitation balance, as discussed in Section 1, result in significantly higher errors than those of the proposed model. In contrast, the FA-IDBO-KELM model achieves the predictions closest to the actual values, with a maximum error of only 7.31%. This improvement can be attributed to the combined enhancements in IDBO (SPM initialization, spiral search, Lévy flight, and adaptive t-mutation), which together yield a more robust search strategy that mitigates premature convergence and strengthens global search capability.
The stability observed in Figure 3 is a direct consequence of the improvements embedded in the IDBO algorithm. The SPM initialization ensures a diverse starting population. The spiral search and Lévy flight strategies work in tandem to balance global exploration and local exploitation, helping the search escape local optima. The adaptive t-distribution mutation provides a final refinement mechanism. Together, these mechanisms prevent premature convergence, a typical pitfall of the standard DBO and SSA algorithms, and lead to consistently better and more reliable parameter optimization for the KELM model.
4.3. Ablation Study and Robustness Analysis
To quantitatively evaluate the contribution of each enhancement strategy in IDBO and to assess model robustness, a comprehensive ablation study was conducted. The baseline model (FA-DBO-KELM) was progressively enhanced. Each variant was evaluated over 30 independent runs, and the results (mean ± standard deviation) are summarized in Table 10. The progressive and statistically significant improvement across all metrics (verified via the Friedman test with Nemenyi post hoc analysis, p < 0.001) confirms that each enhancement addresses specific limitations of the standard DBO. SPM initialization significantly improved stability (reducing the RMSE standard deviation by 34.8%), the spiral search enhanced local exploitation (improving R² by 2.1%), the Lévy flight mechanism substantially boosted global exploration (reducing RMSE by 55.6% in a single step), and the adaptive t-distribution mutation provided the final refinement. This systematic analysis grounds the superior performance of the full FA-IDBO-KELM model.
The ablation study reveals several key insights:
(1) SPM-based population initialization contributed most significantly to stability improvement, reducing RMSE standard deviation by 34.8% compared to the baseline (from ±0.0023 to ±0.0015). This demonstrates that uniform initial population distribution is crucial for consistent optimization performance.
(2) The spiral search strategy enhanced local exploitation capability, improving R² by 2.1% (from 0.9583 to 0.9781) while maintaining similar variance levels. This confirms its effectiveness in preventing premature convergence.
(3) The Lévy flight mechanism substantially boosted global exploration, achieving the most significant single-step improvement in RMSE (55.6% reduction from 0.0087 to 0.0043). The long-distance jumps helped the search escape local optima.
(4) Adaptive t-distribution mutation provided the final refinement, further reducing RMSE by 34.9% and achieving a near-perfect R² (0.9954). Because the degrees of freedom of the mutation operator are linked to the iteration count, it maintains a balance between exploration and exploitation throughout the optimization process.
The progressive improvement identified across all metrics confirms that each enhancement strategy addresses specific limitations of the standard DBO algorithm, and their synergistic combination in IDBO delivers optimal performance.
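Two of the mechanisms named in the ablation study can be sketched as follows. The exact update rules of IDBO are not restated in this section, so the code relies on common formulations as assumptions: Mantegna's algorithm for the Lévy step with stability index 1.5, and a mutation of the form x + x·t(df = iteration). All function names are hypothetical.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step: mostly small moves, occasionally a long jump."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def student_t(rng, df):
    """Student-t variate built from a normal and a chi-square variate."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def t_mutate(position, iteration, lb, ub, rng):
    """Assumed mutation x + x * t(df = iteration), clipped to the bounds.

    Early iterations (small df) give heavy tails and larger perturbations;
    later iterations approach a Gaussian and make finer adjustments.
    """
    mutated = []
    for x, low, high in zip(position, lb, ub):
        candidate = x + x * student_t(rng, df=iteration)
        mutated.append(min(max(candidate, low), high))
    return mutated


rng = random.Random(0)
# Hypothetical (C, gamma) candidate inside the bounds stated in Section 4.2.
candidate = t_mutate([10.0, 0.5], iteration=5, lb=[0.01, 0.01], ub=[50.0, 50.0], rng=rng)
jump = levy_step(rng)
```

In a full optimizer the mutated candidate would normally be kept only if it improves the fitness value; that selection step is omitted here.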
Statistical significance was verified using the Friedman test with Nemenyi post hoc analysis at α = 0.05. The results showed significant differences among all variants (p < 0.001), confirming that each enhancement contributes uniquely to the overall performance improvement. The critical difference diagram (Figure 4) shows that FA-IDBO-KELM forms a distinct performance cluster separate from all other variants.
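The quantities behind a Friedman test and a critical difference diagram can be sketched as below. This assumes the usual rank-based form without tie correction; the function names and the toy results are hypothetical, and the critical value q_alpha must be supplied from a Nemenyi table.

```python
import math


def average_ranks(row):
    """Rank values in ascending order (1 = smallest), averaging ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def friedman_statistic(results):
    """results[i][j] is the error of variant j in run i.

    Returns the Friedman chi-square statistic and the mean rank per variant.
    """
    n, k = len(results), len(results[0])
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r ** 2 for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


def nemenyi_critical_difference(q_alpha, k, n):
    """Mean ranks differing by more than this value are significantly different."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Toy example: three runs, three variants, identical ordering in every run.
toy_results = [[0.30, 0.20, 0.10], [0.31, 0.22, 0.12], [0.29, 0.21, 0.09]]
chi2, mean_ranks = friedman_statistic(toy_results)

# 2.728 is the commonly tabulated Nemenyi value for five groups at alpha = 0.05.
cd = nemenyi_critical_difference(2.728, k=5, n=30)
```

With five variants and 30 runs, two variants would need mean ranks that differ by roughly 1.11 to be separated in the diagram, under the assumptions above.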
The statistical significance of the performance differences was further validated using the Wilcoxon signed-rank test on the prediction errors, confirming that the FA-IDBO-KELM model significantly outperformed all benchmarks at a 95% confidence level (p-value < 0.05). Additionally, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) was employed as a Multi-Criteria Decision Making (MCDM) method to evaluate the models jointly on RMSE, MAE, and R². The TOPSIS closeness coefficients were 0.892 for FA-IDBO-KELM, 0.456 for FA-DBO-KELM, 0.312 for FA-SSA-KELM, and 0.105 for FA-KELM, ranking the proposed model as the best choice and supporting the conclusions drawn.
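A minimal sketch of how TOPSIS closeness coefficients are obtained is given below. The weights used in the paper are not stated here, so equal weights are an assumption, as are the vector normalization and the function name. The metric values are hypothetical, so the resulting scores are not the coefficients reported above.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness coefficient for each alternative.

    matrix[i][j] is the value of criterion j for alternative i;
    benefit[j] is True when larger values of criterion j are better.
    """
    n_criteria = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical (RMSE, MAE, R^2) rows for three models; not the values in Table 9.
metrics = [
    [0.10, 0.08, 0.99],
    [0.20, 0.15, 0.95],
    [0.40, 0.30, 0.80],
]
scores = topsis(metrics, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[False, False, True])
```

A model that is best on every criterion coincides with the ideal solution and scores 1, and one that is worst on every criterion scores 0; intermediate values, like those reported for the four models, indicate trade-offs across the criteria.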
The superior performance of the FA-IDBO-KELM model can be attributed to the combined effect of its components. FA reduces multicollinearity and noise among the input variables. More importantly, the IDBO algorithm optimizes the KELM parameters by striking a better balance between exploration (global search) and exploitation (local refinement). The SPM initialization and spiral search ensure comprehensive exploration of the parameter space, while the Lévy flight and t-distribution mutation provide effective mechanisms for escaping local optima, a common pitfall of the standard DBO and SSA algorithms. This leads to a better (C, γ) parameter pair (as shown in Table 7), which in turn enables the KELM model to generalize better and avoid overfitting. The very high R² value (0.9954) indicates that the model explains almost all the variability in the corrosion rate data, making it well suited to practical risk assessment applications where precision is critical.
The computational efficiency of the proposed FA-IDBO-KELM model was evaluated to assess its practical viability. The time complexity of the Improved Dung Beetle Optimizer (IDBO) is O(pop × Tmax × dim), which is comparable to that of other population-based algorithms such as the SSA and the standard DBO. Experiments on a standard desktop PC (Intel i7-10700K, 32 GB RAM) showed that the FA-IDBO-KELM model required an average runtime of approximately 12.5 s for 50 iterations. This computational cost is acceptable for practical corrosion prediction tasks. A comparative analysis revealed that while the model takes longer than the non-optimized FA-KELM (≈3.2 s), it is more efficient than both the FA-SSA-KELM (≈15.1 s) and the baseline FA-DBO-KELM (≈13.8 s) models. Crucially, when compared to the FA-DBO-KELM baseline, the full FA-IDBO-KELM model introduced only a 15.2% time overhead (12.5 s vs. 10.8 s) while achieving a substantial 85.3% reduction in RMSE. This trade-off between a modest increase in computational cost and a significant gain in predictive accuracy underscores the practical value of the proposed enhancements for real-world applications where both model precision and operational efficiency matter.
In summary, the results demonstrate that the FA-IDBO-KELM framework is not only highly accurate but also remarkably stable and robust. The synergistic combination of Factor Analysis for dimensionality reduction and the Improved Dung Beetle Optimizer for parameter tuning effectively addresses the common limitations of existing models, namely low accuracy, instability, and overfitting. The empirical evidence, supported by rigorous statistical testing, strongly validates the proposed model as a superior tool for predicting the external corrosion rate of buried pipelines.