1. Introduction
In recent years, with the rapid development of China’s coastal economic belt, large-scale land reclamation projects have steadily advanced, and many stations consequently have to be built on soft soil foundations along the coast. Because soft soil is highly compressible, it undergoes significant compressive deformation under the loads of surrounding structures. Because it is also rheological, its deformation continues to develop over time, so the foundation keeps settling even when the load remains constant [1]. Ground settlement is generally classified as uniform or uneven, and the resulting ground movement poses many hidden hazards to the safe operation of station pipelines. The 2010 natural gas pipeline explosion in the United States and the 2014 pipeline burst in Dalian were both caused by ground subsidence [2].
During operation of the station’s oil pump, the high-speed rotation of the impeller generates strong mechanical vibration, which is transmitted along the pipeline to the supporting structure and increases the stress in the pipeline. When the vibration frequency of the oil pump approaches the natural frequency of the pipeline, the oil pump pipeline system resonates and the vibration amplitude increases sharply. Long-term resonance can readily cause cracks in the pipeline, leading to medium leakage and even serious fires and explosions, with large economic losses and casualties.
At present, pipeline risk warning technologies fall into four categories: evaluation methods based on physical models, data-driven intelligent warning methods, traditional monitoring and detection technologies, and comprehensive risk assessment and decision-making methods.
Evaluation methods based on physical models assess the safety status of pipelines mainly through residual strength assessment, stress–strain analysis, and leakage diffusion analysis. The current state of research on physics-based pipeline safety assessment is summarized in Table 1.
Data-driven intelligent warning methods mainly use machine learning classification, time series prediction, and anomaly detection algorithms. In engineering practice, several methods are often combined to exploit the strengths of each. These methods have evaluated the safety of station pipelines from different perspectives and achieved good results. The current state of research on data-driven intelligent pipeline warning methods is summarized in Table 2.
Traditional pipeline monitoring and detection technologies mainly include manual inspection, SCADA-based real-time operational monitoring, ultrasonic corrosion testing, and magnetic flux leakage inspection. These approaches enable the acquisition of basic pipeline status information, such as pressure, flow rate, temperature, and localized defect characteristics, and allow operators to identify abnormal fluctuations or apparent degradation phenomena.
However, in conventional applications, the data collected by SCADA systems and field sensors are primarily used for threshold-based alarming and post-event analysis, with limited capability for extracting latent correlations among multi-source variables or identifying complex, evolving risk patterns. As a result, although traditional systems provide essential raw data and first-level safety assurance, their potential for proactive risk assessment and early warning remains largely unrealized without the support of intelligent data-driven analysis methods.
Comprehensive risk assessment and warning methods generally use techniques such as the risk matrix method, fault tree analysis, and the analytic hierarchy process, covering qualitative, semi-quantitative, and quantitative evaluation. Wu Yanhua et al. [12] established a railway disaster monitoring and early warning system based on the risk matrix method. Zhu Lina [13] constructed a safety production accident risk warning model based on a risk index system combined with fuzzy evaluation. AYYILDIZ et al. [14] constructed a warehouse risk warning indicator system based on fuzzy logic and Bayesian networks. Wang Xin [15] established a real-time monitoring and early warning model for mine water inrush using a BP neural network. Sun Yi et al. [16] analyzed the various methods used in the safety warning process of storage tank areas.
In view of the difficulty in determining early warning levels for station oil pump pipeline systems under the coupled effects of multiple operating conditions and environmental factors, single-model approaches exhibit inherent limitations in feature representation and decision boundary construction, making it challenging to accurately characterize the complex nonlinear relationships governing warning level evolution. To address this issue, an ensemble learning framework is introduced to enhance the comprehensive representation of multi-source and multi-scale influencing factors. By integrating multiple base learners, ensemble learning effectively alleviates model bias and variance, thereby improving generalization performance in complex engineering scenarios.
On this basis, this study constructs an intelligent risk early warning model for station oil pump pipeline systems under soft soil foundation conditions by coupling the artificial bee colony (ABC) optimization algorithm with XGBoost. XGBoost (eXtreme Gradient Boosting) is a gradient-boosted decision tree algorithm and a form of ensemble learning [17]. The primary contribution of this work lies in the application-oriented integration of ABC-optimized XGBoost tailored to the risk warning problem of oil pump pipelines in soft soil environments, rather than in proposing a new learning algorithm. Specifically, the ABC algorithm adaptively optimizes key hyperparameters of XGBoost, including the number of iterations, the learning rate, and the tree depth, thereby improving the model’s sensitivity to the nonlinear coupling effects induced by soft soil foundation conditions [18]. The proposed framework enables more accurate and robust early warning of coupling risks in station oil pump pipeline systems, demonstrating its effectiveness for intelligent safety assessment in complex geotechnical environments.
Section 2 describes the construction of the coupled risk warning model for the station oil pump pipeline system under soft soil foundation conditions. Section 3 describes how XGBoost is optimized with the ABC algorithm. Section 4 presents a case study that verifies the accuracy of the model, and Section 5 concludes.
2. Risk Early Warning Model for Coupling Risk of Oil Pump Pipeline System in Station Under Soft Soil Foundation Conditions
To prevent stress-induced deformation of the coupled oil pump pipeline system under external loads and to ensure safe station operation, the real-time state of the system must be judged accurately. A coupling risk warning model for the system is therefore established.
2.1. Establishment of Risk Warning Indicator System
2.1.1. Stress Warning Indicators
The warning level of pipeline stress can be determined by using the stress value on the pipeline. This article divides stress values into two categories: stress values obtained through on-site monitoring and stress values calculated through numerical simulation.
The settlement displacement of the pipeline is measured by displacement sensors, and the strain at any point on the pipe circumference can be obtained from Formula (1). Differentiating this expression with respect to the circumferential position and applying Formulas (2) and (3) gives the maximum and minimum strains of the pipeline.
In these formulas, the symbols denote the maximum tensile strain, the axial strain, the minimum compressive strain, and the direction vectors, respectively.
Therefore, by converting monitoring values, the maximum stress of the pipeline section and the settlement displacement of the foundation can be obtained in real time. The maximum stress value on the pipeline cross-section reflects the actual strength state of the pipeline under current operating and geotechnical conditions.
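As an illustration of how the extreme strains and the section stress might be obtained, the sketch below assumes the common beam-bending form $\varepsilon(\theta) = \varepsilon_a + \varepsilon_x \cos\theta + \varepsilon_y \sin\theta$ for the strain around the circumference, where $\varepsilon_x$ and $\varepsilon_y$ are bending components along two orthogonal directions, and converts strain to stress with uniaxial Hooke’s law $\sigma = E\varepsilon$. Formulas (1) to (3) are not reproduced in this text, so this strain model, the elastic modulus value, and the function names are assumptions rather than the authors’ exact expressions.

```python
import math
from typing import Tuple


def circumferential_strain(theta: float, eps_a: float, eps_x: float, eps_y: float) -> float:
    """Assumed strain at circumferential angle theta (rad): axial part plus two bending components."""
    return eps_a + eps_x * math.cos(theta) + eps_y * math.sin(theta)


def extreme_strains(eps_a: float, eps_x: float, eps_y: float) -> Tuple[float, float, float]:
    """Setting d(eps)/d(theta) = 0 gives tan(theta*) = eps_y / eps_x, so the extremes
    are eps_a +/- sqrt(eps_x^2 + eps_y^2). Returns (eps_max, eps_min, theta_of_max)."""
    amplitude = math.hypot(eps_x, eps_y)
    theta_max = math.atan2(eps_y, eps_x)  # angle of the maximum tensile strain
    return eps_a + amplitude, eps_a - amplitude, theta_max


def max_section_stress(eps_max: float, youngs_modulus_mpa: float = 2.06e5) -> float:
    """Uniaxial Hooke's law sigma = E * eps in MPa; E = 206 GPa is an assumed steel value."""
    return youngs_modulus_mpa * eps_max


if __name__ == "__main__":
    eps_max, eps_min, theta = extreme_strains(100e-6, 30e-6, 40e-6)
    # Brute-force check of the analytic extremum on a fine angular grid.
    grid = [circumferential_strain(2 * math.pi * k / 3600, 100e-6, 30e-6, 40e-6) for k in range(3600)]
    print(eps_max, eps_min, max(grid))
    print(max_section_stress(eps_max))
```

The analytic extremum avoids scanning the circumference; the grid check in the example only confirms that the derivative-based result agrees with a brute-force search.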
The maximum von Mises stress of the pipeline is calculated by integrating real-time monitoring data, including oil pump vibration signals, internal pipeline pressure, and ground settlement displacement, transmitted via the on-site SCADA system from monitoring devices installed in the upstream section. These measured pressure and displacement data are used to dynamically update the boundary conditions of the numerical stress model, enabling time-synchronized stress estimation. To verify the consistency between monitoring data and numerical predictions, key stress-sensitive response indicators derived from vibration and pressure signals are compared with model-predicted responses.
The discrepancy between monitored responses and model-predicted stress is quantified using a normalized deviation indicator, defined as
where the quantities are, in order, the measured stress-sensitive feature at time t, the corresponding response predicted from the simulated maximum von Mises stress, and a small positive constant that avoids numerical instability.
Based on the magnitude of the discrepancy, a confidence weight factor is introduced as
where the coefficient in the expression is a deviation sensitivity coefficient that controls the influence of data–model inconsistency. This weight is then used to regulate the contribution of the simulation-derived stress to the final risk assessment.
Specifically, the corrected risk index is expressed as
where the two terms are the baseline risk index obtained from the data-driven early warning model and the stress-related risk component derived from the numerical simulation.
Through this mechanism, larger discrepancies between monitoring data and numerical predictions lead to a conservative adjustment of the stress contribution, thereby preventing underestimation of potential hazards. Under this integrated framework, the maximum von Mises stress obtained from numerical simulation conveys the theoretical stress state of the pipeline in real time and serves as a reliable supplementary stress indicator when on-site stress sensors are unavailable or temporarily fail. By explicitly quantifying data–model discrepancies and incorporating them into the risk evaluation process, the proposed approach reduces uncertainty propagation and enhances the robustness and credibility of pipeline safety assessment under complex operating conditions.
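The deviation indicator and confidence weight can be sketched as below. The article’s exact expressions are not reproduced in this text, so the forms here are assumptions chosen only to match the stated behavior: a relative deviation $D(t) = |y_m(t) - y_s(t)| / (|y_m(t)| + \epsilon)$ between the measured feature $y_m$ and the simulation-predicted response $y_s$, and an exponential weight $w(t) = \exp(-k\,D(t))$ with sensitivity coefficient $k$, which equals 1 for perfect agreement and decays as the discrepancy grows. The function and parameter names are hypothetical, and how the weight enters the corrected risk index is left to the article’s own expression.

```python
import math


def normalized_deviation(measured: float, predicted: float, eps: float = 1e-9) -> float:
    """Assumed relative deviation between a measured stress-sensitive feature and the
    response predicted from the simulated maximum von Mises stress. eps keeps the
    ratio finite when the measured value is close to zero."""
    return abs(measured - predicted) / (abs(measured) + eps)


def confidence_weight(deviation: float, k: float = 2.0) -> float:
    """Assumed exponential confidence weight in (0, 1]. k plays the role of the
    deviation sensitivity coefficient: a larger k lowers confidence faster."""
    if deviation < 0:
        raise ValueError("deviation must be non-negative")
    return math.exp(-k * deviation)


if __name__ == "__main__":
    for measured, predicted in [(10.0, 10.0), (10.0, 8.0), (10.0, 4.0)]:
        d = normalized_deviation(measured, predicted)
        print(measured, predicted, round(d, 3), round(confidence_weight(d), 3))
```

An exponential form is one convenient choice because it is bounded, smooth, and strictly decreasing; a piecewise-linear or logistic weight would satisfy the same qualitative description.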
2.1.2. Resonance Warning Indicators
When the vibration frequency of the oil pump approaches the natural frequency of the pipeline, the oil pump pipeline system produces a strong vibration response with a sharp increase in amplitude. The vibration frequency of the oil pump and the natural frequency of the pipeline are therefore extracted, and their frequency ratio is calculated to reflect the resonance safety status of the system and to determine the resonance warning level.
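The resonance check described above can be sketched as follows. The sketch computes the ratio $r = f_p / f_n$ of the oil pump vibration frequency to the pipeline natural frequency and grades it by its distance from 1 (exact resonance). The band half-widths (0.1 and 0.2) and the level codes 1/2/3 are hypothetical placeholders, not the ISO 10816 / API 618-based limits or the Table 3 encoding used in this study, and the function names are illustrative.

```python
# Hypothetical level codes; larger means more severe.
SAFE, WARNING, DANGER = 1, 2, 3


def frequency_ratio(pump_freq_hz: float, pipe_natural_freq_hz: float) -> float:
    """r = f_p / f_n: oil pump vibration frequency over pipeline natural frequency."""
    if pipe_natural_freq_hz <= 0:
        raise ValueError("natural frequency must be positive")
    return pump_freq_hz / pipe_natural_freq_hz


def resonance_level(r: float, danger_band: float = 0.1, warning_band: float = 0.2) -> int:
    """Grade resonance risk by the detuning |r - 1|. The band half-widths are
    placeholders, not the limits adopted in the article."""
    detuning = abs(r - 1.0)
    if detuning < danger_band:
        return DANGER
    if detuning < warning_band:
        return WARNING
    return SAFE


if __name__ == "__main__":
    for f_p, f_n in [(49.5, 50.0), (42.5, 50.0), (100.0, 50.0)]:
        r = frequency_ratio(f_p, f_n)
        print(f_p, f_n, round(r, 3), resonance_level(r))
```

Only the fundamental ratio is checked here; a fuller screen might also compare harmonics of the pump frequency against the pipeline natural frequencies.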
2.1.3. Warning Level Determination
Data on pipeline internal pressure, ground settlement displacement, pipeline vibration amplitude, pipeline stress, oil pump vibration frequency, and pipeline vibration frequency are collected to calculate the stress warning levels and the resonance warning level. Because any warning indicator reaching its warning value means that the station oil pump pipeline system has met a failure condition, the total warning level of the system is set to the highest of the three warning levels (resonance, monitored stress, and finite element stress). The calculation process is shown in Figure 1.
The system warning level is calculated from the resonance warning level and the stress warning levels obtained for each part as follows:

$$L = \max\left(L_1, L_2, L_3\right)$$

where $L$ is the system warning level, $L_1$ is the resonance warning level, $L_2$ is the warning level based on monitored stress, and $L_3$ is the warning level based on stress calculated by finite element analysis.
Let $f_p$ be the vibration frequency of the oil pump (Hz) and $f_n$ the natural frequency of the pipeline (Hz). The frequency ratio $r$ is defined as

$$r = \frac{f_p}{f_n}$$
Based on the recommendations provided in ISO 10816 [19] and API 618 [20], the resonance risk is categorized into three levels according to the frequency ratio:
2.2. Soft Soil Foundation Station Oil Pump Pipeline Warning Model Based on ABC-XGBoost Algorithm
The overall workflow of the ABC-XGBoost risk warning model for station pipelines on soft soil foundations is shown in Figure 2.
3. Optimize Hyperparameters of XGBoost Model
The hyperparameters of the XGBoost model, including the maximum number of iterations, tree depth, and learning rate, are optimized using the global search capability of the artificial bee colony (ABC) algorithm. The search ranges of these hyperparameters are determined based on a combination of empirical experience from related studies and preliminary exploratory experiments, ensuring that the selected intervals cover both conservative and aggressive learning configurations while avoiding unrealistic parameter settings.
The population of the ABC algorithm is initialized such that each individual corresponds to a candidate set of XGBoost hyperparameters, and the maximum number of ABC iterations is set to 100 to balance optimization effectiveness and computational efficiency. During the iterative optimization process, employed bees, onlooker bees, and scout bees adjust the hyperparameter combinations according to their respective search strategies, and the updated parameters are subsequently applied to train the XGBoost model, forming the ABC–XGBoost risk early warning framework for station pipeline systems under soft soil foundation conditions.
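The correspondence between an ABC individual and a candidate XGBoost hyperparameter set can be sketched as follows. Each food source is a continuous vector; before training, every coordinate is clipped to its search interval and the integer-valued hyperparameters are rounded. The bounds below are hypothetical placeholders, since the search ranges are not listed in this text; the keys are the parameter names accepted by the scikit-learn-style `xgboost.XGBClassifier` constructor.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical (name, lower, upper) bounds, one per ABC dimension.
BOUNDS: List[Tuple[str, float, float]] = [
    ("n_estimators", 50, 500),     # number of boosting iterations (integer)
    ("max_depth", 2, 10),          # tree depth (integer)
    ("learning_rate", 0.01, 0.3),  # shrinkage step (continuous)
]
INTEGER_PARAMS = {"n_estimators", "max_depth"}


def decode(position: Sequence[float]) -> Dict[str, Union[int, float]]:
    """Map a continuous food-source position to a hyperparameter dict:
    clip each coordinate into its bounds, then round the integer-valued ones."""
    if len(position) != len(BOUNDS):
        raise ValueError("position length must match the number of hyperparameters")
    params: Dict[str, Union[int, float]] = {}
    for value, (name, lo, hi) in zip(position, BOUNDS):
        clipped = min(max(value, lo), hi)
        params[name] = int(round(clipped)) if name in INTEGER_PARAMS else clipped
    return params


if __name__ == "__main__":
    print(decode([150.4, 5.6, 0.1]))
    print(decode([5000.0, 0.0, 1.0]))  # out-of-range values are clipped
```

For example, `XGBClassifier(**decode(position))` would build the candidate model for one food source; clipping after each bee move keeps every candidate inside the search space.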
To evaluate the robustness of the proposed model with respect to the selected hyperparameter ranges, a sensitivity analysis is conducted by moderately expanding and contracting the predefined intervals. The results indicate that model performance is only weakly sensitive to small variations in the hyperparameter ranges, confirming the adequacy of the selected search spaces and the stability of the optimization process. The overall procedure for optimizing XGBoost hyperparameters with the ABC algorithm is illustrated in Figure 3.
When building decision trees, XGBoost uses a greedy algorithm that traverses the features and candidate split points and selects the split with the highest gain. To process large-scale data efficiently, it also evaluates features in parallel, which speeds up training.
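XGBoost is an ensemble of classification and regression trees (CART), as shown in Equation (11):

$$\hat{y}_i = \sum_{k=1}^{T} f_k(\mathbf{x}_i), \qquad f_k \in \mathcal{F} = \left\{ f(\mathbf{x}) = w_{q(\mathbf{x})} \right\}, \quad q: \mathbb{R}^m \rightarrow \{1, \dots, L\}, \quad w \in \mathbb{R}^{L} \tag{11}$$

In the formula, $T$ is the number of trees; $L$ is the number of leaf nodes in a tree; $q$ is the tree structure, which maps a sample to a leaf; $w$ is the leaf node weight vector; and $m$ is the number of input features.

As shown in Equation (12), the XGBoost objective function consists of a loss function and a regularization term. The loss term $l$ measures the difference between the predicted value and the target value, while the penalty term $\Omega$ limits model complexity to avoid overfitting:

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{T} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma L + \frac{1}{2}\lambda \lVert w \rVert^2 \tag{12}$$

where $n$ is the number of samples, and $\gamma$ and $\lambda$ are the regularization coefficients on the number of leaves and on the leaf weights, respectively.

Training is additive and iterative; the update at iteration $t$ is given by Equation (13):

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(\mathbf{x}_i) = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i) \tag{13}$$

Therefore, the objective function for the predicted value $\hat{y}_i^{(t)}$ of sample $i$ at the $t$-th iteration is Formula (14), and its second-order Taylor expansion is Formula (15):

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t) + C \tag{14}$$

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t) + C \tag{15}$$

where $C$ collects the regularization terms of the previous $t-1$ trees, which are constant at iteration $t$.

Removing the constant terms gives Equation (16); the first and second derivatives of the loss are given by Equations (17) and (18):

$$\tilde{\mathrm{Obj}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t) \tag{16}$$

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}} \tag{17}$$

$$h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} \tag{18}$$

Substituting $f_t(\mathbf{x}_i) = w_{q(\mathbf{x}_i)}$ and expanding $\Omega$, the objective function for the $t$-th iteration is

$$\tilde{\mathrm{Obj}}^{(t)} = \sum_{i=1}^{n} \left[ g_i w_{q(\mathbf{x}_i)} + \frac{1}{2} h_i w_{q(\mathbf{x}_i)}^2 \right] + \gamma L + \frac{1}{2}\lambda \sum_{j=1}^{L} w_j^2 \tag{19}$$

If $I_j = \{ i \mid q(\mathbf{x}_i) = j \}$ is defined as the set of samples assigned to leaf node $j$, with $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, then finding the optimal weight and the optimal objective value for node $j$ becomes the extremum problem of a quadratic function, as shown in Formula (20):

$$\tilde{\mathrm{Obj}}^{(t)} = \sum_{j=1}^{L} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma L, \qquad w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad \tilde{\mathrm{Obj}}^{*} = -\frac{1}{2}\sum_{j=1}^{L} \frac{G_j^2}{H_j + \lambda} + \gamma L \tag{20}$$

The split gain is defined as Equation (21); it is used to decide whether to split a node and to choose among the candidate splits:

$$\mathrm{Gain} = \frac{1}{2}\left[ \frac{G_l^2}{H_l + \lambda} + \frac{G_r^2}{H_r + \lambda} - \frac{\left(G_l + G_r\right)^2}{H_l + H_r + \lambda} \right] - \gamma \tag{21}$$

where the subscripts $l$ and $r$ denote the left and right child nodes produced by the split.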
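To make the leaf-weight and gain formulas (Equations (20) and (21)) and the greedy split search concrete, the sketch below scans a single feature: it sorts the samples by feature value, accumulates $G$ and $H$ from left to right, and evaluates the gain at each candidate threshold. It assumes the gradients $g_i$ and Hessians $h_i$ have already been computed from the chosen loss (Equations (17) and (18)). The function names are hypothetical, and this is not XGBoost’s implementation, which adds approximate (quantile or histogram) split finding, sparsity handling, and parallelism.

```python
from typing import Optional, Sequence, Tuple


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(G_l: float, H_l: float, G_r: float, H_r: float,
               lam: float, gamma: float) -> float:
    """Gain of splitting a node into left/right children (exact greedy form)."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(G_l, H_l) + score(G_r, H_r) - score(G_l + G_r, H_l + H_r)) - gamma


def best_split(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> Tuple[float, Optional[float]]:
    """Scan one feature and return (best_gain, threshold); threshold is None if no split exists."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    G_l = H_l = 0.0
    best: Tuple[float, Optional[float]] = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_l += g[i]
        H_l += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        gain = split_gain(G_l, H_l, G - G_l, H - H_l, lam, gamma)
        if gain > best[0]:
            best = (gain, (x[i] + x[nxt]) / 2.0)
    return best


if __name__ == "__main__":
    print(best_split([0.0, 1.0, 2.0], [-1.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
    print(leaf_weight(2.0, 3.0, 1.0))
```

A split is worth making only when its gain is positive, so $\gamma$ acts as a minimum-gain threshold that prunes weak splits.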
During training of the XGBoost model, hyperparameters such as the number of iterations, the learning rate, and the tree depth strongly affect predictive performance. Too few iterations can easily lead to underfitting, with the data features insufficiently learned; conversely, too many iterations may cause the model to overfit the noise and details in the training data, reducing its generalization ability. If the learning rate is too low, each boosting step is small and convergence slows markedly; if it is too high, the updates tend to overshoot the optimum, making it difficult to converge to a good solution. Likewise, trees that are too shallow cannot capture nonlinear interactions among features, whereas trees that are too deep tend to overfit.
The ABC optimization algorithm consists of five steps.
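- (1) Generate the initial population:

$$x_{ij} = x_j^{\min} + \mathrm{rand}(0,1)\left(x_j^{\max} - x_j^{\min}\right)$$

In the formula, $i$ is the index of the solution; $j$ is the dimension index of the solution; $x_j^{\max}$ and $x_j^{\min}$ are the upper and lower limits of the $j$-th decision variable; and $\mathrm{rand}(0,1)$ is a random number uniformly distributed in the closed interval [0, 1].
- (2) Dispatch employed bees to search the neighborhood of each food source and generate a new food source according to

$$v_{ij} = x_{ij} + \phi_{ij}\left(x_{ij} - x_{kj}\right), \quad k \neq i$$

In the formula, $v_{ij}$ is the new solution generated in the neighborhood of the current solution $x_i$; $x_k$ is a randomly selected solution from the population; and $\phi_{ij}$ is the scaling factor, conventionally a random number in [-1, 1]. The new source replaces the current one only if it is better (greedy selection).
- (3) Onlooker bees select promising food sources by roulette-wheel selection and search them further, using the fitness and selection probability

$$\mathrm{fit}_i = \begin{cases} \dfrac{1}{1 + f_i}, & f_i \ge 0 \\ 1 + \lvert f_i \rvert, & f_i < 0 \end{cases} \qquad p_i = \frac{\mathrm{fit}_i}{\sum_{n=1}^{SN} \mathrm{fit}_n}$$

where $f_i$ is the objective value of solution $i$ and $SN$ is the number of food sources.
- (4) When a food source has not been improved after a preset number of trials, it is considered exhausted, and a scout bee replaces it with a new food source initialized in the search space using the formula in step (1).
- (5) Termination: if the maximum number of iterations is reached or the fitness converges to the required accuracy, the iteration stops. Otherwise, steps (2) to (5) are repeated.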
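The five steps above can be sketched in plain Python as a minimizer over a box-bounded search space. This is the textbook ABC scheme written for illustration, not the authors’ code; the function name, default colony size, trial limit, and seed are assumptions. The roulette-wheel probabilities are computed once per cycle, and the best source found so far is recorded before scouts discard exhausted ones.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(f: Callable[[List[float]], float], lower: Sequence[float],
                 upper: Sequence[float], n_sources: int = 20, max_iter: int = 100,
                 limit: int = 20, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over a box with the basic artificial bee colony scheme."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_source() -> List[float]:
        # Step (1): x_j = x_j^min + rand(0,1) * (x_j^max - x_j^min)
        return [lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(dim)]

    def fitness(cost: float) -> float:
        # Step (3): fit = 1 / (1 + f) for f >= 0, otherwise 1 + |f|
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources

    def search_near(i: int) -> None:
        # Step (2): v_ij = x_ij + phi * (x_ij - x_kj), k != i, phi ~ U[-1, 1]
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        candidate[j] = min(max(candidate[j], lower[j]), upper[j])  # stay in bounds
        cost = f(candidate)
        if cost < costs[i]:  # greedy selection keeps the better source
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    best_i = min(range(n_sources), key=lambda i: costs[i])
    best_x, best_cost = list(sources[best_i]), costs[best_i]

    for _ in range(max_iter):  # Step (5): fixed iteration budget
        for i in range(n_sources):  # employed bees
            search_near(i)
        fits = [fitness(c) for c in costs]
        total = sum(fits)
        for _ in range(n_sources):  # onlooker bees: roulette wheel, p_i = fit_i / sum(fit)
            threshold, acc, chosen = rng.random() * total, 0.0, n_sources - 1
            for i, fit_i in enumerate(fits):
                acc += fit_i
                if acc >= threshold:
                    chosen = i
                    break
            search_near(chosen)
        for i in range(n_sources):  # record the best source before scouts act
            if costs[i] < best_cost:
                best_x, best_cost = list(sources[i]), costs[i]
        for i in range(n_sources):  # Step (4): scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = f(sources[i])
                trials[i] = 0
    for i in range(n_sources):  # sources re-seeded in the last cycle may be better
        if costs[i] < best_cost:
            best_x, best_cost = list(sources[i]), costs[i]
    return best_x, best_cost


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best_x, best_cost = abc_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
    print(best_x, best_cost)  # best_cost should be close to 0
```

For hyperparameter tuning, `f` would map a position to XGBoost hyperparameters, train the model, and return a validation error such as 1 - accuracy, which is non-negative and therefore uses the $1/(1+f)$ branch of the fitness; that wrapper is hypothetical and not shown.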
4. Case Study
4.1. Dataset Preparation
The selection of input and output parameters is crucial when building predictive models: appropriate input parameters improve the model’s predictive ability, while the output parameter is the prediction target. Because the warning level characteristics are derived from measured station data, the dataset uses internal pressure, amplitude, ground settlement, stress 1, stress 2, stress 3, oil pump vibration frequency, and pipeline natural frequency as inputs and the safety level as the output. The safety level is encoded as shown in Table 3.
4.2. Pipeline Model and Dataset Construction
According to the actual operating conditions of the station, a multi-factor finite element model is established by explicitly incorporating the soft soil foundation, support pier constraints, pipeline structure, and oil pump loading. The model is employed to simulate the stress state of the pipeline under three representative settlement scenarios, including uniform foundation settlement, local settlement of the buried pipeline foundation, and localized settlement at the pipeline–oil pump connection.
To ensure the reproducibility and numerical reliability of the simulation results, the finite element model is constructed using structured meshing for the pipeline and refined local meshes in settlement-sensitive regions, while appropriate boundary conditions are applied to represent support pier constraints and soil–structure interaction. The oil pump load and internal pressure are imposed as external and internal loads, respectively. Mesh convergence is verified by monitoring the stability of the maximum von Mises stress with progressive mesh refinement, and the solution is accepted when further refinement results in negligible stress variation. Based on the validated model, characteristic stress data are extracted for subsequent analysis.
The risk warning level of the station pipeline under different states is calculated from on-site feature data. Two hundred pipeline models are constructed from the input parameter set and combined with the other parameters to form a total of 766 samples. A sample of the feature data is shown in Table 4.
In Table 4, feature 1 is the pipeline frequency, feature 2 the oil pump frequency, feature 3 the internal pressure, feature 4 the amplitude, feature 5 the settlement, feature 6 stress data 1, feature 7 stress data 2, and feature 8 stress data 3.
4.3. Data Training
To eliminate dimensional differences and accelerate model convergence, the feature data are normalized. The dataset is randomly divided into a training set (70%) and a test set (30%). When the normalized dataset is used to train the XGBoost-based risk warning model, the training set accuracy is 100% and the test set accuracy is 92.61%, as shown in Figure 4. The hyperparameters of the XGBoost model are then optimized with the artificial bee colony algorithm; the ABC settings and the training results of the ABC–XGBoost risk warning model are shown in Figure 5.
(Figure 5 axes: x-axis, predicted samples; y-axis, predicted results.)
The detailed comparison of the training performance of the risk early warning models is presented in Table 5.
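The preprocessing described in this subsection can be sketched without external libraries. Assumptions: min-max scaling to [0, 1] (the normalization method is not named in this text), a seeded random 70/30 split without stratification, and hypothetical function names. One deliberate difference is that the scaling statistics are fitted on the training portion only, so no test-set information leaks into training, whereas the text describes normalizing before splitting. For 766 samples, the split yields 536 training and 230 test samples, matching the counts reported for this dataset.

```python
import random
from typing import List, Sequence, Tuple


def random_split(rows: Sequence[List[float]], labels: Sequence[int],
                 test_fraction: float = 0.3, seed: int = 42):
    """Randomly hold out test_fraction of the samples (no stratification)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(test_fraction * len(rows))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def fit_min_max(rows: Sequence[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows: Sequence[List[float]], mins: List[float],
                      maxs: List[float]) -> List[List[float]]:
    """Scale each feature with training statistics; constant features map to 0.
    Test rows outside the training range can fall slightly outside [0, 1]."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)] for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[rng.uniform(0.0, 100.0) for _ in range(8)] for _ in range(766)]  # 8 synthetic features
    labels = [i % 3 for i in range(766)]
    X_tr, X_te, y_tr, y_te = random_split(rows, labels)
    mins, maxs = fit_min_max(X_tr)
    print(len(X_tr), len(X_te), min_max_transform(X_te[:1], mins, maxs))
```

The synthetic rows in the example only exercise the code; they are not the station dataset.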
4.4. Result Analysis
The test accuracy of the ABC–XGBoost risk warning model reaches 95.22%, an improvement of 2.61 percentage points over the unoptimized XGBoost-based model, which demonstrates the effectiveness of hyperparameter optimization. As the confusion matrix shows, the training set of 536 samples is classified with an overall accuracy of 100%, and the test set of 230 samples with an accuracy of 95.22%. Specifically, two safe samples are misclassified as warning; among the warning samples, one is misclassified as safe and two as danger; among the danger samples, two are misclassified as safe and four as warning.
The multi-factor finite element model constructed based on the actual operating conditions of the station effectively captures the stress and response characteristics of the pipeline system under the coupled effects of soft soil foundation, pier constraints, pipeline structure, and oil pump loading. The multidimensional feature data extracted under different settlement scenarios—including uniform foundation settlement, local settlement of buried pipeline sections, and localized settlement at the pipeline–oil pump connection—provide a reliable data basis for the training and validation of the subsequent risk warning models. By normalizing frequency, internal pressure, vibration amplitude, settlement, and multi-source stress features, the influence of dimensional inconsistency on model training is eliminated, thereby improving model convergence and numerical stability.
The initial XGBoost-based risk warning model exhibits strong fitting capability on the training dataset and achieves high recognition accuracy on the test dataset, indicating that the selected feature set can effectively characterize the pipeline risk state. On this basis, the introduction of the artificial bee colony algorithm for global hyperparameter optimization further enhances the classification performance of the ABC–XGBoost model, leading to a notable improvement in test accuracy and overall recognition capability.
An analysis of the classification errors provides additional engineering insights. The misclassification between safety and warning states is mainly attributed to the overlap of feature distributions under early-stage abnormal conditions, where variations in vibration amplitude and internal pressure remain within moderate ranges and have not yet resulted in significant stress concentration. Similarly, the confusion between warning and danger states in a small number of samples is primarily caused by localized settlement conditions, under which the stress response exhibits strong nonlinearity and temporal variability, making the boundary between warning and danger states less distinct. Furthermore, uncertainties in monitoring data, such as sensor noise and short-term fluctuations in operating conditions, may also contribute to occasional misjudgments.
Despite these misclassifications, the overall error rate remains within an acceptable range for engineering applications. Overall, the proposed ABC–XGBoost risk warning model demonstrates good generalization ability and practical applicability under multi-condition and multi-feature coupling scenarios, and can provide effective technical support for graded early warning and safety management of pipeline operational risks in station environments.
5. Conclusions
This study proposes an intelligent risk early warning model for oil pump and pipeline systems in soft soil foundation stations, taking into account the characteristic operating and geotechnical conditions of such environments. By employing the artificial bee colony (ABC) algorithm to optimize key hyperparameters of the XGBoost model, including the maximum number of iterations, tree depth, and learning rate, the prediction performance of the risk warning model is effectively improved, enabling reliable discrimination of risk warning levels for oil pump pipeline systems under complex operating conditions.
From an engineering application perspective, the proposed model integrates numerical simulation results, finite element stress responses, and real-time monitoring data to support data-driven safety state identification and early warning of pipeline systems during operation. It should be noted that the current study is based on a relatively limited dataset, which may introduce potential risks of overfitting and constrain the generalization capability of the model. However, through parameter optimization and validation procedures, the model demonstrates stable performance and a consistent ability to capture the dominant risk patterns associated with soft soil foundation effects and multi-condition coupling.
At the present stage, the size of the dataset primarily affects the reliability of early warning in boundary cases and rare operating conditions, where insufficient samples may limit the model’s sensitivity to extreme or evolving risk states. Nevertheless, the results indicate that the proposed framework is suitable for feasibility validation and comparative risk identification under existing data conditions. With the continuous accumulation of long-term monitoring data and the enrichment of operational scenarios, the robustness and reliability of the model are expected to be further enhanced, allowing the framework to evolve toward a more comprehensive lifecycle-oriented safety assessment. Ultimately, this study provides a practical and extensible technical basis for intelligent safety management and risk early warning of oil pump pipeline systems in station environments.