Prediction of Corroded Pipeline Failure Pressure Based on Empirical Knowledge and Machine Learning

Hongbo Liu; Xiangzhao Meng

doi:10.3390/app15105787

¹ School of Human Settlements and Civil Engineering, Xi'an Jiaotong University, Xi'an 710049, China

² Shannxi Provincial Natural Gas Co., Ltd., Xi'an 710016, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5787; https://doi.org/10.3390/app15105787

Abstract

This paper presents a novel approach for predicting the failure pressure of corroded pipelines by integrating empirical formulas into the loss function of a neural network-based prediction model. Traditional empirical formulas, such as ASME-B31G, DNV RP-F101, and PCORRC, have been widely used for their simplicity but often suffer from significant prediction errors due to the complex interactions between defect parameters and material properties. In contrast, artificial neural networks (ANNs) offer more accurate predictions but require substantial training data. To address these limitations, we propose an integrated loss function that combines the strengths of empirical formulas and the powerful fitting capabilities of ANNs. The proposed loss function incorporates an additional defect factor term predicted by the neural network to compensate for errors caused by varying defect conditions, thereby enhancing the model′s adaptability and accuracy. The model is trained using a diverse dataset of 60 burst test results from various literature sources, covering a wide range of corrosion scenarios. The results demonstrate that the proposed method significantly improves prediction accuracy compared to traditional empirical formulas and ANN models trained with standard loss functions. The proposed approach achieves a mean absolute percentage error (MAPE) of 2.52%, a root mean square error (RMSE) of 0.39 MPa, and a coefficient of determination (R²) of 0.9886 on the validation set. This study highlights the effectiveness of integrating empirical knowledge with data-driven models and provides a robust and accurate solution for predicting the failure pressure of corroded pipelines, contributing to enhanced pipeline integrity assessment and safety management.

Keywords:

corroded pipeline; failure pressure prediction; empirical formula; loss function

1. Introduction

Empirical formulae, such as ASME-B31G [1], DNV RP-F101 [2], and PCORRC [3], have been widely utilized for predicting the failure pressure of oil and gas defective pipelines due to their simplicity and ease of application. These formulae are primarily derived from theoretical analysis and burst test data, providing sufficient samples for practical engineering assessments. The ASME-B31G formula, for instance, offers a modified approach to address the over-conservatism of traditional standards by incorporating factors like defect depth and length [4]. Similarly, the DNV RP-F101 standard evaluates failure pressure based on sectional safety factors and allowable stress, demonstrating broader applicability and less conservative predictions [5]. The PCORRC formula, focusing on plastic collapse failure, is particularly suited for pipelines with moderate to high toughness. However, despite their widespread use, empirical formulae suffer from notable limitations. They often exhibit significant prediction errors, which can lead to either overly cautious or insufficiently conservative estimates. Additionally, these formulae may not accurately capture the complex interactions between various defect parameters and material properties, thereby compromising their reliability in certain scenarios. Consequently, while empirical formulae offer a rapid and convenient means of assessing pipeline integrity, their inherent inaccuracies necessitate the exploration of more sophisticated and precise methods.

Artificial neural networks (ANNs) have emerged as powerful tools for predicting the failure pressure of pipelines with interacting corrosion defects, offering more accurate and less conservative results compared to traditional methods [6]. In recent studies, ANNs have been successfully integrated with finite element analysis (FEA) to develop empirical equations that account for complex interactions between corrosion defects and combined loadings [7]. Specifically, Vijaya Kumar et al. utilized an ANN trained on extensive FEA data to derive an empirical equation for predicting the failure pressure of API 5L X52, X65, and X80 pipelines subjected to internal pressure and axial compressive stress [8]. The study demonstrated that defect depth has the most significant impact on failure pressure, followed by defect length and axial compressive stress. Xu et al. utilized ANNs to predict the failure pressure of pipelines with interacting defects, achieving results that closely matched experimental burst pressures [9]. Similarly, Arumugam et al. demonstrated the effectiveness of ANNs in predicting residual strength in pipelines subjected to internal pressure and axial compressive stress [10]. However, despite their advantages, ANN-based predictive methods often require substantial volumes of training data to achieve reliable performance, particularly in scenarios involving complex material behaviors or multi-variable interactions.

In the field of predicting failure pressure in corroded pipelines, numerous methods have been developed to address the challenge of insufficient data. When faced with limited data availability, integrating multiple methodologies can effectively maximize the utility of available data resources. For instance, Xie et al. combined data-driven models with statistical models to realize the prediction of oil and gas equipment failure rates [11]. Additionally, data reconfiguration techniques can be employed to augment the sample size when data are scarce. Canonaco et al. introduced a transfer learning approach for corrosion prediction that leverages pre-existing knowledge from similar pipelines to compensate for the lack of direct supervisory information [12]. Moreover, it is crucial to develop comprehensive resilience assessment metrics and methodologies for structure systems, as demonstrated by Cai et al., who proposed a novel resilience assessment approach based on dynamic Bayesian networks (DBNs) and Markov models [13]. This approach specifically targets structure systems such as subsea oil and gas pipelines, integrating the remaining useful life (RUL) of key components as a critical metric. Furthermore, combining empirical knowledge with data-driven models can significantly boost the predictive capability of these models. Ma et al. proposed a method that integrates empirical formulas to guide the training of artificial neural networks, thereby leveraging prior knowledge to improve model performance [14]. However, empirical formulas are often derived under specific conditions and may not fully account for all influencing factors, leading to limited robustness. Consequently, while such formulas can offer valuable insights, they may impose constraints on model training if used directly, potentially limiting their effectiveness in broader applications.

This paper proposes a simple but effective approach of integrating an empirical formula into the loss function to guide the optimization of a neural network model for predicting the failure pressure of corroded pipelines. The design of this loss function balances the instructive nature of the empirical formula and the powerful fitting capability of the neural network, thereby enhancing the model′s predictive accuracy for failure pressure. Empirical formulas are typically designed under specific corrosion defect conditions, and their predictive performance can significantly deteriorate when applied to pipelines with different defect conditions. To address this limitation, the proposed loss function incorporates an additional defect factor term predicted by the neural network. This term compensates for the errors caused by varying defect conditions, thus endowing the neural network model optimized by this loss function with stronger adaptability to a wide range of defect scenarios. Our proposed method does not involve complex feature design or selection processes. Despite this simplification, it has been validated to achieve improved performance across various publicly available datasets.

2. Methodology

2.1. Empirical Formula

To evaluate the integrity of pipelines with corrosion defects, several widely accepted criteria have been developed to predict the remaining strength and failure pressure. These include the ASME B31G, Modified B31G, PCORRC criteria, and DNV RP-F101. Each method employs different approaches and assumptions, which are outlined below.

(1): ASME B31G

The ASME B31G method is a foundational approach for assessing the remaining strength of corroded pipelines. It was formulated based on experimental burst data from real-world corroded pipelines and assumes that the corroded area has a parabolic shape for short defects (length L ≤ √(20Dt)) and a rectangular shape for long defects (length L > √(20Dt)). The failure pressure is calculated using the specified minimum yield strength (SMYS) and a geometry correction factor known as the Folias factor, which accounts for stress concentration effects [15]. The failure pressure is calculated as Equation (1).

P_{F} = \begin{cases} \frac{2 t σ}{D} [\frac{1 - (2 / 3) (d / t)}{1 - (2 / 3) (d / t) / M}], & L \leq \sqrt{20 D t} \\ \frac{2 t σ}{D} (1 - \frac{d}{t}), & L > \sqrt{20 D t} \end{cases}

(1)

where σ is the flow stress of the pipeline steel, defined as σ = 1.1 SMYS (specified minimum yield strength); t is the thickness of the pipeline; D is the outer diameter; d is the corrosion depth; and M is the Folias factor, which accounts for the stress concentration caused by radial deflection of the pipe. M can be calculated by Equation (2).

M = \sqrt{1 + 0.8 {(\frac{L}{\sqrt{D t}})}^{2}}

(2)
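As a minimal sketch of Equations (1) and (2), the function below evaluates the original ASME B31G failure pressure in plain Python. The function name, the argument names, and the sample pipe dimensions are illustrative assumptions and are not taken from the dataset; lengths are assumed to be in mm and strengths in MPa, so the result is in MPa.

```python
import math


def asme_b31g_failure_pressure(D, t, d, L, smys):
    """Original ASME B31G failure pressure, Equations (1) and (2).

    D: outer diameter, t: wall thickness, d: defect depth, L: defect length
    (all in mm); smys: specified minimum yield strength (MPa).
    """
    flow_stress = 1.1 * smys  # sigma = 1.1 SMYS
    intact = 2.0 * t * flow_stress / D
    if L <= math.sqrt(20.0 * D * t):
        # Short defect: parabolic profile with the Folias factor of Equation (2).
        M = math.sqrt(1.0 + 0.8 * L ** 2 / (D * t))
        ratio = (2.0 / 3.0) * (d / t)
        return intact * (1.0 - ratio) / (1.0 - ratio / M)
    # Long defect: rectangular profile.
    return intact * (1.0 - d / t)


# Hypothetical pipe, for illustration only.
p_short = asme_b31g_failure_pressure(D=323.9, t=9.8, d=4.9, L=200.0, smys=358.0)
print(f"ASME B31G failure pressure: {p_short:.2f} MPa")
```

For a defect-free pipe (d = 0) both branches reduce to the intact-pipe value 2tσ/D, which is a quick sanity check on any implementation.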

(2): Modified ASME B31G

Addressing the over-conservatism of the original ASME B31G, the Modified ASME B31G method adjusts the defect profile to better match actual corrosion shapes. It replaces the parabolic area factor of 2/3 with a coefficient of 0.85, giving a more realistic representation of the corroded area. The failure pressure is given by Equation (3), which uses an adjusted flow stress and a revised Folias factor that depends on the defect length [16].

P_{F} = \frac{2 t σ}{D} [\frac{1 - 0.85 (d / t)}{1 - 0.85 (d / t) / M}]

(3)

where the flow stress σ = SMYS + 68.95 (MPa) and the Folias factor M is adjusted based on the defect length L, which can be calculated by Equation (4).

M = \begin{cases} \sqrt{1 + 0.6275 {(\frac{L}{\sqrt{D t}})}^{2} - 0.003375 {(\frac{L}{\sqrt{D t}})}^{4}}, & L \leq \sqrt{50 D t} \\ 3.3 + 0.032 {(\frac{L}{\sqrt{D t}})}^{2}, & L > \sqrt{50 D t} \end{cases}

(4)
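Equations (3) and (4) can be sketched in the same way. The function and argument names below are assumptions made for illustration, with lengths in mm and strengths in MPa.

```python
import math


def modified_b31g_failure_pressure(D, t, d, L, smys):
    """Modified ASME B31G failure pressure, Equations (3) and (4).

    D, t, d, L in mm; smys in MPa. Returns the failure pressure in MPa.
    """
    flow_stress = smys + 68.95  # sigma = SMYS + 68.95 MPa
    z = L ** 2 / (D * t)  # (L / sqrt(D t))^2
    if L <= math.sqrt(50.0 * D * t):
        M = math.sqrt(1.0 + 0.6275 * z - 0.003375 * z ** 2)
    else:
        M = 3.3 + 0.032 * z
    ratio = 0.85 * (d / t)
    return (2.0 * t * flow_stress / D) * (1.0 - ratio) / (1.0 - ratio / M)


# Hypothetical pipe, for illustration only.
p_mod = modified_b31g_failure_pressure(D=323.9, t=9.8, d=4.9, L=200.0, smys=358.0)
print(f"Modified B31G failure pressure: {p_mod:.2f} MPa")
```

The two branches of the Folias factor nearly coincide at L = √(50Dt) (about 4.89 versus 4.90), so the predicted pressure changes smoothly across the switch.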

(3): PCORRC Criteria

The PCORRC criteria, developed by Stephens, focus on predicting the remaining strength of pipelines with moderate to high toughness. This method uses finite element analysis to model elliptical defects and incorporates a plastic collapse failure mode. The failure pressure is derived from the ultimate tensile strength (UTS) and accounts for the defect length through an exponential function, providing a more nuanced prediction of residual strength [17]. The PCORRC criteria can be expressed as Equation (5).

P_{F} = \frac{2 t σ_{UTS}}{D} [1 - \frac{d}{t} (1 - \exp (- 0.157 \frac{L}{\sqrt{D (t - d) / 2}}))]

(5)

where σ_{UTS} is the ultimate tensile strength of the pipeline steel.
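A short Python sketch of Equation (5) follows. The function name, argument names, and sample values are assumptions for illustration (lengths in mm, strength in MPa), and the expression is only meaningful for d < t.

```python
import math


def pcorrc_failure_pressure(D, t, d, L, uts):
    """PCORRC failure pressure, Equation (5).

    D, t, d, L in mm; uts: ultimate tensile strength (MPa). Requires d < t.
    """
    if not 0.0 <= d < t:
        raise ValueError("defect depth must satisfy 0 <= d < t")
    # Exponential length term: 0 for L = 0, approaches 1 for very long defects.
    length_term = 1.0 - math.exp(-0.157 * L / math.sqrt(D * (t - d) / 2.0))
    return (2.0 * t * uts / D) * (1.0 - (d / t) * length_term)


# Hypothetical pipe, for illustration only.
p_pcorrc = pcorrc_failure_pressure(D=323.9, t=9.8, d=4.9, L=200.0, uts=455.0)
print(f"PCORRC failure pressure: {p_pcorrc:.2f} MPa")
```

The exponential term makes the prediction fall from the intact-pipe value 2tσ_{UTS}/D toward (2tσ_{UTS}/D)(1 - d/t) as the defect length grows.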

(4): DNV RP-F101

The DNV Recommended Practice (DNV RP-F101) is a comprehensive guideline for assessing the integrity of corroded pipelines. This method assumes a rectangular defect shape and calculates the failure pressure as Equation (6).

P_{F} = \frac{2 t σ_{UTS}}{D - t} [\frac{1 - d / t}{1 - d / (t Q)}]

(6)

where Q is a geometry correction factor that accounts for the defect length L and other geometrical parameters. Q can be calculated as Equation (7).

Q = \sqrt{1 + 0.31 {(\frac{L}{\sqrt{D t}})}^{2}}

(7)
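Equations (6) and (7) can be sketched as follows. The function name, argument names, and sample values are illustrative assumptions (lengths in mm, strength in MPa); this is the plain capacity expression as written above, without any safety factors.

```python
import math


def dnv_rp_f101_failure_pressure(D, t, d, L, uts):
    """DNV RP-F101 failure pressure, Equations (6) and (7).

    D, t, d, L in mm; uts: ultimate tensile strength (MPa).
    """
    Q = math.sqrt(1.0 + 0.31 * L ** 2 / (D * t))  # length correction, Equation (7)
    return (2.0 * t * uts / (D - t)) * (1.0 - d / t) / (1.0 - d / (t * Q))


# Hypothetical pipe, for illustration only.
p_dnv = dnv_rp_f101_failure_pressure(D=323.9, t=9.8, d=4.9, L=200.0, uts=455.0)
print(f"DNV RP-F101 failure pressure: {p_dnv:.2f} MPa")
```

When L = 0 the factor Q equals 1 and the bracket in Equation (6) becomes 1, so the formula returns the intact-pipe value 2tσ_{UTS}/(D - t) regardless of depth.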

2.2. Constraining Optimization by Empirical Formulas-Incorporated Loss Function

Artificial neural networks (ANNs) have demonstrated considerable accuracy in predicting the failure pressure of corroded pipelines. The integration of physical formulas to guide the optimization of neural networks was initially proposed by Raissi [18], and it has been shown to enhance the model′s ability to extract information from data, thereby improving performance in failure pressure prediction tasks [14]. However, empirical formulas often yield results with lower accuracy and are typically insufficient for directly guiding the optimization of neural networks. This limitation necessitates complex feature extraction steps to bridge the gap between empirical insights and model optimization. In this study, we introduce a novel approach to employing empirical formulas for guiding the optimization of artificial neural networks. By designing an appropriate loss function, we aim to balance the strong fitting capabilities of neural networks with the instructive nature of empirical formulas, thereby leveraging the strengths of both methodologies to achieve more accurate predictions of failure pressure in corroded pipelines.

Figure 1 illustrates the optimization process of a neural network guided by an empirical formula. On the left are pipeline data obtained from experiments or simulations, comprising seven features: outer diameter, inner diameter, yield strength, tensile strength, flaw length, flaw depth, and flaw width. These features are utilized in two parallel pathways: one inputs the data into an artificial neural network (ANN) to predict the failure pressure \hat{P}_{FA}, while the other employs the empirical formula to predict the failure pressure \hat{P}_{FE}. In Figure 1, the prediction results from the empirical formula in Constraint 1 are directly compared with the prediction results from the neural network to compute the loss. Here, Constraint 1 serves as a direct optimization constraint imposed by the empirical formula. When Constraint 2 is applied, the neural network outputs an additional term \hat{P}_{Defect}, which can be interpreted as the prediction error of the empirical formula arising from variations in flaw dimensions. The introduction of \hat{P}_{Defect} compensates for the inherent errors of the empirical formula and enhances the overall optimization effect. This additional term allows the model to adapt more flexibly to discrepancies between the empirical formula and real-world data, thereby improving the accuracy and robustness of the optimization process.

Figure 1. Empirical formula-constrained optimization for predicting failure pressure in corroded pipelines.

3. Training Details

3.1. Dataset

The dataset employed in this study is derived from the comprehensive compilation presented in Reference [19]. This dataset is notable for its diversity, as it integrates 60 burst test results from various literature sources, corresponding to distinct defect conditions in pipelines. These conditions include different corrosion depths, lengths, and widths, reflecting a wide range of realistic scenarios encountered in the assessment of corroded pipelines. The heterogeneity of the dataset, with its varied defect geometries and material properties, poses significant challenges for traditional assessment methods. However, the method proposed in this research demonstrates robust tolerance to these diverse defect conditions. This capability is crucial for accurately predicting the failure pressure of pipelines under complex corrosion scenarios, thereby enhancing the reliability and safety of pipeline integrity assessments.

The specifications of the 60 samples extracted from the relevant literature are presented in Table 1 [20,21,22,23,24,25]. The dataset was divided into training and validation sets at a ratio of 8:2. To balance the data distribution between the two sets, the validation samples were selected evenly from the various sources.

Table 1. The details of 60 samples from various literature sources.

3.2. Training Setting

The neural network model employed in this study is a four-layer linear network, comprising one input layer, two hidden layers, and one output layer. Given that the number of data features is 7, the input channel count of the input layer is correspondingly set to 7. For the hidden layers, both the input and output channel counts are configured to be 32. Each layer of the network consists of three components: a linear module, a group normalization module [26], and a ReLU activation function. The linear module is responsible for the initial transformation of the input data, while the group normalization module helps to stabilize the training process by normalizing the data across groups of channels. Finally, the ReLU activation function introduces non-linearity into the model, enabling it to learn more complex patterns from the data. After extensive trials, the initial learning rate was set to 0.004. An adaptive learning rate strategy was employed to optimize the training process: the learning rate was reduced to 80% of its current value whenever the model's loss failed to decrease over a span of 10 consecutive iterations. This approach dynamically adjusts the learning rate to balance the speed of convergence and the stability of the training process, thereby improving the overall performance of the model.
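The learning rate rule described above can be sketched in a few lines of plain Python. This is not the authors' training code: the class name `PlateauDecay` is hypothetical, and interpreting "failed to decrease" as "no improvement over the best loss seen so far" is an assumption.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` after `patience` steps without improvement."""

    def __init__(self, lr=0.004, factor=0.8, patience=10):
        self.lr = lr              # initial learning rate reported in the text
        self.factor = factor      # reduce to 80% of the current value
        self.patience = patience  # 10 consecutive non-improving iterations
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record the latest loss and return the (possibly reduced) learning rate."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr


# Illustration with a loss that stops improving after the first iteration.
scheduler = PlateauDecay()
for loss in [1.0] + [1.0] * 10:
    current_lr = scheduler.step(loss)
print(f"learning rate after the plateau: {current_lr:.4f}")
```

In a real training loop the returned value would be written back to the optimizer after every iteration or epoch; most deep learning frameworks provide a plateau-based scheduler that plays the same role.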

In our proposed method, the original data features are utilized in two parallel pathways: one involves feeding them into a neural network for forward propagation calculations, while the other involves directly inputting them into an empirical formula to predict the failure pressure. Before being fed into the neural network, the data features are normalized to a distribution with a mean of 0 and a standard deviation of 1, which helps to enhance the efficiency and accuracy of the network′s learning process. In contrast, the data features input into the empirical formula are not subjected to any preprocessing, as the formula is designed to operate directly on the raw data. The loss function used in this paper is the mean squared error (MSE) loss function, and its calculation method is shown in Equation (8).

\mathrm{Loss} = \frac{\sum_{i = 1}^{n} {(P_{F} - \hat{P}_{FA})}^{2}}{n}

(8)

Predictions from empirical formulas are reasonable to a degree because the formulas are grounded in practical observations and historical data. However, these formulas often fall short when confronted with the variability and complexity of corrosion defects. Although previous studies have successfully applied empirical formulas to guide the optimization of neural networks and thereby enhance model performance [27,28], directly constraining the network output with the empirical prediction, as in Equation (9), often proves counterproductive.

\mathrm{Loss}_{1} = \frac{\sum_{i = 1}^{n} {(P_{F} - \hat{P}_{FA})}^{2}}{n} + \frac{\sum_{i = 1}^{n} {(\hat{P}_{FA} - \hat{P}_{FE})}^{2}}{n}

(9)

where P_{F} is the actual failure pressure, \hat{P}_{FA} is the failure pressure predicted by the ANN, and \hat{P}_{FE} is the failure pressure predicted by the empirical formula. To address this limitation, this study proposes a loss function that integrates the guidance of the empirical formula with the robustness of neural networks. As shown in Equation (10), this loss function leverages the strengths of both methodologies to enhance the accuracy and reliability of predictions in the context of corrosion defects.

\mathrm{Loss}_{2} = \frac{\sum_{i = 1}^{n} {(P_{F} - \hat{P}_{FA})}^{2}}{n} + \frac{\sum_{i = 1}^{n} {(\hat{P}_{FA} - \hat{P}_{FE} - \hat{P}_{Defect})}^{2}}{n}

(10)

where \hat{P}_{Defect} is the second output of the ANN, which compensates for the error caused by the size of the defect.
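To make the difference between Equations (8), (9), and (10) concrete, the sketch below evaluates the three losses on plain Python lists. The function names and the toy numbers are assumptions for illustration only; an actual training run would compute the same quantities on tensors in an automatic differentiation framework.

```python
def mse(a, b):
    """Mean squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def loss_data_only(p_true, p_ann):
    """Equation (8): supervision by the measured failure pressure only."""
    return mse(p_true, p_ann)


def loss_constraint_1(p_true, p_ann, p_emp):
    """Equation (9): adds a penalty pulling the ANN toward the empirical formula."""
    return mse(p_true, p_ann) + mse(p_ann, p_emp)


def loss_constraint_2(p_true, p_ann, p_emp, p_defect):
    """Equation (10): the ANN's second output absorbs the formula's error."""
    corrected = [e + c for e, c in zip(p_emp, p_defect)]
    return mse(p_true, p_ann) + mse(p_ann, corrected)


# Toy values in MPa, invented for illustration.
p_true = [10.0, 12.0]    # measured burst pressures
p_ann = [10.0, 12.0]     # ANN predictions (perfect in this toy case)
p_emp = [9.0, 13.0]      # empirical formula, off by -1 and +1
p_defect = [1.0, -1.0]   # defect term that exactly offsets the formula error

print(loss_data_only(p_true, p_ann))                      # 0.0
print(loss_constraint_1(p_true, p_ann, p_emp))            # 1.0
print(loss_constraint_2(p_true, p_ann, p_emp, p_defect))  # 0.0
```

In this toy case Equation (9) still penalizes a perfect prediction because the empirical formula itself is wrong, whereas Equation (10) lets the defect term account for that discrepancy, so the penalty vanishes.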

3.3. Evaluation Metrics

To comprehensively evaluate the performance of the proposed method in predicting the failure pressure of pipelines with corrosion defects, four widely recognized evaluation metrics were employed: the mean absolute percentage error (MAPE), the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²) [29,30,31]. These metrics collectively provide a detailed assessment of the accuracy, precision, and overall goodness-of-fit of the predictions. Specifically, MAPE measures the average magnitude of the prediction errors relative to the actual values, expressed as a percentage, and is calculated as Equation (11).

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left|\frac{y_{i} - \hat{y}_{i}}{y_{i}}\right| \times 100\%

(11)

where y_{i} represents the true failure pressure of the i-th sample and \hat{y}_{i} is the corresponding predicted result. RMSE is the square root of the average squared difference between predicted and actual values, which makes it sensitive to larger errors. RMSE is defined as Equation (12).

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y}_{i})}^{2}}

(12)

MAE provides a direct measure of the average absolute difference between predictions and actual values, indicating overall prediction accuracy. MAE can be calculated as Equation (13).

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|

(13)

R² reflects the proportion of variance in the actual data that is explained by the predicted values, with higher values (closer to 1) indicating a better model fit. It is defined as Equation (14), where \bar{y} is the mean of the actual failure pressures.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - \hat{y}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(14)
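The four metrics can be computed with the standard library alone, as in the sketch below. The function name and the two-sample example are assumptions for illustration; R² uses the standard definition with the mean of the actual values in the denominator.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAPE (%), RMSE, MAE, and R² for paired actual and predicted values."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mape = 100.0 / n * sum(abs(e / y) for e, y in zip(errors, y_true))
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)             # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "R2": r2}


# Toy example in MPa, invented for illustration.
metrics = regression_metrics([10.0, 20.0], [11.0, 18.0])
print(metrics)
```

For this toy pair the relative errors are both 10%, the absolute errors are 1 and 2 MPa, and the residual and total sums of squares are 5 and 50, giving MAPE = 10%, MAE = 1.5 MPa, RMSE = √2.5 ≈ 1.58 MPa, and R² = 0.9.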

By utilizing these metrics, the evaluation offers a balanced and robust assessment of the proposed method′s applicability and reliability in predicting pipeline failure pressures under diverse corrosion conditions.

4. Results and Discussion

In this study, we examined four classical empirical formulas, namely, ASME B31G, Modified ASME B31G, PCORRC Criteria, and DNV RP-F101, to determine the most suitable one for our dataset. Figure 2 illustrates the coefficients of determination (R²) between the predicted failure pressures and the actual failure pressures for each of these formulas on the training dataset. Among them, (a) DNV RP-F101 exhibited the best predictive performance with an R² value of 0.8844. This indicates a strong correlation between the predicted and actual failure pressures for this formula. The next best performance was observed with (c) PCORRC Criteria, which had an R² value of 0.8721, suggesting a slightly lower but still substantial correlation. The Modified ASME B31G, represented by (b), achieved an R² value of 0.8303, indicating a good but less accurate fit compared to the first two formulas. Finally, the original ASME B31G, shown in (d), had the lowest R² value of 0.6309, indicating a weaker correlation between the predicted and actual failure pressures. Based on these results, we concluded that DNV RP-F101 is the most appropriate empirical formula for our dataset due to its superior predictive accuracy.

Figure 2. Coefficient of determination between actual failure pressure and predicted failure pressure from empirical formulas.

To compare the impact of different loss functions on the optimization of neural networks, this study trained three distinct neural network models using the loss functions defined in Equations (8)–(10). Specifically, Equation (8) represents a loss function supervised by actual failure pressure, which relies solely on the observed failure pressure data to guide the training process. Equation (9) builds upon this foundation by incorporating an empirical formula-induced term, meaning that the loss additionally compares the predicted failure pressure directly with the empirical formula result. In contrast, Equation (10) introduces an additional term predicted by the neural network itself, which aims to compensate for the deficiencies of the empirical formula. Figure 3a presents the variation of the loss for each model during training as a function of the number of iterations. All models tend to converge around 120 epochs. However, the model trained with Equation (9) converged to a final loss value that is significantly higher than those of the other two models, indicating a less efficient optimization outcome. Figure 3b illustrates the variation of the loss during validation. In the initial 100 epochs, the loss exhibited significant oscillations for all models. The model trained with Equation (8) converged relatively slowly, eventually stabilizing around the 120th epoch. In contrast, the models using Equations (9) and (10) converged more rapidly, reaching stability around the 80th epoch. Notably, the model trained with Equation (9) reached a final loss value that is higher than those of the other two models, which is consistent with the training results.

Figure 3. Variation of the loss of different models with iterations. (a) is the training process while (b) is the validation process.

To validate the performance of the models, this paper statistically analyzed the prediction errors of the models. Figure 4 presents the three types of statistical errors for each model in the form of bar charts with data distributions. Specifically, Figure 4a shows the absolute percentage error (APE) of the predictions, which reflects the proportion of prediction errors relative to the true values. Figure 4b illustrates the absolute error (AE), indicating the magnitude of the deviation between the predicted and actual values. Figure 4c displays the squared error (SE), which highlights samples with large prediction errors. The numbers on top of the bars represent the average values of the respective bars, with (a), (b), and (c) corresponding to the mean absolute percentage error (MAPE), mean absolute error (MAE), and mean squared error (MSE), respectively. The values in parentheses are the standard deviations. From the average error values shown in Figure 4, it can be observed that overall, the third model outperforms the first two models. The prediction errors of the first model are slightly higher than those of the third model. However, the second model has some samples with significant deviations from the actual values, which is the primary reason for its higher prediction errors compared to the first model. This is attributed to the second model′s direct use of an imprecise empirical formula to calculate the results and compute the loss, making it more sensitive to changes in corrosion defect sizes and less adaptable to various defect scenarios. In contrast, the third model employs an improved empirical formula as part of the loss function calculation, enhancing its adaptability to different defect conditions.

Figure 4. Statistical results of predictions by different models on the validation set. (a) shows the absolute percentage error of different models, (b) shows the absolute error of different models, and (c) shows the square error. The numbers above the bars represent the average error across all samples, with the corresponding values in parentheses indicating the standard deviation.

Next, this study compares the per-sample prediction results and errors of the three models. Figure 5 presents the prediction results of different models on each sample in the validation set. Specifically, Figure 5a provides an intuitive comparison of the deviations between predicted and actual values by plotting the predicted values against the true values. The coefficient of determination (R²) for each model is indicated in the legend. Meanwhile, Figure 5b visually illustrates the positive and negative deviations of the prediction results for each sample. Table 2 lists the numerical deviations of prediction results for each sample across the models. From Figure 5a, it is evident that the third model achieves the highest coefficient of determination (R² = 0.9886), followed by the first model (R² = 0.9589), which is slightly higher than the second model (R² = 0.9400). Table 2 reveals that the second model exhibits significantly smaller errors in predicting the failure pressures of samples 1, 2, 4, and 9, while its errors for samples 3 and 10 are notably larger. This trend is similar to the deviations observed when using the empirical formula DNV RP-F101 for failure pressure prediction. This suggests that the direct incorporation of empirical-formula-calculated failure pressures into the loss function (second model) may cause the model to overfit to defect scenarios that align with the empirical formula, while increasing the deviations for other cases. In contrast, the model trained with the improved loss function (third model) significantly reduces this overfitting. The third model exhibits stronger adaptability to various defect scenarios, thereby providing more accurate and robust predictions across the validation set.

Figure 5. Prediction results of different models on each sample in the validation set. (a) is a scatter plot of the predicted and actual values for each sample, with the legend indicating the corresponding coefficient of determination. (b) is the error between the predicted and actual values for each sample.

Table 2. Detailed comparison of the prediction results of different models on the validation set.

5. Conclusions

In conclusion, this study proposes a method for predicting the failure pressure of corroded pipelines by integrating empirical formulas into the loss function of a neural network model. The proposed approach combines the strengths of empirical formulas and artificial neural networks, leveraging the instructive nature of empirical knowledge while benefiting from the powerful fitting capabilities of ANNs. The additional defect factor term in the loss function allows the model to compensate for errors caused by varying defect conditions, thereby enhancing its adaptability and accuracy across diverse corrosion scenarios. The results demonstrate that the proposed method outperforms traditional empirical formulas and ANN models trained with standard loss functions, achieving a mean absolute percentage error (MAPE) of 2.52%, a root mean square error (RMSE) of 0.39 MPa, and a coefficient of determination (R²) of 0.9886 on the validation set. This study highlights the importance of integrating empirical knowledge with data-driven models to improve prediction accuracy and robustness. The proposed method provides a simple yet effective solution for predicting the failure pressure of corroded pipelines, contributing to enhanced pipeline integrity assessment and safety management. However, the empirical formula used in the proposed method has a significant impact on prediction accuracy: generally, the better the empirical formula fits the data, the higher the accuracy of the model trained using the proposed method. Future work may therefore focus on optimizing the model architecture and exploring additional empirical formulas to further improve the prediction performance. Additionally, the proposed approach could be extended to other engineering applications where empirical formulas and data-driven models can be effectively combined to address complex prediction tasks.

Author Contributions

Conceptualization, X.M.; Methodology, H.L. and X.M.; Software, H.L.; Formal analysis, H.L.; Resources, H.L.; Writing—original draft, H.L.; Writing—review & editing, H.L.; Visualization, X.M.; Supervision, X.M.; Project administration, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Hongbo Liu was employed by the company Shannxi Provincial Natural Gas Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Mousavi, S.S.; Moghaddam, A.S. Failure pressure estimation error for corroded pipeline using various revisions of ASME B31G. Eng. Fail. Anal. 2020, 109, 104284.
2. Arumugam, T.; Karuppanan, S.; Ovinis, M. Finite element analyses of corroded pipeline with single defect subjected to internal pressure and axial compressive stress. Mar. Struct. 2020, 72, 102746.
3. O’Grady, T.; Hisey, D.; Kiefner, J. Pressure calculation for corroded pipe developed. Oil Gas J. 1992, 90, 442.
4. Su, Y.; Li, J.F.; Yu, B.; Zhao, Y.L.; Yao, J. Fast and accurate prediction of failure pressure of oil and gas defective pipelines using the deep learning model. Reliab. Eng. Syst. Saf. 2021, 216, 108016.
5. Li, M.Z.; Liu, Z.Y.; Zhao, Y.; Zhou, Y.; Huang, P.; Li, X.; Li, P.L.; Wang, X.; Zhang, D.P. Effects of corrosion defect and tensile load on injection pipe burst in CO₂ flooding. J. Hazard. Mater. 2019, 366, 65–77.
6. Belachew, C.T.; Ismail, M.; Karuppanan, S. Burst Strength Analysis of Corroded Pipelines by Finite Element Method. J. Appl. Sci. 2011, 11, 1845–1850.
7. Silva, R.C.C.; Guerreiro, J.N.C.; Loula, A.F.D. A study of pipe interacting corrosion defects using the FEM and neural networks. Adv. Eng. Softw. 2007, 38, 868–875.
8. Kumar, S.D.V.; Lo, M.; Karuppanan, S.; Ovinis, P. Empirical Failure Pressure Prediction Equations for Pipelines with Longitudinal Interacting Corrosion Defects Based on Artificial Neural Network. J. Mar. Sci. Eng. 2022, 10, 764.
9. Xu, W.Z.; Li, C.B.; Choung, J.; Lee, J.M. Corroded pipeline failure analysis using artificial neural network schemes. Adv. Eng. Softw. 2017, 112, 255–266.
10. Chen, Z.; Li, X.; Wang, W.; Li, Y.; Shi, L.; Li, Y. Residual strength prediction of corroded pipelines using multilayer perceptron and modified feedforward neural network. Reliab. Eng. Syst. Saf. 2023, 231, 108980.
11. Xie, L.; Håbrekke, S.; Liu, Y.; Lundteigen, M.A. Operational data-driven prediction for failure rates of equipment in safety instrumented systems: A case study from the oil and gas industry. J. Loss Prev. Process Ind. 2019, 60, 96–105.
12. Canonaco, G.; Roveri, M.; Alippi, C.; Podenzani, F.; Bennardo, A.; Conti, M.; Mancini, N. A transfer-learning approach for corrosion prediction in pipeline infrastructures. Appl. Intell. 2022, 52, 7622–7637.
13. Cai, B.P.; Zhang, Y.P.; Yuan, X.B.; Gao, C.T.; Liu, Y.H.; Chen, G.M.; Liu, Z.K.; Ren-Jie, J.I. A Dynamic-Bayesian-Networks-Based Resilience Assessment Approach of Structure Systems: Subsea Oil and Gas Pipelines as A Case Study. China Ocean Eng. 2020, 34, 597–607.
14. Ma, Y.L.; Zheng, J.Q.; Liang, Y.T.; Klemeš, J.J.; Du, J.; Liao, Q.; Lu, H.F.; Wang, B.H. Deeppipe: Theory-guided neural network method for predicting burst pressure of corroded pipelines. Process Saf. Environ. Prot. 2022, 162, 595–609.
15. Chouchaoui, B. Evaluating the Remaining Strength of Corroded Pipelines; University of Waterloo: Waterloo, Canada, 1994.
16. Kiefner, J.F.; Vieth, P.H. A Modified Criterion for Evaluating the Remaining Strength of Corroded Pipe; Battelle Columbus Div.: Columbus, OH, USA, 1989.
17. Stephens, D.R.; Leis, B.N. Development of an Alternative Criterion for Residual Strength of Corrosion Defects in Moderate to High Toughness Pipe. In Proceedings of the 2000 3rd International Pipeline Conference, American Society of Mechanical Engineers Digital Collection, Calgary, AB, Canada, 1–5 October 2000.
18. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
19. Gao, J.; Yang, P.; Li, X.; Zhou, J.; Liu, J.K. Analytical prediction of failure pressure for pipeline with long corrosion defect. Ocean Eng. 2019, 191, 106497.
20. Freire, J.L.F.; Vieira, R.D.; Castro, J.T.P.; Benjamin, A.C. Part 3: Burst tests of pipeline with extensive longitudinal metal loss. Exp. Tech. 2017, 30, 60–65.
21. Mok, D.H.B.; Pick, R.J.; Glover, A.G.; Hoff, R. Bursting of line pipe with long external corrosion. Int. J. Press. Vessel. Pip. 1991, 46, 195–216.
22. Kim, Y.; Kim, W.; Lee, Y.; Oh, K. The Evaluation of Failure Pressure for Corrosion Defects Within Girth or Seam Weld in Transmission Pipelines. In Proceedings of the 2004 International Pipeline Conference, American Society of Mechanical Engineers Digital Collection, Calgary, AB, Canada, 4–8 October 2004; pp. 1847–1855.
23. Chen, J.X.; Li, A.Z.; Yang, Q.K.; Lin, C.; Ren, Y.C.; Jin, S.R.; Li, C.C.; Qi, M. Novel In0.49Ga0.51P/(In)GaAs/GaAs p-type modulation doped heterostructure grown by gas source molecular beam epitaxy. J. Cryst. Growth 1998, 193, 28–32.
24. Shuai, Y.; Shuai, J.; Liu, C.Y. Research on the reliability methods of corroded pipeline. Pet. Sci. Bull. 2017, 2, 288–297.
25. Mannucci, G.; Demofonti, G. Fracture Properties of API X 100 Gas Pipeline Steels; Europipe: Mülheim an der Ruhr, Germany, 2002; pp. 1–128.
26. Wu, H.; He, K.M. Group Normalization. Int. J. Comput. Vis. 2019, 128, 742–755.
27. Xiao, R.; Zayed, T.; Meguid, M.A.; Sushama, L. Predicting failure pressure of corroded gas pipelines: A data-driven approach using machine learning. Process Saf. Environ. Prot. 2024, 184, 1424–1441.
28. Li, Y.; Chen, Z.F.; Wang, W.; Han, K.; Shuai, Y.; Wang, G.X. A novel assessment method for residual strength of CO₂ pipelines with multiple defects based on RF-MLP. Reliab. Eng. Syst. Saf. 2025, 261, 111088.
29. Xu, L.; Wen, S.; Huang, H.; Tang, Y.; Wang, Y.; Pan, C. Corrosion failure prediction in natural gas pipelines using an interpretable XGBoost model: Insights and applications. Energy 2025, 325, 136157.
30. Lu, H.; Peng, H.; Xu, Z.D.; Qin, G.; Azimi, M.; Matthews, J.C.; Cao, L. Theory and Machine Learning Modeling for Burst Pressure Estimation of Pipeline with Multipoint Corrosion. J. Pipeline Syst. Eng. Pract. 2023, 14, 1481.
31. Xu, L.; Wang, Y.; Mo, L.; Tang, Y.; Wang, F.; Li, C. The research progress and prospect of data mining methods on corrosion prediction of oil and gas pipelines. Eng. Fail. Anal. 2023, 144, 106951.

Figure 1. Empirical formula-constrained optimization for predicting failure pressure in corroded pipelines.

Figure 2. Coefficient of determination between actual failure pressure and predicted failure pressure from empirical formulas.

Figure 3. Loss of the different models as a function of training iterations: (a) training process; (b) validation process.

Figure 4. Statistical results of the predictions of different models on the validation set: (a) absolute percentage error, (b) absolute error, and (c) squared error. The numbers above the bars are the average error across all samples, with the standard deviation given in parentheses.

Figure 5. Prediction results of different models on each sample in the validation set. (a) is a scatter plot of the predicted and actual values for each sample, with the legend indicating the corresponding coefficient of determination. (b) is the error between the predicted and actual values for each sample.

Table 1. Details of the 60 burst test samples collected from various literature sources.

Literature	Material	P_F (MPa)	D (mm)	t (mm)	L (mm)	H (mm)	W (mm)	σ (MPa)	σ_UTS (MPa)
(Freire et al., 2006) [20]	X80	22.68	458.8	8.1	39.6	5.39	31.9	601	684
	X80	24.2	459.4	8	40.05	3.75	32	589	730.5
	X60	14.4	323.9	9.8	255.6	7.08	95.3	452	542
	X60	14.07	323.9	9.66	305.6	6.76	95.3	452	542
	X60	13.58	323.9	9.71	350	6.93	95.3	452	542
	X60	12.84	323.9	9.71	394.5	6.91	95.3	452	542
	X60	12.13	323.9	9.91	433.4	7.31	95.3	452	542
	X60	11.92	323.9	9.74	466.7	7.02	95.3	452	542
	X60	11.91	323.9	9.79	488.7	6.99	95.3	452	542
	X60	11.99	323.9	9.79	500	6.99	95.3	452	542
	X60	11.3	323.9	9.74	527.8	7.14	95.3	452	542
	X60	14.6	508	14.6	500	10.35	97	478	600
	X60	13.4	508	14.3	500	10.3	97	478	600
	X60	15.8	508	14.8	500	9.7	97	478	600
	X46	9.4	76.2	2	75	1.4	16	391	458
	A25	5.45	76.2	2.04	75	1.44	16	260	309
(Mok et al., 1991) [21]	X60	11.25	508	6.6	381	2.62	25.4	540	610.3
	X60	8	508	6.35	900	3.43	25.4	540	610.3
	X60	11.8	508	6.35	900	2.16	25.4	540	610.3
	X60	8.4	508	6.35	1000	3.18	25.4	540	610.3
	X60	11.55	508	6.7	1016	2.66	25.4	540	610.3
(Kim et al., 2008) [22]	X65	27.5	762	17.5	50	8.75	50	495	565
	X65	24.3	762	17.5	100	8.75	50	495	565
	X65	21.8	762	17.5	200	8.75	50	495	565
	X65	19.8	762	17.5	300	8.75	50	495	565
	X65	16.5	762	17.5	600	8.75	50	495	565
	X65	15	762	17.5	900	8.75	50	495	565
	X65	24.11	762	17.5	200	4.2	50	474.1	556.6
	X65	21.76	762	17.5	200	8.9	50	474.1	556.6
	X65	17.15	762	17.5	200	13.1	50	474.1	556.6
	X65	24.3	762	17.5	100	8.4	50	474.1	556.6
	X65	19.8	762	17.5	300	8.5	50	474.1	556.6
	X65	23.42	762	17.5	200	8.4	100	474.1	556.6
	X65	22.64	762	17.5	200	9	200	474.1	556.6
(Chen et al., 1998) [23]	20F	10.8	426	6.95	160	2.7	25	240	390
	20F	9.81	426	7	150	3.8	21	240	390
	20F	7.85	426	7	150	5.2	25	240	390
	20	8.83	529	9	350	4.7	25	285	415
	20	15.7	529	9	160	4.7	25	285	415
	20	14.2	529	9	150	5.3	25	285	415
	X60	10.3	720	8	180	4.3	25	425	535
	X60	8.83	720	8	320	4.4	26	425	535
	X60	7.55	720	8	180	6.2	26	425	535
(Shuai et al., 2017b) [24]	-	15.36	304.8	6.35	26	4.95	20	351	543
	-	16.29	304.8	6.35	33	4.25	21	382	570
	-	14.29	304.8	6.35	37	4.64	30	351	463
	-	16.22	324	6.01	19.35	3.6	19	382	570
	-	23.2	324	10.3	243	5.15	154.5	380	514
	-	22	324	10.3	243	5.15	30.9	380	514
	-	11.25	508	6.6	381	2.62	25.4	443.4	598.9
	-	8	508	6.35	900	3.43	25.4	429.6	572.5
	-	8.4	508	6.35	1000	3.18	25.4	434.8	572.5
	-	11.55	508	6.7	1016	2.66	25.4	430	601
	-	14.4	323.9	9.8	255.6	6.95	95.3	422.5	589.6
	-	13.58	323.9	9.71	350	6.85	95.3	422.5	589.6
	-	12.19	323.9	9.91	433.4	7.08	95.3	422.5	589.6
(Mannucci and Demofonti, 2002) [25]	X100	15.35	1422.4	19.25	180	10.4	0.5	740	774
	X100	20.12	1422.4	20.1	385	3.8	0.5	795	840
	X100	21.4	914.4	16.4	150	9	0.5	739	813
	X100	24.02	914.4	16.4	450	6	0.5	739	813
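
As a worked check on the empirical baseline, the sketch below evaluates a DNV RP-F101-style single-defect capacity equation for the first sample in Table 1. It assumes that H is the defect depth, that lengths are in mm and strengths in MPa, and that no partial safety factors are applied. The function and argument names are illustrative and are not taken from the article.

```python
import math


def dnv_rp_f101_burst_pressure(D, t, L, d, sigma_uts):
    """Estimate the failure pressure (MPa) of a pipe with a single defect.

    Assumed form (no partial safety factors):
        P = 2 t sigma_uts / (D - t) * (1 - d/t) / (1 - d / (t Q))
        Q = sqrt(1 + 0.31 * L^2 / (D t))
    D: outer diameter, t: wall thickness, L: defect length,
    d: defect depth (all mm); sigma_uts: ultimate tensile strength (MPa).
    """
    Q = math.sqrt(1.0 + 0.31 * L ** 2 / (D * t))  # length correction factor
    depth_ratio = d / t
    intact = 2.0 * t * sigma_uts / (D - t)  # capacity of the defect-free pipe
    return intact * (1.0 - depth_ratio) / (1.0 - depth_ratio / Q)


# First row of Table 1 (X80 pipe, measured failure pressure 22.68 MPa).
p_measured = 22.68
p_empirical = dnv_rp_f101_burst_pressure(
    D=458.8, t=8.1, L=39.6, d=5.39, sigma_uts=684.0
)
relative_error = (p_empirical - p_measured) / p_measured * 100.0

print(f"empirical estimate: {p_empirical:.2f} MPa")
print(f"relative error:     {relative_error:+.2f} %")
```

Under these assumptions the first sample evaluates to about 21.98 MPa, which agrees with the DNV RP-F101 entry for sample 1 in Table 2.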

Table 2. Detailed comparison of the prediction results of different models on the validation set.

Number	P_F (MPa)	DNV RP-F101 $\hat{P}_{FE}$ (MPa)	DNV RP-F101 R_e (%)	NN with Equation (8) $\hat{P}_{FA}$ (MPa)	NN with Equation (8) R_e (%)	NN with Equation (9) $\hat{P}_{FA}$ (MPa)	NN with Equation (9) R_e (%)	NN with Equation (10) $\hat{P}_{FA}$ (MPa)	NN with Equation (10) R_e (%)
1	22.68	21.98	−3.09	23.22	+2.37	22.64	−0.17	22.42	−1.15
2	12.84	11.72	−8.72	12.46	−2.97	12.89	+0.41	12.14	−5.42
3	16.22	20.67	+27.44	15.79	−2.65	18.92	+16.64	15.85	−2.28
4	19.80	18.59	−6.11	19.75	−0.24	19.90	+0.50	19.56	−1.22
5	15.80	15.41	−2.47	14.22	−10.01	13.92	−11.89	15.22	−3.69
6	12.19	12.70	+4.18	12.26	+0.54	12.70	+4.14	12.20	+0.05
7	10.80	9.92	−8.15	11.44	+5.93	10.33	−4.35	10.64	−1.51
8	11.25	10.62	−5.60	11.15	−0.88	10.21	−9.24	10.94	−2.75
9	13.40	12.32	−8.06	14.05	+4.83	13.54	+1.03	14.39	+7.37
10	10.30	8.23	−20.10	8.76	−14.95	8.61	−16.45	10.16	−1.33
11	22.64	20.18	−10.87	22.30	−1.51	22.01	−2.79	22.00	−2.82
12	21.40	24.51	+14.53	22.37	+4.53	21.73	+1.55	22.03	+2.96
13	11.25	10.82	−3.82	13.11	+16.54	11.49	+2.12	11.23	−0.22
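
The validation metrics can be recomputed directly from the rounded values in Table 2. The sketch below does this for the "NN with Equation (10)" column using the usual textbook definitions of MAPE, RMSE, MAE, and R². The variable and function names are illustrative, and the definitions are assumed rather than copied from the article's evaluation section.

```python
import math

# Measured failure pressures P_F and the "NN with Equation (10)" predictions
# from Table 2 (MPa), rounded as printed in the table.
p_true = [22.68, 12.84, 16.22, 19.80, 15.80, 12.19, 10.80,
          11.25, 13.40, 10.30, 22.64, 21.40, 11.25]
p_pred = [22.42, 12.14, 15.85, 19.56, 15.22, 12.20, 10.64,
          10.94, 14.39, 10.16, 22.00, 22.03, 11.23]


def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(b - a) / a for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root of the mean squared error."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(b - a) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


mape_value = mape(p_true, p_pred)
rmse_value = rmse(p_true, p_pred)
mae_value = mae(p_true, p_pred)
r2_value = r_squared(p_true, p_pred)

print(f"MAPE = {mape_value:.2f} %")
print(f"RMSE = {rmse_value:.2f} MPa")
print(f"MAE  = {mae_value:.2f} MPa")
print(f"R^2  = {r2_value:.4f}")
```

On these rounded values the sketch gives a MAPE of about 2.52% and an R² of about 0.9886, in line with the reported figures. The mean absolute error comes out near 0.39 MPa, whereas the root of the mean squared error is closer to 0.48 MPa, so the reported 0.39 MPa may reflect a different averaging convention or unrounded predictions.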


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
