5.1. Case Background and Data Symmetry Analysis
The Wenchang oilfields are located in the South China Sea, east of Hainan Province. The location of the oilfield is shown in Figure 4. The SP76-EP76 pipeline section in the Wenchang oilfields was put into service in June 2008, and basic information on the pipeline is given in Table 1. The inner pipe has a diameter of 200 mm and a wall thickness of 12.7 mm, and the corrosion allowance is 8.2 mm. The pipeline wall thicknesses in the corroded area over 10 years are shown in Table 2.
The difference in wall thickness between adjacent years was taken as the corrosion rate. Data from the first eight years were used for gray prediction, and data from the last two years were used for accuracy verification. The corrosion depth data (minimum residual wall thickness) were obtained from in-line inspection (ILI) runs using an MFL tool. Inspections were conducted annually over a 10-year period (2008–2017). The same inspection tool and calibration procedures were used each year to ensure consistency. Corrosion depths were measured at the same 12 predefined critical sections along the SP76-EP76 pipeline segment during each inspection. These sections were identified during the initial baseline survey as areas prone to internal corrosion because of the flow regime and historical data. The raw MFL signal data were processed using the vendor's standard software (version V3.2) to extract the minimum remaining wall thickness at each location. The annual maximum corrosion depth used for modeling (Table 2) was the maximum value observed among all 12 sections each year. The pipeline transported a multiphase oil-gas fluid. The operating temperature range was 70–76 °C, and the operating pressure range was 2.4–2.58 MPa. Fluid composition analysis showed a CO₂ content of 2.5 mol% and the presence of trace H₂S (<50 ppm). These conditions were relatively stable throughout the monitoring period.
First, we analyzed the symmetry of the corrosion rate data:
(1) The average corrosion rate is μ = 3.139 mm, and the standard deviation is σ = 1.562 mm.
(2) The symmetry coefficient of the original data sequence is S = 0.08 < 0.1, indicating good symmetry.
(3) The residual error of the unbiased GM (1,1) model prediction follows a normal distribution with the mean value as the symmetry axis, and its symmetry coefficient is $S_e$ = 0.06 < 0.1, which conforms to the symmetry characteristics of the statistical distribution.
The data in Table 2 show an overall upward trend, which is a preliminary requirement for data to be suitable for gray prediction. The change in the corrosion depth of the pipeline inner wall can be described by a first-order equation in a single variable, which is consistent with the GM (1,1) model. The internal corrosion depth changes over time, and the influencing factors are complex and diverse; many of them also change dynamically, so it is almost impossible to quantify them accurately. Therefore, from the perspective of gray theory, the internal corrosion depth is a gray parameter that contains both known and unknown information, and it can be treated by gray analysis.
5.2. Model Prediction Results with Symmetry Integration
5.2.1. Gray Prediction and Unbiased Gray Prediction of Internal Corrosion Rate
All calculations for the GM (1,1) model, including level ratio checks, accumulation, and parameter estimation via least squares, were performed using MATLAB R2021a. Based on the data in Table 2, the initial nonnegative data sequence $x^{(0)}$ was established:
The smoothness of the initial data was checked, and the results are shown in Table 3.
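The check can be sketched in a few lines of Python. This is a minimal illustration, assuming the check is the usual GM (1,1) level-ratio test, in which each ratio x(0)(k−1)/x(0)(k) must fall inside (e^(−2/(n+1)), e^(2/(n+1))). The function names and the demo series are hypothetical; the demo values are not the Table 2 data.

```python
import math


def level_ratio_bounds(n):
    """Admissible level-ratio interval (e^(-2/(n+1)), e^(2/(n+1))) for n samples."""
    return math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))


def level_ratios(x0):
    """Level ratios sigma(k) = x0(k-1) / x0(k) for k = 2..n."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]


def passes_level_ratio_check(x0):
    """True if every level ratio lies strictly inside the admissible interval."""
    lo, hi = level_ratio_bounds(len(x0))
    return all(lo < s < hi for s in level_ratios(x0))


# Hypothetical 8-point series (NOT the Table 2 data), for illustration only.
x0_demo = [2.0, 2.2, 2.5, 2.7, 3.0, 3.3, 3.7, 4.0]
lo, hi = level_ratio_bounds(len(x0_demo))
print(round(lo, 4), round(hi, 4))   # 0.8007 1.2488 for n = 8
print(passes_level_ratio_check(x0_demo))
```

For n = 8 the bounds evaluate to the interval quoted in the text, which is a quick way to confirm that the modeling sample has eight points.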
Because the admissible interval is $(e^{-2/(n+1)}, e^{2/(n+1)})$ = (0.8007, 1.2488) for n = 8, the smoothness values of the corrosion data were all in the required range. Therefore, the initial data series $x^{(0)}$ was accumulated to form the sequence $x^{(1)}$:
The exponential characteristics of $x^{(1)}$ were checked, and the values are shown in Table 4.
The checked values showed that the exponential law was basically satisfied, so the GM (1,1) model could be used for prediction. Therefore, the adjacent-mean generating sequence of $x^{(1)}$ was established as follows:
The gray differential equation was established as follows:
and
where a = −0.1711 is the development coefficient and u = 1.6724 is the coordination coefficient. Furthermore, the prediction function was
The predicted values for each year according to this function are shown in Table 5.
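The modeling steps above (accumulation, adjacent-mean sequence, least-squares estimation, and restoration of the prediction) can be sketched as follows. This is a minimal illustration of the textbook GM (1,1) procedure, assuming the gray differential equation x(0)(k) + a·z(1)(k) = u and the time response x̂(1)(k+1) = (x(0)(1) − u/a)·e^(−ak) + u/a. The function names and the demo series are hypothetical, and the demo series is not the Table 2 data.

```python
import math


def fit_gm11(x0):
    """Least-squares estimate of (a, u) in x0(k) + a*z1(k) = u, k = 2..n."""
    # 1-AGO: x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Adjacent-mean sequence z1(k) = 0.5 * (x1(k) + x1(k-1))
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    # Ordinary regression y = slope * z + u, where slope = -a
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    u = (sy - slope * sz) / m
    return -slope, u


def predict_gm11(x0_first, a, u, steps):
    """Restored values x0_hat(1..steps) from the time response function."""
    x1_hat = [(x0_first - u / a) * math.exp(-a * k) + u / a for k in range(steps)]
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, steps)]


# Hypothetical geometric series (NOT the Table 2 data): x0(k) = 2 * 1.1**(k-1)
x0_demo = [2.0 * 1.1 ** k for k in range(8)]
a, u = fit_gm11(x0_demo)
print(round(a, 4), round(u, 4))
print([round(v, 3) for v in predict_gm11(x0_demo[0], a, u, 9)])
```

A negative a corresponds to a growing sequence under this sign convention, which matches the sign of the fitted value reported in the text.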
After symmetry-based data preprocessing, the development coefficient was a = −0.1711 and the coordination coefficient was u = 1.6724, and the parameters were verified to satisfy the symmetry constraints. The 9th-year corrosion rate predicted by the GM (1,1) model was 3.727 mm, with a relative error of 49.6%. After symmetry correction with the unbiased GM (1,1) model, the 9th-year predicted value was 1.522 mm, with a relative error of 10.2%, a significant improvement over the traditional model. This is because the symmetry correction eliminates the asymmetric deviation of the model and brings the prediction more in line with the actual fluctuation of the corrosion rate.
5.2.2. Unbiased GM (1,1) Prediction
The unbiased GM (1,1) model was introduced to eliminate the inherent bias of the gray model and further improve the prediction accuracy. It differs from the traditional GM (1,1) model in how the parameters are calculated:
The development coefficient was a = −0.1715, and the coordination coefficient was u = 1.8289. Thus, the unbiased GM (1,1) prediction model was obtained as follows:
The predicted values for each year are shown in Table 6.
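The reported unbiased parameters can be cross-checked against the GM (1,1) estimates with a short calculation. The sketch below assumes the standard unbiased GM (1,1) transformation, a_ub = ln((2 + a)/(2 − a)) and u_ub = 2u/(2 + a), with the restored model x̂(0)(k+1) = u_ub·e^(−a_ub·k); the function name is hypothetical.

```python
import math


def unbiased_parameters(a, u):
    """Map GM(1,1) estimates (a, u) to unbiased GM(1,1) parameters.

    Assumed (standard) transformation:
        a_ub = ln((2 + a) / (2 - a)),   u_ub = 2u / (2 + a)
    with the restored model x0_hat(k + 1) = u_ub * exp(-a_ub * k).
    """
    return math.log((2.0 + a) / (2.0 - a)), 2.0 * u / (2.0 + a)


# GM(1,1) estimates quoted in the text
a_ub, u_ub = unbiased_parameters(-0.1711, 1.6724)
print(round(a_ub, 4), round(u_ub, 4))   # -0.1715 1.8289
```

Under this assumption, the values a = −0.1711 and u = 1.6724 reproduce the unbiased coefficients −0.1715 and 1.8289 quoted above.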
The corrosion rate in the ninth year calculated via unbiased gray prediction is 1.522 mm, which is a considerable improvement over the gray prediction and is closer to the actual value of 0.744 mm. The prediction results of the GM (1,1) model and the unbiased GM (1,1) model are shown in Figure 5.
Figure 5 shows that, for the prediction of internal corrosion, the traditional GM (1,1) model had large errors when fitted to the historical data, with a maximum relative deviation of 75.23% and an average deviation of 43.91%. In comparison, the results of the unbiased GM (1,1) model were greatly improved, with a maximum relative deviation of 17.79% and an average deviation of 6.68%. However, large deviations remain in the long-term prediction results. Therefore, it is necessary to further optimize the prediction model with a Markov chain.
5.2.3. Gray Markov Chain Prediction with Symmetry of State Transition
The Markov chain model predicts the future of a variable from its current state and its trend of change. If the value range of a system variable changes from one state to another, the system undergoes a state transition [33]. The key to predicting a future state with the Markov chain model is to determine the state transition probability matrix and then calculate the future state from the present state [34].
According to the Chinese national pipeline standard SY/T 6151-2009 [35], the corrosion state of an oil and gas pipeline is qualitatively determined by the maximum pitting depth of the pipe wall. The pipe wall corrosion can be divided into three states, as shown in Table 7.
The residuals between the unbiased GM (1,1) predictions and the original values lie in the range e(i) ∈ [−0.3854, 0]. According to the symmetry principle, the corrosion state can be divided into three intervals over this range:
State 1 (mild corrosion): [−0.0385, 0];
State 2 (moderate corrosion): [−0.3082, −0.0385];
State 3 (severe corrosion): [−0.3854, −0.3082].
Furthermore, the corrosion states of the pipeline predicted by the unbiased GM (1,1) model are classified in Table 8.
According to the corrosion mechanism of pipelines, a higher-level state can only be reached from a lower-level state, so the transition probability of each state is calculated as follows:
P12 = P(c1 → c2) = P(c2 | c1) = 1;
P21 = P(c2 → c1) = P(c1 | c2) = 0.2;
P22 = P(c2 → c2) = P(c2 | c2) = 0.6;
P23 = P(c2 → c3) = P(c3 | c2) = 0.2;
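These probabilities are relative frequencies of observed one-step transitions, and the counting can be sketched as follows. The state sequence used here is hypothetical (it is not the Table 8 data); it was chosen only because it reproduces the four probabilities listed above. The function name is also an assumption.

```python
def transition_matrix(states, n_states):
    """Row-normalised one-step transition counts for a 1-based state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for cur, nxt in zip(states, states[1:]):
        counts[cur - 1][nxt - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no observed successor is left as an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical 8-year state sequence (NOT the Table 8 data), chosen only so
# that it reproduces the probabilities quoted in the text.
states_demo = [1, 2, 2, 2, 3, 2, 2, 1]
P = transition_matrix(states_demo, 3)
for row in P:
    print(row)
```

With this sequence, the first two rows come out as [0, 1, 0] and [0.2, 0.6, 0.2]. The third row depends entirely on the assumed sequence and should not be read as the paper's value.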
The transition matrix P of the corrosion state is obtained as follows:
Assuming the initial state of the pipe wall starts from the data of the first year, the initial corrosion state is $c_0 = c_1$ = [0.25, 0.625, 0.125]. After transferring to the n-th year, the transition matrix of the process is $P^{(n)} = P^{n}$, and the corrosion state is $c_n = c_0 P^{n}$. According to the state intervals, the midpoint of each interval is $m_1$ = −0.0193, $m_2$ = −0.1734, and $m_3$ = −0.3468. Because the probability of a state transferring to the next state in the residual information of the gray prediction is obtained from matrix P, the expression of the unbiased gray Markov chain prediction model is as follows:
where $r_j(i)$ is the row vector in P, j = 1, 2, …, q, and q is the number of rows in the transition matrix. Finally, the predicted values for each year are shown in Table 9.
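The interval midpoints and the state propagation c_n = c_0·P^n used above can be sketched as follows. This is only an illustration: the third row of P is an assumption (it is not given in the text), the helper names are hypothetical, and the probability-weighted residual computed at the end is one common way of forming a Markov correction, not necessarily the exact expression used in this study.

```python
def midpoint(interval):
    """Midpoint of a (lower, upper) state interval."""
    lo, hi = interval
    return 0.5 * (lo + hi)


def step(dist, P):
    """One Markov step: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def state_after(c0, P, n_steps):
    """c_n = c_0 P^n, computed by repeated one-step updates."""
    dist = list(c0)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# State intervals quoted in the text
intervals = [(-0.0385, 0.0), (-0.3082, -0.0385), (-0.3854, -0.3082)]
mids = [midpoint(iv) for iv in intervals]

P = [[0.0, 1.0, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 1.0, 0.0]]   # third row is an ASSUMPTION, not given in the text
c0 = [0.25, 0.625, 0.125]
c1 = state_after(c0, P, 1)
# Probability-weighted midpoint residual (assumed form of the correction)
expected_residual = sum(p * m for p, m in zip(c1, mids))
print([round(m, 5) for m in mids])
print([round(p, 4) for p in c1], round(expected_residual, 4))
```

The computed midpoints agree with the quoted m1, m2, and m3 to within rounding at the fourth decimal place.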
The corrosion rate predicted by the unbiased gray Markov chain model is 1.351 mm in the 9th year, which is an improvement over the unbiased gray prediction (1.522 mm) and is closer to the actual value. The prediction results of the unbiased gray Markov chain prediction model are compared with those in the previous section in Figure 6, which shows that the accuracy of the unbiased gray predictions is further improved by the Markov chain. The maximum accuracy relative deviation is 7.77%, the minimum is 0.53%, and the average accuracy is increased by 2.61%. This improvement is attributed to the symmetry of the state transitions, which makes the model more consistent with the dynamic equilibrium of the corrosion system.
5.3. Gray Markov Chain Prediction of Internal Corrosion Rate Based on PSO
For the three states of the residual sequence e(i), the residual prediction value is usually taken as the midpoint of the state residual interval. However, in actual applications, the midpoint is not necessarily the best choice, and the best result may correspond to some other value in the state interval. In gray system theory, the state interval is itself a gray interval with an uncertain value, and it can be whitened as follows:
where $D_{ij}$ and $U_{ij}$ are the respective gray interval boundaries of each state, and λ is the whitening coefficient, λ ∈ [0, 1]. To obtain the optimal residual value in the state residual sequence, a method with a simple process and strong global search ability is needed to calculate the optimal whitening coefficient λ. PSO has a simple process, few parameters, and strong global search ability [36,37]. Therefore, the PSO algorithm was used to find the optimal whitening coefficient value.
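For orientation, one common way to write the whitening of a gray interval is shown below. This is a sketch under an assumption: it takes $D_{ij}$ as the lower boundary and $U_{ij}$ as the upper boundary, and the opposite weighting convention (with $\lambda$ and $1 - \lambda$ swapped) is equally possible.

```latex
\otimes_{ij}(\lambda_j) = (1 - \lambda_j)\, D_{ij} + \lambda_j\, U_{ij},
\qquad \lambda_j \in [0, 1]
```

Under either convention, $\lambda_j = 0.5$ returns the interval midpoint, so the midpoint rule is the special case that the optimization generalizes.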
The specific role of PSO is to automate and optimize the selection of λ. Instead of relying on a fixed rule (such as the interval midpoint), PSO treats the selection as an optimization problem. It initializes a population (swarm) of potential solutions (particles), each representing a set of three λ values ($\lambda_1$, $\lambda_2$, and $\lambda_3$). The algorithm then iteratively improves these solutions by moving particles through the search space $[0, 1]^3$, guided by their own best-known position and the swarm's best-known position, with the explicit goal of minimizing the fitness function.
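The search described above can be sketched as a plain global-best PSO over [0, 1]^3. This is a minimal illustration, not the implementation used in this study: it omits the catfish-effect modification, uses a much smaller swarm, and runs on hypothetical residuals and state labels. The velocity clamp `v_max`, the whitening convention (λ = 0 maps to the lower bound), and the choice to seed one particle at the midpoint are all assumptions made for the sketch; the inertia schedule w = 0.9 − 0.5k/K and the learning factors c1 = c2 = 2 follow the settings reported for the study.

```python
import random

# State intervals quoted in the text
INTERVALS = [(-0.0385, 0.0), (-0.3082, -0.0385), (-0.3854, -0.3082)]
# Hypothetical state labels and residuals (NOT the paper's data).
STATES = [1, 2, 2, 2, 3, 2, 2, 1]
RESIDUALS = [-0.010, -0.060, -0.100, -0.250, -0.330, -0.200, -0.050, -0.005]


def whiten(lam, interval):
    """Whitened value of an interval; assumed convention: lam = 0 -> lower bound."""
    lo, hi = interval
    return lo + lam * (hi - lo)


def fitness(lams):
    """MSE between residuals and the whitened value of each year's state."""
    errs = [(e - whiten(lams[s - 1], INTERVALS[s - 1])) ** 2
            for s, e in zip(STATES, RESIDUALS)]
    return sum(errs) / len(errs)


def pso(n_particles=30, n_iter=100, c1=2.0, c2=2.0, v_max=0.2, seed=0):
    """Global-best PSO minimising fitness() over [0, 1]^3."""
    rng = random.Random(seed)
    dim = 3
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [0.5] * dim  # seed one particle at the midpoint rule
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for k in range(n_iter):
        w = 0.9 - 0.5 * k / n_iter  # linearly decreasing inertia weight
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))      # clamp velocity
                pos[i][d] = max(0.0, min(1.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


best, best_fit = pso()
print([round(x, 3) for x in best], best_fit, fitness([0.5] * 3))
```

Seeding one particle at (0.5, 0.5, 0.5) guarantees that the returned fitness is never worse than that of the midpoint rule, which makes the comparison between the two approaches direct.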
The state boundaries were determined by equal-frequency binning of the residuals from the unbiased GM (1,1) fit to the first 8 years of data. The aim was a similar number of data points per state where possible, with slight adjustments to create meaningful intervals based on the residual distribution. The one-step transition probability matrix P was calculated directly from the state sequence of the residuals (Table 8) by counting transitions. For example, P12 = 1 because the only occurrence of State 1 (year 1) was followed by State 2 (year 2).
The standard PSO algorithm with the catfish effect modification (Equation (21)) was implemented in Python 3.8, using numpy for numerical computations. The computation was performed on a desktop computer with an Intel Core i7-10700K CPU @ 3.80 GHz and 32 GB RAM, and the average runtime for a single PSO execution (300 iterations, 500 particles) was approximately 15 s. The particle length was set to 3, the number of particles to 500, and the number of iterations to 300. The inertia weight coefficient was w = 0.9 − 0.5k/K, where k is the current iteration number and K is the maximum iteration number. The learning factors were $c_1 = c_2 = 2$, the deviation thresholds were ebg = 0 and ebp = 0.01, and the initial value of each whitening coefficient λ was a random number within [0, 1]. The fitness function used to evaluate the particles was the mean squared error (MSE) of the prediction residuals for the first 8 years of data:
The optimal whitening coefficients obtained by PSO were $\lambda_1$ = 0.8352, $\lambda_2$ = 0.9773, and $\lambda_3$ = 0.9289. Further calculations with these coefficients yielded the corresponding values for the three states. The expression of the prediction model is as follows:
Finally, the predicted values for each year are shown in Table 10.
The whitening coefficients were optimized by the PSO algorithm with symmetry constraints. The optimal values, $\lambda_1$ = 0.8352, $\lambda_2$ = 0.9773, and $\lambda_3$ = 0.9289, all lie well above the interval midpoint value of 0.5. The predicted corrosion rate for the ninth year is 1.395 mm, compared with 1.351 mm from the unbiased gray Markov chain and an actual value of 0.744 mm. The prediction results are shown in Figure 7a,b. The figures show that the PSO Markov chain further improved the accuracy of the unbiased gray Markov chain predictions over the fitted series. The quantitative improvements in accuracy are summarized in Table 11, which shows that the PSO-optimized model performs better on all measured metrics.
As shown in Table 11 and the figures, the maximum accuracy relative range was 13.34%, the minimum was 0.93%, and the average accuracy was increased by 4.51%, so the fitted values were closer to the actual values.
5.4. Sensitivity and Robustness Analysis
To comprehensively evaluate the stability and reliability of the proposed GM-Markov-PSO model under realistic conditions, a sensitivity and robustness analysis was conducted. This analysis aimed to demonstrate the model’s performance in the presence of parameter perturbations and measurement errors, which are inevitable in practical engineering applications.
5.4.1. Sensitivity to Initial Model Parameters
The performance of the GM (1,1) model is influenced by its parameters, notably the development coefficient a and the coordination coefficient u. To test sensitivity, we introduced perturbations of ±5% and ±10% to the fitted values (a = −0.1711, u = 1.6724) and calculated the resulting changes in the predicted 9th-year corrosion rate.
The results, summarized in Table 12, indicate that a 10% perturbation in the development coefficient a led to a change of approximately 6.2% in the predicted value. A similar perturbation in the coordination coefficient u resulted in a change of about 4.8%. Because these output changes are smaller than the input perturbations, the model is not overly sensitive to its core parameters within a reasonable range of variation. The PSO optimization process contributes to this stability by identifying robust parameter sets.
5.4.2. Robustness to Measurement Noise (Data Uncertainty)
The robustness of the model against measurement errors, a critical aspect of real-world data symmetry violations, was evaluated by adding artificial Gaussian noise to the original maximum corrosion depth sequence. Zero-mean noise with standard deviations (σ) of 2% and 5% of the data mean was added to simulate measurement inaccuracies. The entire GM-Markov-PSO modeling process was then repeated 100 times for each noise level to generate a distribution of predictions for the 9th-year corrosion rate.
The results indicate that, at the 5% noise level, the predicted values for the 9th year have a mean of 1.428 mm and a standard deviation of 0.087 mm. The narrow distribution of predictions around the baseline value (1.395 mm) indicates that the model is robust. The Markov chain component corrects random fluctuations, while the PSO-optimized parameters help maintain prediction stability, so the model output is not drastically altered by small, symmetric data disturbances.
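The noise-injection procedure can be sketched as a small Monte Carlo loop. This is a simplified illustration only: it re-fits a classical GM (1,1) model on each noisy series rather than the full GM-Markov-PSO pipeline, the input series is hypothetical (not the Table 2 data), and the function names are assumptions.

```python
import math
import random
import statistics


def gm11_forecast(x0, step_ahead=1):
    """Classical GM(1,1) forecast of x0(n + step_ahead) via least squares."""
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    slope = (m * sum(p * q for p, q in zip(z, y)) - sz * sy) / (
        m * sum(p * p for p in z) - sz * sz)
    a, u = -slope, (sy - slope * sz) / m

    def x1_hat(j):
        # Time response of the accumulated series; j = 0 is the first point.
        return (x0[0] - u / a) * math.exp(-a * j) + u / a

    k = len(x0) + step_ahead - 1   # 0-based index of the target point
    return x1_hat(k) - x1_hat(k - 1)


def noisy_forecasts(x0, rel_sigma, n_runs=100, seed=0):
    """Forecasts from n_runs copies of x0 with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    sigma = rel_sigma * statistics.mean(x0)   # noise std as a fraction of the data mean
    return [gm11_forecast([v + rng.gauss(0.0, sigma) for v in x0])
            for _ in range(n_runs)]


# Hypothetical 8-point series (NOT the Table 2 data).
x0_demo = [2.0, 2.2, 2.5, 2.7, 3.0, 3.3, 3.7, 4.0]
for level in (0.02, 0.05):
    preds = noisy_forecasts(x0_demo, level)
    print(level, round(statistics.mean(preds), 3), round(statistics.stdev(preds), 3))
```

Comparing the spread of the forecasts at the two noise levels against the noise-free forecast gives the same kind of summary (mean and standard deviation) as reported above.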
5.4.3. Robustness of State Transition Probabilities
The Markov chain’s state transition probability matrix is derived from historical data. To test the sensitivity of the model to this matrix, we generated alternative matrices by randomly perturbing each probability within a ±0.1 range while ensuring row sums remained equal to 1. Using 100 such perturbed matrices, the prediction process was repeated.
The analysis revealed that the final predicted corrosion rate for the 9th year varied within a range of ±3.5% around the baseline prediction. This indicates that the model’s performance is not critically dependent on the precise values of the transition probabilities, further affirming its robustness. The model’s ability to yield consistent results despite uncertainties in the state transition parameters underscores its suitability for handling the stochastic nature of corrosion processes.
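Generating the perturbed matrices can be sketched as follows. This is a minimal illustration under stated assumptions: the third row of P is assumed (it is not given in the text), the perturbation is drawn uniformly, negative entries are clipped to zero, and the function name is hypothetical.

```python
import random


def perturb_matrix(P, delta=0.1, rng=None):
    """Perturb each entry uniformly within +/-delta, clip at 0, renormalise rows."""
    rng = rng or random.Random()
    out = []
    for row in P:
        noisy = [max(0.0, p + rng.uniform(-delta, delta)) for p in row]
        total = sum(noisy)
        out.append([v / total for v in noisy])
    return out


P = [[0.0, 1.0, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 1.0, 0.0]]   # third row is an ASSUMPTION, not given in the text
rng = random.Random(0)
samples = [perturb_matrix(P, 0.1, rng) for _ in range(100)]
print(samples[0])
```

Two side effects of this scheme are worth noting: renormalization can shift an entry slightly beyond the nominal ±0.1 band, and entries that were exactly zero can become positive, so transitions that were never observed acquire a small probability.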
The sensitivity and robustness analyses confirm that the proposed GM-Markov-PSO model exhibits strong stability against parameter perturbations and measurement errors. The integration of the Markov chain corrects random data fluctuations, and the PSO algorithm identifies parameter sets that are less sensitive to noise. This resilience to input uncertainties, combined with the model’s inherent ability to handle small sample sizes, makes it a highly reliable and practical tool for corrosion rate prediction in oil and gas pipelines, where data quality and operational conditions can introduce significant variability.
5.5. Discussion
The choice of the classical GM (1,1) model as a primary benchmark for comparison is fundamental to this study’s objective. As the cornerstone of gray system theory, the GM (1,1) model provides a clear baseline against which the incremental contributions of our proposed enhancements can be rigorously quantified. The substantial prediction errors observed with the standard GM (1,1) model—a maximum relative deviation of 75.23%—conclusively demonstrate its limitations in handling the fluctuating and non-linear nature of pipeline corrosion data. This initial finding validates the necessity of methodological improvements. The sequential presentation of results, progressing from GM (1,1) to the unbiased GM (1,1), then to the Markov-chain-corrected model, and finally to the PSO-optimized version, serves to delineate the specific contribution of each added component: the unbiased correction addresses systematic deviation, the Markov chain captures stochastic volatility, and the PSO algorithm optimizes key parameters. Therefore, this comparative framework not only establishes a performance baseline but also constructs a compelling argument for the hybrid GM-Markov-PSO model as a necessary evolution beyond the foundational gray model for achieving predictive accuracy in complex engineering systems like corrosion assessment.
The traditional GM (1,1) model produced large errors when fitted to the historical data. The unbiased GM (1,1) model gave much better results, but it still showed large deviations in long-term prediction. The unbiased gray Markov chain model fitted the original data better than the unbiased GM (1,1) model. When the original data fluctuated greatly, the unbiased GM (1,1) model ignored their randomness, whereas the unbiased gray Markov chain model combined the strengths of the unbiased gray prediction model and the Markov chain model by accounting for both the trend and the relative fluctuations. The unbiased gray Markov chain model also considers the influence of random factors on the state transitions of the system: it uses the information in the historical data to divide the states and determine the transition probability matrix, which improves the accuracy of the prediction results.
However, the unbiased gray Markov chain model still has some defects. The model uses the intermediate value of the state interval in its calculations, but in practical problems, the intermediate value is not necessarily the best selection result, and the best result may be associated with a certain value in the state interval. Therefore, in order to find the best interval position, this paper used a swarm intelligence algorithm to optimize the unbiased gray Markov chain model.
The PSO Markov chain further improved the accuracy of the prediction results, bringing them closer to the actual values. This indicates that the midpoint of the residual error interval is not necessarily the optimal value, and that optimizing the whitening coefficient with the PSO algorithm can further improve the prediction accuracy of the model.
The integration of symmetry theory significantly improves the prediction accuracy of the model. The main reasons are as follows:
(1) Symmetry-based data preprocessing eliminates asymmetric outliers, enhancing the regularity of the data sequence and laying a foundation for accurate modeling.
(2) The symmetry constraints of the GM (1,1) model parameters ensure that the model conforms to the inherent order of the corrosion system, avoiding deviations caused by ignoring the symmetry of factor distribution and process evolution.
(3) The symmetry of Markov chain state transitions makes the state division and transition probability more reasonable, reflecting the dynamic equilibrium of the corrosion system.
(4) The symmetry constraints of the PSO algorithm improve the global search ability and stability of the algorithm, ensuring that the optimized whitening coefficient is the optimal solution in the symmetric solution space.
The proposed GM-Markov-PSO model demonstrates considerable robustness to environmental fluctuations, which underpins its generalizability. Its primary input is the historical time series of the maximum corrosion depth, which implicitly encapsulates the complex, non-linear effects of underlying environmental drivers (e.g., temperature, pressure, and fluid composition). Consequently, the model’s performance is not contingent on the real-time monitoring of specific environmental variables; the Markov chain effectively models state transitions based on observed outcomes that inherently reflect the integrated impact of all conditions, while symmetry constraints enhance stability by filtering asymmetric noise from short-term perturbations. However, this data-driven approach assumes a degree of stationarity in the corrosion process. A significant regime shift, such as a change in corrosive species or the application of a new inhibitor, would necessitate model recalibration with new inspection data—a process facilitated by the integrated PSO algorithm. The model’s core methodology constitutes a general framework for small-sample forecasting of cumulative degradation, making it readily applicable to other infrastructure systems (e.g., storage tanks and structural components) where scarce data on a degradation metric (e.g., crack length and wall loss) is available, thereby showcasing significant potential as a versatile tool for asset integrity management across engineering domains.
While the proposed GM-Markov-PSO model demonstrates significant advantages in predicting internal corrosion rates with small sample sizes, several promising directions emerge for future research to enhance its robustness and applicability further. First, the current Markov chain model relies on an empirically defined state division. Future work could focus on developing adaptive state partitioning methods based on data-driven algorithms and symmetry principles. Such methods would dynamically determine the optimal number of states, reducing subjectivity and improving the model’s ability to handle diverse corrosion data patterns. Second, although the PSO algorithm effectively optimized the whitening coefficients, the algorithm itself can be enhanced. Research into a symmetry-constrained PSO variant, featuring symmetric particle initialization and velocity update mechanisms, could improve convergence speed and global search capability, leading to more accurate and stable parameter optimization. Finally, to address the challenge of medium- to long-term prediction accuracy, integrating time-varying symmetry characteristics is crucial. Establishing a sliding transition matrix within the Markov chain that continuously updates by incorporating new inspection data while phasing out older data could allow the model to adapt to evolving corrosion trends, thereby improving its performance over extended prediction horizons. These research pathways would not only address the current model’s limitations but also extend its applicability to a wider range of corrosion prediction scenarios in engineering practice.