3.2.1. Multi-Parameter Fusion Location Model of Gas Extraction Pipeline Leakage
- (1) Discussion on the positioning accuracy of a single pressure data input
The localization accuracy of each model on the pressure training set is shown in Table 6. Under a single pressure input, prediction performance differed among the models. The LSSVM achieved an average accuracy of 0.799, indicating that it can capture nonlinear relationships between pressure variation and leakage location. The ENN obtained a higher accuracy of 0.820, suggesting that its recurrent structure is better suited to describing temporal pressure response characteristics. The DBN was the least stable, with an average accuracy of 0.760, possibly because of the limited sample size and insufficient pressure-only features. Among the Stacking models, the S-L-E, S-L-D, and S-E-D achieved average accuracies of 0.819, 0.819, and 0.827, respectively. All three outperformed the LSSVM and DBN, while only the S-E-D exceeded the ENN.
Under pressure input, the S-E-D model achieved the best performance, with an average accuracy of 0.827, mainly because the combination of the ENN and DBN enhanced the extraction of temporal and nonlinear pressure response features. However, the S-L-E-D model achieved only 0.528, suggesting that more base learners do not necessarily lead to better performance. Overall, pressure data contain useful leakage location information, but a single pressure input remains insufficient for robust localization, further highlighting the necessity of pressure–flow collaborative input.
As shown in Table 6, under a single pressure input, the training time of the single models was relatively short, with the LSSVM, ENN, and DBN requiring 3.82 s, 8.65 s, and 15.43 s, respectively. The Stacking models required a longer training time because multiple base learners and a meta-learner were involved. Among them, the S-E-D achieved the highest average accuracy of 0.827, with a training time of 27.64 s, indicating that the improvement in positioning accuracy was accompanied by an increase in offline computational cost.
The evaluation indicators of each model after preliminary selection with a single pressure data input are shown in Figure 6. The fold-wise results, RMSE, and MAPE were used together to evaluate model reliability, absolute error, and relative error, giving a more comprehensive assessment of model accuracy, stability, and generalization performance. The average MAPE and RMSE varied only slightly across the positioning methods, with the RMSE ranging from 0.111 to 0.116 and the MAPE from 0.181 to 0.203. Under a single pressure data input, the S-L-E model achieved the best stability, with the CV values of the RMSE and MAPE being 3.83% and 6.02%, respectively. The S-E-D model showed slightly larger fluctuations, with an RMSE-CV and MAPE-CV of 6.77% and 12.46%, respectively. In addition, the S-L-E exhibited the best comprehensive accuracy, with the lowest average RMSE (0.1110) and average MAPE (0.1810), making it the overall optimal method in terms of these error indicators.
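The fold-wise indicators discussed above can be illustrated with a minimal sketch. It assumes that the RMSE and MAPE are computed per fold on normalized leak positions and that the CV is the sample standard deviation divided by the mean. The function names and the fold values are hypothetical and are not taken from Table 6 or Figure 6.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted leak positions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (true values must be nonzero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent (assumed CV definition)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical per-fold RMSE values, for illustration only.
fold_rmse = [0.108, 0.112, 0.115, 0.110, 0.113]
print("mean RMSE:", statistics.mean(fold_rmse))
print("RMSE-CV (%):", coefficient_of_variation(fold_rmse))
```

A low CV across folds indicates that the error level is stable with respect to how the data are split, which is how stability is compared between the models here.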
- (2) Discussion on the positioning accuracy of a single flow data input
The localization accuracy of each model on the flow training set is shown in Table 7. The localization models showed different accuracy levels on the flow training set. Among the single models, the ENN achieved the highest accuracy of 0.820, outperforming the LSSVM (0.798) and DBN (0.763), indicating that its structure was more suitable for capturing the temporal characteristics of flow data. Among all models, the S-L-E-D performed best, with fold accuracies ranging from 0.813 to 0.843 and an average accuracy of 0.830. This improvement was mainly attributed to the complementary learning and feature fusion ability of the Stacking ensemble model.
The S-L-E-D model integrated the outputs of three base models (LSSVM, ENN, and DBN), each with an inherently distinct modeling logic. When leakage occurred, the pipeline flow rate followed a temporal process of stability, sudden change, attenuation, and a new stable state. Under a single flow data input, the S-L-E-D model exhibited the smallest fluctuation among all models, with RMSE-CV and MAPE-CV values of 5.94% and 8.31%, respectively, indicating strong robustness and generalization ability under different flow data distributions. By contrast, a fourth-fold accuracy as low as 0.676 appeared among the compared models, which directly reflects insufficient generalization ability caused by overfitting. Within the ensemble, the local fitting advantage of the LSSVM compensated for the DBN's neglect of small-sample details, and the temporal capture capability of the ENN remedied the inadequacy of the LSSVM in modeling dynamic processes. This conclusion is consistent with the finding in reference [48] that the integration of multiple models can reduce the positioning error through feature complementarity.
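The Stacking principle described above, in which a meta-learner is trained on out-of-fold predictions of several base learners, can be sketched as follows. This is an illustration only: the base learners are simple k-nearest-neighbour regressors standing in for the LSSVM, ENN, and DBN, the meta-learner is also a stand-in because it is not specified in this section, and the data are synthetic.

```python
def knn_regressor(k):
    """Build a fit function for a k-nearest-neighbour regressor (stand-in learner)."""
    def fit(X, y):
        def predict(x):
            order = sorted(range(len(X)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
            nearest = order[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit


def fit_stacking(X, y, base_fits, meta_fit, n_folds=5):
    """Train base learners out-of-fold, then a meta-learner on their predictions."""
    n = len(X)
    # Interleaved folds: sample i goes to fold i % n_folds (no shuffling, for determinism).
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    meta_X = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fits]
        for i in held_out:
            # Each held-out sample is predicted by models that never saw it.
            meta_X[i] = [m(X[i]) for m in models]
    meta_model = meta_fit(meta_X, y)
    # Refit the base learners on all data for use at prediction time.
    final_bases = [fit(X, y) for fit in base_fits]
    return lambda x: meta_model([m(x) for m in final_bases])


# Synthetic data, for illustration only: two features and a normalized leak position.
X = [[i / 40, (i / 40) ** 2] for i in range(40)]
y = [i / 40 for i in range(40)]
predict = fit_stacking(X, y, [knn_regressor(1), knn_regressor(3), knn_regressor(5)],
                       meta_fit=knn_regressor(3))
print(predict([0.5, 0.25]))
```

Training the meta-learner only on out-of-fold predictions keeps it from simply trusting whichever base learner memorized the training data, which is why an ensemble of this kind can offset the weaknesses of its individual members.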
As shown in Table 7, the training time of each model under a single flow input was close to that under a single pressure input. The training time of the LSSVM was only 3.64 s, while that of the optimal S-L-E-D model increased to 31.57 s. Although the S-L-E-D had the longest training time, it achieved the highest average positioning accuracy of 0.830. This indicates that the Stacking ensemble structure improved the localization performance at the cost of higher offline computational complexity, and the additional training cost was acceptable for model construction.
The evaluation indicators of each model after preliminary selection with a single flow data input are shown in Figure 7. The overall pattern showed that the positioning accuracy of the Stacking ensemble model outperformed the single models, and the combinations containing the ENN among the fusion models performed more prominently. The S-L-E-D model had an RMSE stably below 0.11 and MAPE close to 0.15 across all five folds, which indicated that it could maintain high accuracy and strong generalization ability under different scenarios of flow rate data distribution.
- (3) Discussion on the positioning accuracy of pressure–flow collaborative data input
The localization accuracy of each localization model on the pressure–flow collaborative training set is shown in Table 8. Compared with a single pressure or flow input, the accuracy of all models was significantly improved under collaborative input. The advantage of the S-L-E-D model was further amplified: under collaborative input it reached an accuracy of 0.932, more than 10 percentage points above its accuracy of 0.830 under a single flow input. Furthermore, under a single dataset input, the accuracy improvement of the ensemble models over the single models was approximately 3% to 5%, while this gap widened to 5% to 8% under collaborative input. The pressure–flow dataset provided richer features, enabling the defect complementarity and feature fusion mechanisms of the ensemble models to take fuller effect. Under pressure–flow collaborative input, the S-L-E-D model achieved the best stability, with an RMSE-CV of 2.41% and MAPE-CV of 4.26%. These results further demonstrated that single pressure or flow data may be affected by sensor errors and environmental interference, whereas pressure–flow collaborative input provides information from two different monitoring parameters. The collaborative input enabled the models to acquire two-dimensional features covering spatial and temporal attributes, avoiding the limitation that single-source data could only capture single-dimensional information.
As shown in Table 8, the training time increased slightly under pressure–flow collaborative input due to the higher feature dimension. The S-L-E-D model required the longest training time of 36.59 s but achieved the highest average accuracy of 0.932. Although the Stacking models had higher offline training costs, the optimal S-L-E-D model had an average inference time of only 0.021 s (21 ms) per sample, well within the 150 ms sampling interval, indicating its potential for real-time leakage localization.
The evaluation indicators of each model after preliminary selection with pressure–flow collaborative data input are shown in Figure 8. The optimal model was the S-L-E-D model, with an RMSE below 0.09 and MAPE below 0.10 across all folds, and it exhibited the smallest fluctuation across folds. Compared with a single pressure or flow rate input, the RMSE and MAPE of all models decreased significantly under collaborative input, and the fluctuation of errors across folds was generally reduced. Therefore, collaborative data input significantly reduced the absolute deviation of the models in predicting the leakage position.
To further evaluate whether the performance improvement in the S-L-E-D model was statistically significant, the uncertainty of the five-fold cross-validation results was analyzed. The mean accuracy, standard deviation, and 95% confidence interval were calculated for each model. In addition, paired t-tests were conducted between the S-L-E-D model and other models under pressure–flow collaborative input, and Cohen’s d was used to evaluate the effect size. A significance level of p < 0.05 was used. Statistical comparisons of the models under pressure–flow collaborative input are shown in Table 9.
As shown in Table 9, the S-L-E-D model achieved the highest mean accuracy of 0.932, with a 95% confidence interval of 0.926–0.938. Compared with the other models under pressure–flow collaborative input, the mean accuracy improvement in the S-L-E-D ranged from 0.037 to 0.058. The paired t-test results showed that these differences were statistically significant at the p < 0.05 level. In addition, the effect sizes were relatively large, indicating that the improvement in the S-L-E-D model was not only reflected in the mean value but also supported by fold-wise statistical comparison.
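The fold-wise comparison described above can be sketched as follows. The sketch assumes five paired fold accuracies per model, a two-sided test with 4 degrees of freedom, and the paired-samples form of Cohen's d (mean difference divided by the standard deviation of the differences), which is one common variant and may differ from the one used for Table 9. The fold accuracies are hypothetical.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (five folds).
T_CRIT_DF4 = 2.776


def paired_t_and_cohens_d(acc_a, acc_b):
    """Return the paired t statistic and paired Cohen's d for two fold-accuracy lists."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # must be nonzero for the test to be defined
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    cohens_d = mean_diff / sd_diff
    return t_stat, cohens_d


# Hypothetical fold accuracies for an ensemble and a single model, for illustration only.
ensemble_acc = [0.93, 0.94, 0.92, 0.93, 0.94]
single_acc = [0.88, 0.90, 0.87, 0.89, 0.88]

t_stat, d = paired_t_and_cohens_d(ensemble_acc, single_acc)
print("t =", t_stat, "significant at p < 0.05:", abs(t_stat) > T_CRIT_DF4)
print("Cohen's d =", d)
```

Pairing by fold removes the variation that both models share because they are evaluated on the same splits, so the test responds to consistent fold-by-fold differences instead of differences in the means alone.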
In summary, relying on a single flow or pressure signal for leakage localization is easily affected by normal pipeline fluctuations, leading to reduced positioning accuracy. In contrast, pressure–flow collaborative analysis can capture the spatiotemporal relationship between upstream flow variation and pressure attenuation near the leakage point. The mutual verification of these two parameters helps reduce interference and misjudgment from single-signal monitoring, thereby improving the accuracy and reliability of leakage localization.
- (4) Determine the optimal localization model
After preliminary model selection, the methods with average five-fold cross-validation accuracy above 0.80, RMSE ≤ 0.12, and MAPE ≤ 0.20 were retained. The TIC values of different leakage points are shown in Table 10. Among them, L2 showed the highest TIC values under most input and model combinations. For example, under pressure input, the TIC values of the S-L-D and S-E-D reached 0.201 and 0.218, respectively, while under pressure–flow collaborative input, the TIC of the DBN reached 0.248. This may be related to the location of L2 in the middle of the pipeline, where turbulent flow, vortex flow, and unstable gas transmission increased pressure–flow signal fluctuations, making stable feature mapping more difficult. In contrast, L3 showed the lowest TIC values across different scenarios. Under pressure–flow collaborative input, the TIC values of the S-L-E-D and LSSVM were only 0.015 and 0.020, respectively. This is mainly because L3 was located in the downstream section with a relatively regular pipeline structure and stable signal transmission, allowing the models to extract leakage features more consistently.
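For readers unfamiliar with the indicator, the sketch below assumes that TIC denotes the Theil inequality coefficient in its common form, the RMSE divided by the sum of the root-mean-square values of the true and predicted series. This definition is an assumption, since the formula is not restated in this section, and the function name is hypothetical.

```python
import math


def theil_inequality_coefficient(y_true, y_pred):
    """Theil inequality coefficient (assumed form): 0 is a perfect fit, 1 is the worst case."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    rms_true = math.sqrt(sum(t * t for t in y_true) / n)
    rms_pred = math.sqrt(sum(p * p for p in y_pred) / n)
    return rmse / (rms_true + rms_pred)


# Hypothetical normalized leak positions, for illustration only.
true_pos = [0.20, 0.40, 0.60, 0.80]
pred_pos = [0.22, 0.37, 0.63, 0.78]
print(theil_inequality_coefficient(true_pos, pred_pos))
```

Because the coefficient is normalized by the magnitude of both series, it allows leakage points with different position scales to be compared on a common 0 to 1 range.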
The evaluation indicators of each optimized localization model are shown in Figure 9. L1 showed the most significant improvement under collaborative input, with the MAPE reduced by 38.7%. L2 exhibited the highest RMSE and MAPE among the five leakage points. Under a single pressure input, the RMSE of the S-L-D at L2 was 2.8 times that at L3, and under collaborative input, the RMSE of the S-L-E-D at L2 was still 2.1 times that at L3. This indicates that L2 remained more difficult to locate due to its complex flow characteristics. Nevertheless, collaborative input significantly reduced the overall errors, with the RMSE and MAPE decreasing by 39.0% and 37.4%, respectively, compared with a single pressure input. Even the DBN, the worst model under collaborative input, had an average RMSE across all leakage points of 0.082, which was still lower than the average RMSE of 0.091 of the S-E-D. Overall, the S-L-E-D is preferred under collaborative input, especially for high-accuracy scenarios.
The MAE and distance error of each leakage point are presented in Table 11. For the S-L-E-D positioning method, the MAE of each fold was lower than 0.1, and the distance error was less than 1 m. This verified that the S-L-E-D model was the most applicable localization model in this paper.
In practical underground gas extraction systems, monitoring points are usually spaced hundreds of meters apart. Therefore, the distance error of 0.814 m achieved by the S-L-E-D model indicates its potential for narrowing the leakage area and improving maintenance efficiency. Since pressure and flow sensors are commonly used, the pressure–flow collaborative framework also shows potential for online leakage warning. However, this result was obtained from a laboratory-scale pipeline, and practical deployment still requires further consideration of sensor calibration, data transmission delay, computational cost, long-term sensor stability, and underground disturbances such as gas fluctuation, dust, moisture, vibration, temperature variation, pipeline attenuation, and sensor drift. In addition, its transferability to pipelines with different diameters, layouts, lengths, roughness, and network structures needs further verification. Although the S-L-E-D performed best under the current validation framework, future work should use independent test sets, nested cross-validation, cost–benefit analysis, and field-scale validation before practical application.
3.2.2. Analysis of the Number of Monitoring Points and Leakage Localization Performance
To explore the relationship between the number of input layer nodes and the fitting effect of the ensemble localization model, the leakage positioning accuracy under three forms of model data input was investigated in this section. To clarify the monitoring point failure simulation protocol, sensor failure in this study was defined as complete loss of the pressure and flow signals at the corresponding monitoring point. The pressure and flow features of this point were removed from the input feature vector. The leakage localization model was then evaluated using the remaining available monitoring point data. Single-point fault scenarios and two-point fault scenarios were considered to analyze the influence of missing monitoring information on localization performance. This fault setting mainly represents complete sensor outage or communication interruption. The discussion was divided into three groups: (1) a fault occurred at any one point; (2) faults occurred at any two points; and (3) no sensor faults occurred.
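The fault protocol above can be sketched as follows. The sketch assumes five monitoring points, M1 to M5, each contributing one pressure and one flow feature. The data layout and function names are hypothetical, and the sketch does not address whether the model is retrained for each reduced feature vector, which is not stated here.

```python
from itertools import combinations

# Assumed monitoring points; each contributes a (pressure, flow) feature pair.
MONITORING_POINTS = ["M1", "M2", "M3", "M4", "M5"]


def drop_failed_points(sample, failed):
    """Return the flat feature vector with the features of failed points removed."""
    features = []
    for point in MONITORING_POINTS:
        if point not in failed:
            features.extend(sample[point])
    return features


def fault_scenarios():
    """Enumerate the no-fault case, all single-point faults, and all two-point faults."""
    scenarios = [()]
    for n_failed in (1, 2):
        scenarios.extend(combinations(MONITORING_POINTS, n_failed))
    return scenarios


# Hypothetical sample: pressure and flow readings per monitoring point.
sample = {point: (1.0, 2.0) for point in MONITORING_POINTS}
for failed in fault_scenarios():
    print(failed, len(drop_failed_points(sample, failed)))
```

With five monitoring points this enumeration yields one no-fault case, five single-point faults, and ten two-point faults.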
The localization accuracy of the validation set under different monitoring point fault conditions is shown in Figure 10. Single faults at M1, M4, and M5 had relatively small effects, with localization accuracies of 0.884, 0.891, and 0.881, respectively, whereas faults at M2 and M3 caused greater accuracy loss. Under dual-fault conditions, the M1 and M4 combination achieved the highest accuracy of 0.861, followed by M1 and M5, M2 and M4, and M2 and M5, with accuracies of 0.858, 0.857, and 0.855, respectively. These results indicate that faults at key monitoring points on the main pipeline have a stronger influence on leakage localization.
To further evaluate the statistical significance and robustness of the sensor fault analysis, the localization results under different monitoring point fault conditions were statistically analyzed. For each fault scenario, the mean positioning accuracy, standard deviation, and 95% confidence interval were calculated based on repeated validation results. The 95% confidence interval was used to describe the uncertainty range of the positioning accuracy, while the Coefficient of Variation was introduced to evaluate the stability of the model under sensor fault conditions.
The statistical robustness results are shown in Table 12. Under normal monitoring conditions, the model achieved a mean accuracy of 0.932, with a standard deviation of 0.005 and a narrow 95% confidence interval of 0.926–0.938, indicating high stability. Under single-point fault conditions, the mean accuracy remained between 0.881 and 0.891, with Coefficients of Variation below 1.50%, showing stable performance when one sensor failed. Under dual-point fault conditions, the mean accuracy decreased to 0.855–0.861, and the standard deviation slightly increased, indicating greater localization uncertainty. Nevertheless, the confidence intervals remained narrow and the Coefficients of Variation were below 2.00%, demonstrating good robustness and fault tolerance of the proposed pressure–flow collaborative Stacking model.
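As a worked check of the no-fault figures, the sketch below recomputes the 95% confidence interval and the Coefficient of Variation from the reported mean (0.932) and standard deviation (0.005). It assumes five validation results and a Student's t interval with 4 degrees of freedom. The number of repetitions is an assumption, since only "repeated validation results" is stated.

```python
import math


def t_confidence_interval(mean, std, n, t_crit):
    """Two-sided confidence interval for the mean from a sample standard deviation."""
    half_width = t_crit * std / math.sqrt(n)
    return mean - half_width, mean + half_width


def cv_percent(mean, std):
    """Coefficient of variation in percent."""
    return 100.0 * std / mean


# Reported mean and standard deviation; n = 5 and t_crit = 2.776 (df = 4) are assumptions.
low, high = t_confidence_interval(0.932, 0.005, n=5, t_crit=2.776)
print(round(low, 3), round(high, 3))
print(round(cv_percent(0.932, 0.005), 2))
```

Under these assumptions the half-width is about 0.006, which reproduces the reported interval of 0.926–0.938, and the corresponding Coefficient of Variation is well below the 1.50% bound quoted for the single-point fault cases.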
As shown in Figure 11, different monitoring point faults caused distinct error distributions among leakage points. When M1 failed, the positioning errors were widely distributed, indicating uneven effects on different leakage points. The failure of M4 caused large accuracy fluctuations, especially for L5, whose errors were mainly concentrated in medium-error intervals. When M5 failed, L5 showed the poorest accuracy, with 73.5% of errors falling within the 5–10% interval, while L1 maintained the best performance. Under the joint failure of M1 and M4, double-fault error superposition was observed, reducing the proportion of low-error intervals, especially for L1. Overall, monitoring point faults increased localization uncertainty, and the influence varied with both sensor position and leakage location.
In summary, the effect of monitoring point failure on leakage localization showed clear spatial dependence. Failures at middle monitoring points had a greater influence than endpoint failures because the middle points provided more critical spatial gradient information. In multi-point fault scenarios, scattered failures caused less accuracy degradation than adjacent failures, as more spatial information could be retained, whereas concentrated failures led to local information loss and weakened the model’s inference ability. Therefore, sensors in branch pipelines and middle pipeline sections should be prioritized during field deployment and maintenance, and a spatially uniform monitoring layout is recommended to improve information complementarity and localization robustness under partial sensor failure.
It should be noted that the present fault simulation only considered the complete loss of monitoring point data. In actual underground gas extraction systems, sensor faults may also appear as signal drift, abnormal noise, intermittent packet loss, calibration error, delayed response, or partial measurement distortion. Different fault modes may have different effects on pressure–flow feature distribution and leakage localization accuracy. Therefore, future studies will further introduce more realistic sensor fault modes and compare their influence on the robustness of the proposed model.