# Application Research on Risk Assessment of Municipal Pipeline Network Based on Random Forest Machine Learning Algorithm

## Abstract

## 1. Introduction

## 2. Risk Factors of Pipeline

## 3. Principle of the Random Forest Method

## 4. Application of Random Forest Method in Municipal Pipeline Network: Pipeline Network in Suzhou Industrial Park

#### 4.1. Data Preprocessing

- (1) Scan the sentence efficiently against a prefix dictionary to build a word graph, generating a directed acyclic graph (DAG) of all possible word formations from the characters in the sentence;
- (2) Use dynamic programming to find the maximum-probability path, which gives the most likely segmentation based on word frequency;
- (3) For words that are not in the dictionary, apply an HMM based on the word-forming ability of characters, decoded with the Viterbi algorithm;
- (4) Perform part-of-speech tagging based on the Viterbi algorithm;
- (5) Extract keywords using the TF-IDF and TextRank models.
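Steps (1) and (2) can be illustrated with a minimal sketch. It assumes a toy frequency dictionary (`FREQ`, hypothetical) and uses Latin letters in place of Chinese characters; characters not covered by the dictionary are given a count of 1 as a simplifying assumption, which stands in for the HMM of step (3) and is not the method used in the study.

```python
import math

def build_dag(sentence, freq):
    """Step (1): map each start index to the end indices of dictionary words."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # no dictionary word here: fall back to one character
    return dag

def best_path(sentence, dag, freq):
    """Step (2): right-to-left dynamic programming over log word probabilities."""
    total = sum(freq.values())
    n = len(sentence)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    return route

def segment(sentence, freq):
    """Follow the maximum-probability path and return the words along it."""
    route = best_path(sentence, build_dag(sentence, freq), freq)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words

# Hypothetical toy dictionary: word -> frequency count
FREQ = {"pipe": 50, "line": 30, "pipeline": 40, "burst": 20}
print(segment("pipelineburst", FREQ))  # ['pipeline', 'burst']
```

Here "pipeline" wins over "pipe" + "line" because one word with probability 40/140 is more likely than the product of the two shorter words' probabilities.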

#### 4.2. Balancing Data and Exploratory Analysis

- For each sample x in the minority class, calculate its distance to every other sample in the minority class and take the k nearest neighbors (that is, run the KNN algorithm on the minority class points);
- Set a sampling ratio according to the sample imbalance ratio to determine the sampling magnification. For each minority class sample x, randomly select several samples from its k nearest neighbors; denote a selected neighbor by x′;
- For each randomly selected neighbor x′, construct a new sample from the original sample x according to the following formula: $x_{\mathrm{new}} = x + \mathrm{rand}(0, 1) \times (x' - x)$, where rand(0, 1) is a random number drawn uniformly from the interval (0, 1).
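The three steps above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and a fixed number of synthetic samples per minority point; the function name `smote`, its parameters, and the sample data are hypothetical and are not taken from the study's implementation.

```python
import random

def smote(minority, k=3, n_new_per_sample=2, seed=0):
    """Generate synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: k nearest minority-class neighbours (squared Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbours = others[:k]
        # Step 2: the sampling magnification decides how many neighbours to draw
        for _ in range(n_new_per_sample):
            x_prime = rng.choice(neighbours)
            # Step 3: new point on the segment between x and x_prime
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, x_prime)])
    return synthetic

# Hypothetical minority-class samples with two numeric features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(minority, k=2, n_new_per_sample=3, seed=1)
print(len(new_samples))  # 12
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.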

#### 4.3. Optimization of the Random Forest Model

#### 4.4. Comparison of Optimized Random Forest Model with Other Models

#### 4.5. Calculation Results of Random Forest Risk Assessment

#### 4.5.1. Univariate Analysis

#### 4.5.2. Feature Interaction Analysis

#### 4.5.3. Feature Importance and Decision Impact

#### 4.5.4. Feature Contribution and Decision Paths

## 5. Conclusions

- (1) Among the machine learning algorithms compared, SVM, Naive Bayes, and decision trees do not fit the training set as well as logistic regression and random forest. The accuracy of logistic regression and random forest is significantly higher than that of the support vector machine and decision tree, and for the minority class, the recall of logistic regression and random forest is also significantly higher than that of decision trees and support vector machines;
- (2) The comparison of the area under the ROC curve shows that the AUC of the random forest is the largest (0.82), followed by logistic regression (0.79), support vector machine (0.74), and decision tree (0.53). Random forest, an ensemble learning algorithm, therefore performs best on this data set;
- (3) Among the pipe materials, steel pipe has the greatest influence. Compared with ductile iron and PE pipes, steel pipes have lower corrosion resistance, so they are suitable only for limited use in non-critical parts. The number of previous failures (NOPF) has the greatest impact on pipe-section damage: the probability of repeat damage is very high, because a damaged and repaired pipe is prone to stress concentration and can fail again under uneven pressure drops;
- (4) Further analysis of the random forest risk assessment model for the municipal pipe network gives the influence of each single variable and of pairwise interacting variables on pipe-section damage. Feature importance (Permutation Importance) measures the importance of each feature in the data set, and feature contribution (SHAP) gives the quantitative relationship between each feature and the final prediction. Taking a single sample as an example, the decision-making process for that sample is shown.
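The AUC values compared in conclusion (2) can be read as the probability that a randomly chosen failed pipe section receives a higher predicted score than a randomly chosen intact one. A minimal sketch of that rank-based definition follows; the labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = failure) and predicted failure probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, which is why the decision tree's 0.53 indicates almost no discriminating power on this data set.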

## Supplementary Materials

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest


**Figure 3.** Frequency diagram of pipe diameter and pipe age distribution: (**a**) pipe diameter; (**b**) pipe age.

| Type | Factor | Description |
|---|---|---|
| Physical factors | Pipe age | - |
| | Material | Pipe material |
| | Pipe diameter | - |
| | Pipe length | - |
| | Pipe pressure | - |
| | Flow | Measuring point flow |
| | NOPF | Number of previous failures |
| Environmental factors | Soil type | Soil type |
| | Regional environment | Area type |
| | Temperature | Temperature |
| Operational factors | Type of damage | Failure type |
| | Transportation | Traffic |
| | Other | Others |

| Model | Class / average | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Decision trees | 0 | 0.72 | 0.67 | 0.70 | 43 |
| | 1 | 0.33 | 0.49 | 0.36 | 18 |
| | Accuracy | | | 0.59 | 61 |
| | Macro avg | 0.53 | 0.53 | 0.53 | 61 |
| | Weighted avg | 0.61 | 0.59 | 0.60 | 61 |
| Logistic regression | 0 | 0.77 | 0.95 | 0.85 | 43 |
| | 1 | 0.75 | 0.53 | 0.46 | 18 |
| | Accuracy | | | 0.77 | 61 |
| | Macro avg | 0.76 | 0.64 | 0.66 | 61 |
| | Weighted avg | 0.77 | 0.77 | 0.74 | 61 |
| Support vector machines | 0 | 0.72 | 1.00 | 0.83 | 43 |
| | 1 | 1.00 | 0.46 | 0.11 | 18 |
| | Accuracy | | | 0.72 | 61 |
| | Macro avg | 0.86 | 0.53 | 0.47 | 61 |
| | Weighted avg | 0.80 | 0.72 | 0.62 | 61 |
| Random forest | 0 | 0.78 | 0.93 | 0.85 | 43 |
| | 1 | 0.70 | 0.59 | 0.50 | 18 |
| | Accuracy | | | 0.77 | 61 |
| | Macro avg | 0.74 | 0.66 | 0.68 | 61 |
| | Weighted avg | 0.76 | 0.77 | 0.75 | 61 |
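The columns of the table above follow the usual classification-report definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, the macro average is the unweighted mean over classes, and the weighted average uses the class support as weights. The sketch below computes these quantities for a small hypothetical label set; the function name `class_report` and the data are illustrative and do not reproduce the study's predictions.

```python
def class_report(y_true, y_pred, labels=(0, 1)):
    """Per-class precision, recall, F1 and support, plus accuracy and averages."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn}
    n = len(pairs)
    report["accuracy"] = sum(1 for t, p in pairs if t == p) / n
    for key in ("precision", "recall", "f1"):
        # Macro: plain mean over classes; weighted: mean weighted by support
        report["macro_" + key] = sum(report[c][key] for c in labels) / len(labels)
        report["weighted_" + key] = sum(
            report[c][key] * report[c]["support"] for c in labels) / n
    return report

# Hypothetical labels: 6 intact sections (0) and 4 failed sections (1)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
report = class_report(y_true, y_pred)
print(report[1]["recall"])   # 0.5
print(report["accuracy"])    # 0.7
```

With an imbalanced test set, accuracy is dominated by the majority class, so the minority-class recall is the more informative figure for failure prediction.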

| Weights | Feature | Weights | Feature |
|---|---|---|---|
| 0.3753 | NOPF_More | 0.0022 | Pre_drop_No |
| 0.1581 | Mat_Steel Pipe | 0.0009 | Mat_Ductile Iron |
| 0.1543 | pH | 0.0001 | Pipe pressure |
| 0.1166 | Pipe age | 0.0001 | Flow |
| 0.0750 | Pre_drop_Yes | 0.0001 | NOPF_No |
| 0.0425 | Pipe diameter | 0.0001 | Mat_PE |
| 0.0319 | Pipe length | | |
| 0.0282 | NOPF_1 | | |
| 0.0150 | Mat_Other | | |
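Assuming the weights above are importance scores of the permutation type named in the conclusions, the idea behind them can be sketched as follows: shuffle one feature column at a time and record how much the model's accuracy drops. The stand-in model `toy_model`, the data, and the function names are hypothetical; this is not the study's random forest or its tooling.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data set with only this column permuted
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in: predicts failure when feature 0 exceeds 1."""
    return 1 if row[0] > 1 else 0

# Hypothetical data: feature 0 drives the label, feature 1 is ignored
X = [[0, 5.0], [1, 3.2], [2, 4.1], [3, 2.7],
     [0, 1.9], [2, 6.3], [3, 0.8], [1, 7.5]]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

A feature the model never uses, such as feature 1 here, gets an importance of exactly zero, which matches the near-zero weights at the bottom of the table.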

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cen, H.; Huang, D.; Liu, Q.; Zong, Z.; Tang, A.
Application Research on Risk Assessment of Municipal Pipeline Network Based on Random Forest Machine Learning Algorithm. *Water* **2023**, *15*, 1964.
https://doi.org/10.3390/w15101964
