# Statistical Modeling of Water Shortage in Water Distribution Systems in Guangzhou


## Abstract


## 1. Introduction

## 2. Data Processing and Data Analysis

- Data processing: The address data of water shortage records were collected from the customer center and subjected to data cleansing. Addresses were matched with a standard address database to ensure a consistent and unified format. Subsequently, the addresses were converted into corresponding spatial coordinates.
- Data classification: Zones were divided based on the topological structure of the water distribution system, and water shortage records were assigned to their respective zones according to this division. The pressure and flow data from water plants in the SCADA system were then matched to the water shortage records, which completed the data classification for each zone.
- Probability model establishment: For a specific zone under consideration, statistical analysis was conducted on pressure and flow data, with a focus on the occurrences of water shortage events. This analysis aimed at establishing a probability model that quantified the likelihood of future water shortage events based on pressure and flow characteristics within that specific zone.

#### 2.1. Data Processing

#### 2.1.1. Data Collection

^{2}, with a total pipeline length of 5681 km. This extensive network serves a population of approximately 16 million people and represents approximately 2.5% of China’s total daily water supply. The GZ Company has a customer service center for handling customers’ complaints. This center receives many complaints related to insufficient water pressure, ranging from dozens to hundreds each day. The dispatchers and operators investigate these complaints. Based on their analysis, they may make necessary adjustments to satisfy customers’ demands.

#### 2.1.2. Data Standardization and Visualization

1. Define the dictionary: Create a vocabulary database containing the administrative divisions of China (e.g., `dict[] = {"Guangzhou", "Yuexiu District", "Nonglinxia Road"}`).
2. Read the text to be segmented: Read the Chinese text string to be segmented character by character.
3. Initialize the pointer: Set a pointer that initially points to the beginning of the string to be segmented.
4. Start matching: Begin matching the longest word at each step. Follow these specific steps:
   - Extract a segment of the text to the right of the pointer as the string to be matched.
   - Compare the string to be matched with the longest word in the dictionary. If there is a match, consider that word as a segmentation result and move the pointer to the end position of that string.
   - If there is no match, reduce the length of the string to be matched by one character and try to match it with the dictionary again. Repeat this process until a match is found.
5. Continue matching: Repeat step 4 until the pointer points to the end of the string to be segmented.
6. Output the result: Output each matched word as a segmentation result.

Algorithm 1. Forward maximum matching algorithm.
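The matching steps listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `forward_max_match`, the parameter `max_len`, the Chinese dictionary entries, and the rule that an unmatched single character is emitted as its own token are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` by greedily taking the longest dictionary word at the pointer."""
    if max_len is None:
        # Longest candidate worth trying is the longest word in the dictionary.
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    pos = 0  # pointer to the start of the unsegmented remainder
    while pos < len(text):
        # Take up to max_len characters to the right of the pointer.
        end = min(pos + max_len, len(text))
        # Shrink the candidate by one character until it is found in the dictionary.
        while end > pos + 1 and text[pos:end] not in dictionary:
            end -= 1
        # Assumption: a single character with no match is kept as its own token.
        words.append(text[pos:end])
        pos = end
    return words


# Hypothetical dictionary: Guangzhou City, Yuexiu District, Nonglinxia Road.
address_dict = {"广州市", "越秀区", "农林下路"}
print(forward_max_match("广州市越秀区农林下路", address_dict))
```

Using a `set` for the dictionary keeps each lookup at constant expected cost, so the running time is bounded by the text length multiplied by `max_len`.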

- The coordinate of the water shortage record $t$ was $\left({u}_{1},{u}_{2}\right)$, and the water shortage radius was ${r}_{1}={r}_{2}={r}_{i}$. The range of the calculation buffer zone was defined as ${x}_{1}=\left[{u}_{1}-{r}_{i},{u}_{1}+{r}_{i}\right]$ and ${x}_{2}=\left[{u}_{2}-{r}_{i},{u}_{2}+{r}_{i}\right]$. The water shortage intensity ${f}_{t}\left({x}_{1},{x}_{2},{u}_{1},{u}_{2},{r}_{i}\right)$ within the calculation buffer zone was represented as the grayscale ${v}_{{x}_{1},{x}_{2}}^{i,t}$.
- A progressive grayscale band was defined with a range of 255 pixels. By calculating the transparency $I$, the calculation buffer zone was filled with the grayscale [24]:
  $${I}_{{x}_{1},{x}_{2}}^{i,t}=\frac{255\left({v}_{{x}_{1},{x}_{2}}^{i,t}-{v}_{\min}^{i}\right)}{{v}_{\max}^{i}-{v}_{\min}^{i}}$$
- The scaled grayscale values $I$ of each calculation buffer zone were superimposed. When multiple zones overlapped within the calculation buffer zone, the superimposed grayscale value was increased, resulting in a brighter appearance in the corresponding color [25]:
  $${I}_{{x}_{1},{x}_{2}}^{i}={I}_{{x}_{1},{x}_{2}}^{i,{t}_{1}}+{I}_{{x}_{1},{x}_{2}}^{i,{t}_{2}}+\cdots +{I}_{{x}_{1},{x}_{2}}^{i,{t}_{n}}$$
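The three heat map steps can be sketched with NumPy as shown below. The intensity function $f_t$ is not specified here, so the sketch assumes a kernel that decays linearly from 1 at the record location to 0 at the radius; the function name `heat_map`, the pixel-grid coordinates, and the sample records are likewise hypothetical.

```python
import numpy as np


def heat_map(records, radius, shape):
    """Superimpose the scaled intensity of every record on a pixel grid."""
    total = np.zeros(shape, dtype=float)
    rows, cols = np.indices(shape)
    for u1, u2 in records:
        # Square calculation buffer [u - r, u + r] in both directions.
        in_buffer = (np.abs(rows - u1) <= radius) & (np.abs(cols - u2) <= radius)
        # Assumed kernel: linear decay from 1 at the record to 0 at the radius.
        dist = np.hypot(rows - u1, cols - u2)
        v = np.clip(1.0 - dist / radius, 0.0, None)
        v_min, v_max = v[in_buffer].min(), v[in_buffer].max()
        if v_max == v_min:
            continue  # degenerate buffer, nothing to scale
        # Min-max scaling of the intensity to the 0-255 band.
        scaled = 255.0 * (v - v_min) / (v_max - v_min)
        # Superimpose this record's buffer on the running total.
        total += np.where(in_buffer, scaled, 0.0)
    return total


single = heat_map([(5, 5)], radius=3, shape=(12, 12))
overlap = heat_map([(5, 5), (5, 7)], radius=3, shape=(12, 12))
print(single[5, 6], overlap[5, 6])
```

Where two buffers overlap, the summed value can exceed 255, so a renderer would typically clip or rescale the total before mapping it to a color ramp.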

#### 2.1.3. Data Classification

- Ray Method [26]: This method involves projecting a ray from a reference point in a standardized direction. The determination is made based on the parity of the number of intersection points between the ray and the boundaries of the zone.
- Turning Angle Method [27]: This method follows the counterclockwise order of vertices along the boundary of the zone polygon. It entails calculating whether positive or negative angles are formed by connecting each vertex with the reference point.
- Angle Sum Method [28]: This method requires calculating all angles formed between each boundary of the zone polygon and the reference point. If their sum equals 360°, then it implies that the reference point lies within that specific zone.
- Area Sum Method [29]: In this method, all triangles’ areas formed by connecting the reference point with the boundaries of the zone polygon are calculated. If their sum is equal to that specific zone’s area, then it indicates that the reference point lies within that particular area.
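As an illustration of the Area Sum Method, the sketch below compares the summed triangle areas with the polygon area computed by the shoelace formula. It assumes a convex zone polygon given as an ordered vertex list; the function names and the tolerance `tol` are hypothetical choices for the example.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def triangle_area(p, a, b):
    """Unsigned area of the triangle formed by point p and edge a-b."""
    cross = (a[0] - p[0]) * (b[1] - p[1]) - (b[0] - p[0]) * (a[1] - p[1])
    return abs(cross) / 2.0


def inside_by_area_sum(point, vertices, tol=1e-9):
    """True if the triangle areas around `point` add up to the polygon area.

    Assumption: the polygon is convex; points on the boundary count as inside.
    """
    n = len(vertices)
    total = sum(
        triangle_area(point, vertices[i], vertices[(i + 1) % n]) for i in range(n)
    )
    return abs(total - polygon_area(vertices)) <= tol


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside_by_area_sum((1, 1), square), inside_by_area_sum((5, 1), square))
```

For a point outside the polygon the triangles overlap the exterior, so their summed area is strictly larger than the polygon area and the test fails.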

Algorithm 2. Algorithm for filtering points inside a polygon.

Algorithm 3. isRayIntersectsSegment(point, p1, p2).
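A possible Python rendering of the Ray Method is given below. It is a sketch under stated assumptions rather than a transcription of Algorithms 2 and 3: the ray is cast horizontally to the right, edges are counted with a half-open rule so that a vertex lying on the ray is not counted twice, and the names `point_in_polygon` and `filter_points_in_zone` are hypothetical.

```python
def is_ray_intersects_segment(point, p1, p2):
    """True if a rightward horizontal ray from `point` crosses segment p1-p2."""
    x, y = point
    (x1, y1), (x2, y2) = p1, p2
    # Half-open rule: the edge counts only if exactly one endpoint lies
    # strictly above the ray. This also skips horizontal edges.
    if (y1 > y) == (y2 > y):
        return False
    # x-coordinate where the edge meets the horizontal line through the point.
    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
    return x_cross > x


def point_in_polygon(point, vertices):
    """Parity test: an odd number of crossings means the point is inside."""
    n = len(vertices)
    crossings = sum(
        is_ray_intersects_segment(point, vertices[i], vertices[(i + 1) % n])
        for i in range(n)
    )
    return crossings % 2 == 1


def filter_points_in_zone(points, vertices):
    """Keep only the record coordinates that fall inside the zone polygon."""
    return [p for p in points if point_in_polygon(p, vertices)]


# Hypothetical L-shaped (non-convex) zone and three record coordinates.
zone = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(filter_points_in_zone([(1, 3), (3, 3), (3, 1)], zone))
```

Unlike the Area Sum Method, the parity test also handles non-convex zone boundaries, which is why an L-shaped polygon is used in the example.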

#### 2.2. Data Analysis

#### 2.2.1. Marking of Water Shortage Records

^{2}) can be expressed as a linear function. The water pump station increases the pressure to satisfy the customer demand. In this study, the daily water pressure ($H$) and flow rate ($Q$, which represents the water demand) were normalized based on the annual average pressure ($H_0$) and flow rate ($Q_0$). These normalized data were plotted as $H/H_0$ versus $(Q/Q_0)^2$, as shown in Figure 6. The data points corresponding to water shortage events are marked in the plot. If one or more water shortages were recorded on a specific day, the corresponding data point was marked with a green circle. Otherwise, it was plotted as a black triangle.

_{0}) and flow rate ($(Q/Q_0)^2$) at the Xizhou water plant. The green circles indicate water shortages in the YC zone, and the red line represents the fitting line for the black triangles. Several patterns can be observed in Figure 6. (1) The square of the water demand shows a linear relationship with the water pressure. (2) A lower-bound flow rate was observed before the occurrence of shortage events, approximately 95% of the annual average flow rate. This is indicated by the blue line in Figure 6.
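The normalization and the fitting line described for Figure 6 can be sketched as follows. The function name `fit_pressure_demand`, its arguments, and the sample values are hypothetical; the sketch assumes one value per day and fits the line only to days without a recorded shortage, following the description of the red line.

```python
import numpy as np


def fit_pressure_demand(H, Q, shortage_counts, H0, Q0):
    """Normalize daily pressure and flow, flag shortage days, and fit a line.

    Returns (h, q2, has_shortage, slope, intercept) where h = H/H0 and
    q2 = (Q/Q0)**2.
    """
    h = np.asarray(H, dtype=float) / H0
    q2 = (np.asarray(Q, dtype=float) / Q0) ** 2
    # A day is marked as a shortage day if one or more records exist.
    has_shortage = np.asarray(shortage_counts) > 0
    # Least-squares line h = slope * q2 + intercept on non-shortage days.
    slope, intercept = np.polyfit(q2[~has_shortage], h[~has_shortage], 1)
    return h, q2, has_shortage, slope, intercept


# Synthetic, illustrative values only (not data from the study).
Q = [0.90, 1.00, 1.10, 1.20]
H = [0.5 + 0.5 * q**2 for q in Q]
h, q2, flagged, slope, intercept = fit_pressure_demand(H, Q, [0, 0, 0, 3], 1.0, 1.0)
print(slope, intercept, flagged)
```

Plotting `q2` against `h` with separate markers for `flagged` and unflagged days would reproduce the layout described for Figure 6.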

#### 2.2.2. Methodology of Statistical Characteristics

- There exists a critical pressure range at which customers begin to complain about water shortages. When the water pressure satisfies the requirements of water facilities, the probability of a water shortage is low. When the pressure falls below those requirements, customers complain about the service pressure.
- When the water demand exceeds the critical flow rate, the water pump station increases the pressure to improve customers’ experience and minimize complaints. The water company adjusts the water supply pressure within a reasonable range to maintain the balance between economic costs and customers’ complaints.

#### 2.2.3. Water Shortage Distribution Characteristics

_{0} and $A_2$ in the Boltzmann function were close to zero, whereas $A_1$ was close to one. By setting the $\Delta h_0$ and $A_2$ values to 0 and the $A_1$ value to 1, the Sigmoid activation function widely used in neural networks was obtained, as expressed by Equation (7). The parameters and fitted Sigmoid function lines are presented in Table 1 and Figure 9, respectively.
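The reduction from the Boltzmann function to the Sigmoid function can be checked numerically with the sketch below. It assumes the common four-parameter Boltzmann form $y = A_2 + (A_1 - A_2)/\left(1 + e^{(x - x_0)/dx}\right)$, with $x_0$ playing the role of $\Delta h_0$ and $dx$ the role of the scale parameter $\sigma$; the exact form and sign convention of Equation (7) may differ, and the function names are hypothetical.

```python
import math


def boltzmann(x, a1, a2, x0, dx):
    """Four-parameter Boltzmann function (assumed parameterization)."""
    return a2 + (a1 - a2) / (1.0 + math.exp((x - x0) / dx))


def sigmoid(delta_h, sigma):
    """Boltzmann function with a1 = 1, a2 = 0 and x0 = 0."""
    return 1.0 / (1.0 + math.exp(delta_h / sigma))


sigma_yc = 0.3941  # scale parameter for the YC zone, in metres
for delta_h in (-1.0, 0.0, 0.5):
    # Both expressions agree once the three parameters are fixed.
    print(delta_h, boltzmann(delta_h, 1.0, 0.0, 0.0, sigma_yc), sigmoid(delta_h, sigma_yc))
```

With these settings only the scale parameter remains to be fitted for each zone, which is why a single $\sigma$ per zone is sufficient to describe the curve.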

## 3. Results and Discussion

## 4. Research Limitations and Suggestions for Further Research

#### 4.1. Research Limitations

_{0})^{2} > 0.91 (approximately 95% of the annual average flow). Obtaining data for such specific regions is not easy because most cities do not experience water scarcity to the extent that GZ city does. Therefore, not every city can undertake similar studies.

#### 4.2. Suggestions for Further Research

- Reference for operational adjustments of water distribution systems: When a certain number of water shortage events occur, water pump adjustments are required. Water companies can adjust their service pressures based on the water shortage probability function.
- Water distribution system construction: If a new pipeline is constructed in a water supply system, the local pressure in the pipeline network increases. Based on the hydraulic model of the pipeline network, the improvement of pressure can be calculated, and a statistical model can be used to estimate the reduction in water shortage probability. The model can also be used to predict the critical water demand during water shortages after a pipeline is constructed.

## 5. Conclusions

- The heat map can visualize the data of the water shortage records and help determine the spatial location and intensity of water shortages. However, it cannot inform water companies when a water shortage occurs or state how much improvement in water service pressure can reduce customer complaints.
- To satisfy customer requirements, the water company adjusts the pressure according to water demand. The service pressure ($H$) and water demand ($Q^2$) of the water pump station exhibit a linear relationship. Water shortage events are related to the water demand. Customers complain when the water demand exceeds the critical values of different zones. The critical pressure line for water shortage can be fitted to water shortage samples. The cumulative probability of the samples indicates that water shortage events follow the pattern of the Sigmoid activation function used in artificial neural networks. The critical parameters of the model reflect the requirements for water facility service pressures.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Mortula, M.M.; Ali, T.A.; Sadiq, R.; Idris, A.; Al Mulla, A. Impacts of Water Quality on the Spatiotemporal Susceptibility of Water Distribution Systems. Clean–Soil Air Water **2019**, 47, 1800247.
2. Cabrera, E.; Gómez, E.; Cabrera, E.; Soriano, J.; Espert, V. Energy Assessment of Pressurized Water Systems. J. Water Resour. Plann. Manag. **2015**, 141, 04014095.
3. Hashemi, S.; Filion, Y.R.; Speight, V.L. Pipe-Level Energy Metrics for Energy Assessment in Water Distribution Networks. Procedia Eng. **2015**, 119, 139–147.
4. Cabrera, E.; Cabrera, E.; Cobacho, R.; Soriano, J. Towards an Energy Labelling of Pressurized Water Networks. Procedia Eng. **2014**, 70, 209–217.
5. Goel, D.; Chaudhury, S.; Ghosh, H. Smart Water Management: An Ontology-Driven Context-Aware IoT Application. In Pattern Recognition and Machine Intelligence; Shankar, B.U., Ghosh, K., Mandal, D.P., Ray, S.S., Zhang, D., Pal, S.K., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2017; Volume 10597, pp. 639–646.
6. Jeong, G.; Wicaksono, A.; Kang, D. Revisiting the Resilience Index for Water Distribution Networks. J. Water Resour. Plan. Manag. **2017**, 143, 04017035.
7. Atkinson, S.; Farmani, R.; Memon, F.A.; Butler, D. Reliability Indicators for Water Distribution System Design: Comparison. J. Water Resour. Plan. Manag. **2014**, 140, 160–168.
8. Greco, R.; Di Nardo, A.; Santonastaso, G. Resilience and Entropy as Indices of Robustness of Water Distribution Networks. J. Hydroinform. **2012**, 14, 761–771.
9. Raad, D.N.; Sinske, A.N.; van Vuuren, J.H. Comparison of Four Reliability Surrogate Measures for Water Distribution Systems Design: Comparison of WDS Reliability Surrogates. Water Resour. Res. **2010**, 46, W05524.
10. Jun, H.; Loganathan, G.V.; Kim, J.H.; Park, S. Identifying Pipes and Valves of High Importance for Efficient Operation and Maintenance of Water Distribution Systems. Water Resour. Manag. **2008**, 22, 719–736.
11. Jayaram, N.; Srinivasan, K. Performance-Based Optimal Design and Rehabilitation of Water Distribution Networks Using Life Cycle Costing: WATER DISTRIBIUTION NETWORKS. Water Resour. Res. **2008**, 44.
12. Khomsi, D.; Walters, G.A.; Thorley, A.R.D.; Ouazar, D. Reliability Tester for Water-Distribution Networks. J. Comput. Civ. Eng. **1996**, 10, 10–19.
13. Kim, S.; Jeong, S.; Woo, I.; Jang, Y.; Maciejewski, R.; Ebert, D.S. Data Flow Analysis and Visualization for Spatiotemporal Statistical Data without Trajectory Information. IEEE Trans. Visual. Comput. Graph. **2018**, 24, 1287–1300.
14. Fobil, J.N.; Levers, C.; Lakes, T.; Loag, W.; Kraemer, A.; May, J. Mapping Urban Malaria and Diarrhea Mortality in Accra, Ghana: Evidence of Vulnerabilities and Implications for Urban Health Policy. J. Urban Health **2012**, 89, 977–991.
15. Osei, F.B.; Stein, A. Spatial Variation and Hot-Spots of District Level Diarrhea Incidences in Ghana: 2010–2014. BMC Public Health **2017**, 17, 617.
16. Huang, H.; Yang, H.; Chen, Y.; Chen, T.; Bai, L.; Peng, Z.-R. Urban Green Space Optimization Based on a Climate Health Risk Appraisal–A Case Study of Beijing City, China. Urban For. Urban Green. **2021**, 62, 127154.
17. Khalid, S.; Shoaib, F.; Qian, T.; Rui, Y.; Bari, A.I.; Sajjad, M.; Shakeel, M.; Wang, J. Network Constrained Spatio-Temporal Hotspot Mapping of Crimes in Faisalabad. Appl. Spat. Anal. **2018**, 11, 599–622.
18. Wang, D.; Ding, W.; Lo, H.; Morabito, M.; Chen, P.; Salazar, J.; Stepinski, T. Understanding the Spatial Distribution of Crime Based on Its Related Variables Using Geospatial Discriminative Patterns. Comput. Environ. Urban Syst. **2013**, 39, 93–106.
19. Mao, Y.; Qin, G.; Ni, P.; Liu, Q. Analysis of Road Traffic Speed in Kunming Plateau Mountains: A Fusion PSO-LSTM Algorithm. Int. J. Urban Sci. **2022**, 26, 87–107.
20. Tang, J.; Wang, X.; Zong, F.; Hu, Z. Uncovering Spatio-Temporal Travel Patterns Using a Tensor-Based Model from Metro Smart Card Data in Shenzhen, China. Sustainability **2020**, 12, 1475.
21. Shao, H.; Sun, H.; Cui, W. Chinese Word Segmentation Based on Improved Double Hashtable. In Proceedings of the Fifth International Conference on Machine Vision (ICMV 2012): Computer Vision, Image Analysis and Processing, Wuhan, China, 13 March 2013; Wang, Y., Tan, L., Zhou, J., Eds.; SPIE: Bellingham, WA, USA, 2013; p. 87830U.
22. Xiong, Z. An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words. MATEC Web Conf. **2019**, 267, 04001.
23. Liu, Z.; Zheng, T.; Xu, G.; Yang, Z.; Liu, H.; Cai, D. Training-Time-Friendly Network for Real-Time Object Detection. arXiv **2019**, arXiv:1909.00700.
24. Schoier, G.; Borruso, G. Spatial Data Mining for Highlighting Hotspots in Personal Navigation Routes. Int. J. Data Warehous. Min. **2012**, 8, 45–61.
25. Škuta, C.; Bartůněk, P.; Svozil, D. InCHlib–Interactive Cluster Heatmap for Web Applications. J. Cheminform. **2014**, 6, 44.
26. Huang, C.-W.; Shih, T.-Y. On the Complexity of Point-in-Polygon Algorithms. Comput. Geosci. **1997**, 23, 109–118.
27. García Zapata, J.-L.; Díaz Martín, J.C. A Geometric Algorithm for Winding Number Computation with Complexity Analysis. J. Complex. **2012**, 28, 320–345.
28. Hormann, K.; Agathos, A. The Point in Polygon Problem for Arbitrary Polygons. Comput. Geom. **2001**, 20, 131–144.
29. Ochilbek, R. A New Approach (Extra Vertex) and Generalization of Shoelace Algorithm Usage in Convex Polygon (Point-in-Polygon). In Proceedings of the 2018 14th International Conference on Electronics Computer and Computation (ICECCO), Kaskelen, Kazakhstan, 29 November–1 December 2018; pp. 206–212.
30. Fu, Q.; Liang, X.; Zhang, J.; Qi, D.; Zhang, X. A Geofence Algorithm for Autonomous Flight Unmanned Aircraft System. In Proceedings of the 2019 International Conference on Communications, Information System and Computer Engineering (CISCE), Haikou, China, 5–7 July 2019; pp. 65–69.

**Figure 5.** Typical water shortage classification in GZ city. (**a**) The topological structure of the water distribution system in GZ city was divided into 59 zones. (**b**) Relationship between the position of coordinate points and boundary intersections. (**c**) The proportionality of triangles was used to determine the location of a given coordinate point.

**Figure 9.** Fitted Sigmoid function lines in four zones. (**a**) Sigmoid function line in the YC zone; (**b**) Sigmoid function line in the DS zone; (**c**) Sigmoid function line in the TX zone; (**d**) Sigmoid function line in the HP zone.

| Zone | σ (m) |
|---|---|
| YC | 0.3941 |
| DS | 0.4155 |
| TX | 0.4722 |
| HP | 0.3879 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cheng, W.; Luo, H.; Long, Z.; Xu, G.; Tian, L.
Statistical Modeling of Water Shortage in Water Distribution Systems in Guangzhou. *Water* **2023**, *15*, 3257.
https://doi.org/10.3390/w15183257
