Can Proxy-Based Geospatial and Machine Learning Approaches Map Sewer Network Exposure to Groundwater Infiltration?

Zeydalinejad, Nejat; Javadi, Akbar A.; Jacob, Mark; Baldock, David; Webber, James L.

doi:10.3390/smartcities8050145


by Nejat Zeydalinejad ¹,*, Akbar A. Javadi ², Mark Jacob ³, David Baldock ³ and James L. Webber ²

¹ Centre for Resilience in Environment, Water and Waste (CREWW), University of Exeter, North Park Road, Exeter EX4 4TA, UK

² Centre for Water Systems, Department of Engineering, Faculty of Environment, Science and Economy, University of Exeter, North Park Road, Exeter EX4 4QF, UK

³ South West Water, Peninsula House, Rydon Lane, Exeter EX2 7HR, UK

* Author to whom correspondence should be addressed.

Smart Cities 2025, 8(5), 145; https://doi.org/10.3390/smartcities8050145

Submission received: 22 July 2025 / Revised: 2 September 2025 / Accepted: 4 September 2025 / Published: 5 September 2025


Highlights

What are the main findings?

A geospatial–machine learning framework was developed to screen sewer network exposure to groundwater infiltration (GWI) at high spatial resolution.
The integration of fuzzy-AHP and K-means clustering yielded robust classification of GWI risk zones (high, intermediate, low), validated by storm overflow discharge data.
Sensitivity analysis identified five key influencing factors among sixteen: groundwater depth, river proximity, flood potential, rock type, and alluvium.

What is the implication of the main findings?

The proposed approach supports proactive sewer infrastructure management and planning, contributing to long-term sustainability and resilience under climate and urbanisation pressures.

Abstract

Sewer systems are essential for sustainable infrastructure management, influencing environmental, social, and economic aspects. However, sewer network capacity is under significant pressure, with many systems overwhelmed by challenges such as climate change, ageing infrastructure, and increasing inflow and infiltration, particularly groundwater infiltration (GWI). Current research in this area has primarily focused on general sewer performance, with limited attention to high-resolution, spatially explicit assessments of sewer exposure to GWI, highlighting a critical knowledge gap. This study addresses this gap by developing a high-resolution GWI assessment that integrates the fuzzy analytical hierarchy process (AHP) with geographic information systems (GISs) and machine learning (ML) to generate GWI probability maps across the Dawlish region, southwest United Kingdom, complemented by sensitivity analysis to identify the key drivers of sewer network vulnerability. To this end, 16 hydrological–hydrogeological thematic layers were incorporated: elevation, slope, topographic wetness index, rock, alluvium, soil, land cover, made ground, fault proximity, fault length, mass movement, river proximity, flood potential, drainage order, groundwater depth (GWD), and precipitation. A GWI probability index, ranging from 0 to 1, was computed for each 1 m × 1 m cell in each season. The model domain was then classified into high-, intermediate-, and low-GWI-risk zones using K-means clustering. A consistency ratio of 0.02 validated the AHP pairwise comparisons, while the locations of storm overflow (SO) discharges and model comparisons were used to verify the final outputs. SOs predominantly coincided with areas of high GWI probability and high-risk zones. Comparing the AHP-weighted GIS output, clustered via K-means, with direct K-means clustering of the AHP-weighted layers yielded a Kappa value of 0.70, with an 81.44% classification match.
Sensitivity analysis identified five key factors influencing GWI scores: GWD, river proximity, flood potential, rock, and alluvium. The findings underscore that proxy-based geospatial and machine learning approaches offer an effective and scalable method for mapping sewer network exposure to GWI. By enabling high-resolution risk assessment, the proposed framework provides a novel proxy-based, machine-learning screening tool for smart city management. This supports predictive maintenance, optimised infrastructure investment, and proactive management of GWI in sewer networks, thereby reducing costs, mitigating environmental impacts, and protecting public health. In this way, the method contributes not only to improved sewer system performance but also to advancing the sustainability and resilience goals of smart cities.

Keywords:

groundwater infiltration; sewer networks; geographic information system; analytical hierarchy process; K-means; cluster analysis

1. Introduction

A well-functioning sewer system is crucial for effective and sustainable urban management, significantly impacting environmental, social, cultural, and economic aspects [1]. As a key component of municipal infrastructure, sewer systems collect and transport wastewater and stormwater, directly affecting urban environments, living conditions, and overall functionality [2]. However, ageing and rapidly deteriorating sewer networks are undermining their designed performance, leading to severe environmental, public health, and socio-economic consequences even in many developed countries [3]. Climate change and urbanisation are further exacerbating these challenges [4,5]. Sewer failures can arise from multiple factors, including overcapacity, poor construction, inadequate maintenance, ageing infrastructure, and environmental stressors such as droughts, earthquakes, and floods [6]. Groundwater infiltration (GWI) is a particularly critical issue affecting sewer network sustainability [7], imposing significant financial burdens [8] and lowering the network functionality required to achieve intended urban and environmental standards [9].

Unlike visible infrastructures such as buildings or bridges, sewer networks are buried underground, making direct visual assessment impractical [10]. Combined with constraints on time and funding, this necessitates predictive models for systematic monitoring, evaluation, maintenance cost estimation, and forecasting of pipe deterioration [10]. Technologies such as artificial intelligence (AI) and geographic information systems (GISs) play a crucial role in developing climate-resilient infrastructure and services [11].

Machine learning (ML), a subset of AI, enables systems to learn from data without predefined rules [12], while deep learning (DL), a subset of ML, comprises emerging algorithms with transformative potential [13]. However, their application in sewer network analysis is still at an early stage [12]. Recently, ML and DL approaches have attracted attention for sewer condition assessment, defect identification, classification, and evaluation [14,15,16], enhancing prediction accuracy and reducing uncertainty [17]. ML algorithms can be categorised into supervised, reinforcement, and unsupervised learning [18]. Supervised learning relies on labelled data, reinforcement learning involves agent–environment interactions to maximise rewards, and unsupervised learning uncovers hidden patterns in unlabelled data, making it particularly useful for clustering, feature extraction, and dimensionality reduction [19]. Clustering techniques effectively map subsurface heterogeneity [20,21], with K-means clustering being especially popular [21,22] due to its simplicity, computational efficiency, and ability to handle large multivariate data sets [23]. K-means clusters objects based on similarity measured by a distance metric [21,24]. Despite these advantages, its application to groundwater–sewer interaction, particularly for mapping sewer network exposure to GWI using spatially distributed proxy indicators, remains underexplored.

A key factor in successful K-means analysis is the application of optimised weightings to the input data, as inappropriate weightings may distort the clustering outputs. Recent advancements have introduced various techniques for optimising parametric ranks and weights [25]. The analytical hierarchy process (AHP) is a simple yet highly effective method for weight optimisation [26]. It is a widely used multi-criteria decision-making (MCDM) method, particularly suited to problems involving multiple alternatives or influencing factors [27].

The integration of a GIS-based approach with AHP [28], fuzzy (F) theory [29], or their combination (F-AHP), has been widely recognised as a reliable and robust method for decision-making across various fields of study [30,31,32,33].

While the integration of GISs with AHP has proven to be robust in supporting decision-making, it is equally important to evaluate the reliability and effectiveness of these models. Evaluating a model’s ability to support decision-making is a critical step in ensuring the credibility of its outputs [32]. In this context, sensitivity analysis plays a vital role in model development [34], as it helps assess model reliability and identify sources of uncertainty [32].

In addition to traditional decision-support approaches, ML techniques have emerged as powerful tools for modelling and analysis in complex infrastructure systems such as sewer networks. ML techniques have been utilised in sewer network evaluations, particularly for assessing structural conditions and predicting deterioration [35,36,37,38,39]. They have also been applied in hydraulic analyses. For instance, references [40,41] developed a hydraulic model to identify sewer in-line storage, using recurrent neural networks (RNNs) to predict flow, with long short-term memory (LSTM) methods showing the best performance. Reference [42] compared classification and supervised learning algorithms for assessing the hydraulic conditions of sewer collection systems in Jinju, Korea. A self-organising map (SOM) classified data sets into various warning levels without predefined criteria for individual parameters. The results indicated that the supervised learning algorithms performed similarly to the existing classification algorithms in predicting the warning levels defined by the SOM. Other hydraulic applications of ML in sewer networks include concrete pipe-joint infiltration [43], early warning systems for flow anomalies [44], and flooding [45].

Although a few studies have applied ML to assess GWI in sewer networks, the literature remains limited, particularly regarding generalisable and spatially transferable approaches. For instance, a comparative study in Espoo, Finland, assessed AI models for detecting inflow and infiltration (I&I), using an adaptive neurofuzzy inference system (ANFIS) and a multilayer perceptron neural (MLPN) network, with ANFIS outperforming the MLPN [46]. Similarly, reference [47] combined a statistical model with ML to predict GWI in sewer networks in Hoboken, USA. The model, based on logistic regression, utilised ML for calibration, verification, and testing, demonstrating high efficacy with groundwater level (GWL) as the primary influencing factor. However, these studies highlight a key research gap: the need for spatially informed ML-based frameworks to prioritise areas within sewer networks at risk of GWI, a challenge this study addresses through a geospatially integrated case study.

GIS applications in sewer networks have been employed for various evaluations, including sewer condition and failure [48,49,50,51,52], site selection [53,54,55,56,57], sewer flow and flooding [58], and sewer exfiltration and groundwater contamination [59,60,61].

Despite the extensive application of GISs for managing sewer networks, research on GWI remains scarce, as does the use of ML-based approaches in this context. References [62,63] analysed the spatial susceptibility of sewer networks to I&I using GISs, while reference [64] investigated groundwater inundation driven by interactions between sewer infrastructure and climate change. Regarding studies quantifying GWI probability scores in sewer networks, reference [65] developed a GIS-based F-AHP modelling approach incorporating multiple thematic layers from hydrological and hydrogeological perspectives. Their findings highlighted groundwater depth (GWD) and river proximity as key influencing factors, though integrating all layers was recommended. Other studies have used a GIS as a supplementary tool, such as [66,67], who employed a GIS as an auxiliary model alongside MODFLOW to assess sea level rise and climate change impacts on coastal groundwater and sewer networks.

Studies employing both ML and GISs in sewer networks are even more limited. While a few have been conducted, none—to the best of the authors’ knowledge—have specifically focused on GWI. Reference [68] utilised a GIS and a genetic-algorithm-based optimisation technique to prioritise sewer sets for annual renewal in Regina, Canada. Additionally, ML and GISs have been applied in sewer network condition assessments in Ålesund city, Norway [10,51,69]. In these studies, GISs were employed for analysing, storing, and visualising results. A total of ten physical factors (age, diameter, depth, slope, length, pipe type, material, network type, pipe form, and connection type) and ten environmental factors (rainfall, geology, landslide area, building area, population, land cover/land use (LC/LU), groundwater, traffic volume, proximity to roads, and soil type) were considered. The findings identified sewer material and age as the primary influencing factors, while network type was found to be the least significant. Furthermore, sewer renewal planning in Vernon, Canada, was investigated using a dynamic Bayesian network (DBN) and a GIS, considering factors such as deterioration, climate change, and urbanisation. The DBN model captured dependencies between indicators, quantified uncertainty, and updated beliefs as new information became available. A GIS was used to collect and process model input data and to visualise the results of the analysis. Risk scores were assigned to sewer pipes to guide replacement priorities [70]. Finally, GIS-based AHP and random forest (RF) were employed by [71] to determine potential locations for wastewater treatment facilities in Tiruchirappalli, India, considering ground slope, land use, and proximity to water bodies.

As previously highlighted, ML and GISs have rarely been utilised in GWI evaluations within sewer networks. More broadly, research on GWI in sewer networks remains limited, with significant gaps in the literature [7]. To date, no study has appraised the effectiveness of ML and geospatial technology in identifying GWI probability scores in sewer networks. Additionally, even though K-means clustering has demonstrated effective performance in hydrological studies [72], its application to GWI remains unexplored. Although the integration of AHP with clustering techniques has been explored in other domains, its application in the context of sewer exposure to GWI remains limited. This study addresses this gap by integrating AHP-derived weights into the K-means clustering process to enhance the reliability and interpretability of spatial clustering results.

The general objective of this study is to provide a comprehensive framework for assessing GWI in sewer networks by quantifying probability scores and identifying key influencing factors. In this study, we aim to determine GWI probability scores in sewer networks using ML and geospatial technology, with values ranging from 0 to 1. Additionally, we cluster the model domain into high-, intermediate-, and low-GWI-risk zones. The significance of developing such an index for these evaluations is well recognised, as demonstrated by [15] in the context of exfiltration severity. We integrate various hydrological thematic layers—an aspect that remains underexplored despite the numerous factors influencing GWI in sewer networks. Furthermore, we conduct a sensitivity analysis of the probability maps, a technique that, despite its established importance [73], is still underutilised in decision-making models. This analysis identifies the key factors that have the greatest impact on the final GWI probability scores. By systematically quantifying the relative influence of input variables, this analysis strengthens the robustness, transparency, and interpretability of our model. It also constitutes a key methodological contribution, providing novel insights into the primary drivers of sewer network exposure to GWI and enhancing the model’s practical applicability.

Importantly, the proposed framework can be integrated into smart city initiatives, where digitalised infrastructure management relies on high-resolution data and predictive analytics. Embedding GWI assessments into smart city platforms enables proactive sewer network management, reduces operational costs, mitigates environmental risks, and contributes to the overall sustainability and resilience of urban systems.

The following sections begin with an overview of the study area. This is followed by a description of the data, including various thematic layers, and an outline of the methodology for both geospatial and ML approaches. The outputs of the models are then presented. Following this, a Discussion section compares the findings with other studies, addresses the limitations and uncertainties associated with GWI evaluations in sewer networks, and highlights the challenges in the field. Finally, this study concludes with a summary of key findings and recommendations for future research directions.

2. Materials and Methods

2.1. Methodology

The experimental design of this study was structured to systematically quantify GWI probability scores in sewer networks, ensuring reproducibility and robust validation. The methodology, illustrated in Figure 1, proceeds through a series of interrelated stages. It begins with data collection, which provides the foundation for generating various raster layers. These layers are then classified, followed by reclassification and fuzzification to standardise and prepare them for further analysis. Weight determination is carried out using the AHP approach, after which the weighted layers are integrated within the GIS environment. To enhance the robustness of the results, the methodology also incorporates layer integration using K-means clustering. Sensitivity analysis is subsequently applied to identify the most influential factors. The outputs of the different modelling approaches are then compared using the Kappa statistic to evaluate consistency and performance. Finally, the GIS and K-means clustering are jointly applied to categorise the probability scores into high-, intermediate-, and low-risk zones. The following subsections provide a detailed explanation of each stage.

2.1.1. Data and Thematic Layers

Sixteen thematic maps were developed to represent a wide range of geological, geomorphological, hydrological, hydrogeological, climatic, and topographical characteristics (Table 1). These included elevation (2022 LiDAR Digital Terrain Model (DTM) at 1 m resolution) [74], slope, drainage order, topographic wetness index (TWI), river proximity [75,76], flood potential [77], LC/LU [78], soil [79], rock [80], alluvium [80], made ground [80], mass movement [80], fault proximity [80], fault length [80], GWD [81], and precipitation [82].

Elevation data were utilised to derive slope, drainage order, and TWI. The data sets, acquired from multiple sources in diverse formats, were converted as necessary to ensure compatibility. Vector data were processed into raster format through methods such as inverse distance weighting (IDW) and rasterisation. To maintain uniformity with the DTM, all raster layers were standardised to the 1 m × 1 m cell size of the DTM. Seasonal variation was represented by selecting the month with the highest precipitation in each season: March (winter), April (spring), September (summer), and December (autumn).
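The IDW step used to convert point data to rasters can be sketched as follows. This is a minimal NumPy illustration, not the GIS tool used in the study; the function name `idw_interpolate`, the power parameter of 2 and the example well depths are assumptions for illustration. Each query cell receives a weighted mean of the known values, with weights proportional to 1/d^p, and a cell that coincides with a known point simply takes that point's value.

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse distance weighting (hypothetical helper).

    xy_known: (n_known, 2) coordinates of sampled points (e.g. boreholes)
    values:   (n_known,) observed values (e.g. groundwater depth in m)
    xy_query: (n_query, 2) coordinates to estimate (e.g. 1 m cell centres)
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_known)
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for q in range(len(xy_query)):
        exact = d[q] < eps
        if exact.any():
            # Query coincides with a sample: return the observed value
            out[q] = values[exact][0]
        else:
            w = 1.0 / d[q] ** power
            out[q] = np.sum(w * values) / np.sum(w)
    return out

# Two hypothetical wells: 2 m deep at (0, 0) and 6 m deep at (10, 0)
est = idw_interpolate([[0, 0], [10, 0]], [2.0, 6.0], [[5, 0], [0, 0], [2.5, 0]])
```

Here the midpoint gets 4.0, the well location returns 2.0, and (2.5, 0) gets 2.4 because the nearer well carries nine times the weight. For a full 1 m grid the query points would be cell centres built with `np.meshgrid`, and a neighbour search would normally replace the dense distance matrix.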

2.1.2. Classification

Each layer was categorised according to its influence on infiltration in sewer infrastructure, drawing upon a comprehensive review of the scientific literature, including [65]. The classification scheme is summarised in Table 1, with rationale from the literature outlined below.

Elevation and slope are crucial factors influencing groundwater flow dynamics, with lower elevations linked to increased GWI [65,83]. Steeper slopes typically indicate reduced groundwater potential [84], while areas with flat to gentle slopes in lowlands are generally associated with higher groundwater potential [85].

River proximity significantly affects GWL, with a stronger effect in flat landscapes [86]. The river level largely governs the water table, with fluctuations influenced by tidal changes [87]. As river levels rise from their lowest to highest points, the water table can increase by up to 1 m, particularly near the inner boundary, occasionally reaching ground level [87]. Additionally, an increase in drainage density, especially from lower initial values, significantly amplifies flood peaks [88]. Consequently, areas with greater drainage density or order are more susceptible to infiltration and failure risks. Elevated flood potential also contributes to decreasing GWD [89]. TWI represents the likelihood of water accumulation and its downslope movement due to gravity, with higher TWI values indicating increased groundwater presence [90].
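The section does not state its exact TWI formulation. A common definition is TWI = ln(a / tan β), where a is the upslope contributing area per unit contour width and β is the local slope. The sketch below assumes a flow-accumulation grid is already available from a separate flow-routing step (for example D8) and derives slope from the DEM by central differences; the helper names and the small-slope floor `min_tan` are hypothetical.

```python
import numpy as np

def slope_radians(dem, cell_size=1.0):
    """Slope angle from a DEM using central differences (np.gradient)."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.arctan(np.hypot(dz_dx, dz_dy))

def twi(flow_accumulation, slope_rad, cell_size=1.0, min_tan=1e-6):
    """Topographic wetness index ln(a / tan(beta)).

    flow_accumulation: count of upslope cells draining through each cell
    (assumed to come from an external flow-routing step).
    a = (accumulation + 1) * cell_size**2 / cell_size, i.e. contributing
    area divided by the contour width of one cell.
    """
    acc = np.asarray(flow_accumulation, dtype=float)
    a = (acc + 1.0) * cell_size
    # Floor tan(beta) so flat cells do not cause division by zero
    tan_b = np.maximum(np.tan(slope_rad), min_tan)
    return np.log(a / tan_b)

# A plane rising 1 m per 1 m cell eastwards has a 45 degree slope everywhere
dem = np.tile(np.arange(5.0), (4, 1))
s = slope_radians(dem)
t = twi(np.full(dem.shape, 9.0), s)
```

With nine upslope cells and tan β = 1, every cell has TWI = ln(10); flatter or more convergent cells would score higher.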

Geology plays a crucial role in both water infiltration and percolation [91] and determines the ability of geological formations to store groundwater [92]. Highly resistant rock formations restrict infiltration, leading to lower GWLs [85], whereas permeable subsoil materials promote infiltration and recharge, facilitating water accumulation [90]. Different rock types and alluvial deposits exhibit varying permeability and groundwater flow characteristics [93]. Permeability is a key factor influencing GWI, with higher permeability increasing the likelihood of infiltration into sewer networks [94]. Conversely, low-permeability soils limit infiltration [95,96].

LC/LU also plays a significant role in groundwater recharge by affecting evapotranspiration, surface runoff, and infiltration [97]. It is a crucial parameter for identifying flood-prone areas [98]. The type of LC/LU influences infiltration rates, where forests and vegetated areas enhance water absorption [99], while urban environments with impermeable surfaces contribute to increased stormwater runoff [100]. Additionally, reference [65] noted that made ground areas are associated with an increased risk of GWI.

GWL is a key factor influencing the likelihood of slope failure or mass movement [101]. Faults and lineaments are commonly linked to permeable zones with higher groundwater potential [102,103], and joints and fractures further facilitate GWI [104,105].

Subsurface infiltration is largely influenced by GWD [106]. Regions with shallower GWD are more vulnerable to GWI [59]. Infiltration is regarded as indirect when the GWD lies deeper than sewer pipes, with flow driven by local soil saturation. However, when the GWD is shallower than the sewer pipes, the infiltration is classified as GWI [106].

Precipitation events can lead to rainfall-derived inflow and infiltration (RDII) in sewer networks [62]. Infiltration into the sewer system raises the likelihood of combined sewer overflows (CSOs), causing untreated sewage to be discharged even after minimal rainfall or during dry weather conditions (DWCs) [87].

After identifying the influence of each factor on sewer exposure to GWI, the corresponding classes or value ranges were defined accordingly. This classification scheme translates scientific understanding into a standardised and spatially applicable framework, forming the basis for subsequent modelling and multi-criteria analysis.

2.1.3. Reclassification

Since the criterion maps have different scales, they must be standardised to comparable units [105]. After classifying each layer, the values were normalised to a range of 0 to 1 using the fuzzify raster method (linear membership) to enable the integration of all layers.
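A linear membership function of the kind described can be written in a few lines. This sketch assumes membership rises linearly from 0 at a lower bound to 1 at an upper bound and is clipped outside that range; passing the bounds in reverse order inverts the relationship, which suits factors such as GWD where smaller values indicate higher exposure. The helper name and the bounds in the example are hypothetical, not values from Table 1.

```python
import numpy as np

def linear_membership(raster, low, high):
    """Linear fuzzy membership: 0 at `low`, 1 at `high`, clipped to [0, 1].

    If low > high the function decreases, so smaller input values map to
    higher membership (e.g. shallow groundwater = higher GWI exposure).
    """
    x = np.asarray(raster, dtype=float)
    if low == high:
        raise ValueError("low and high must differ")
    return np.clip((x - low) / (high - low), 0.0, 1.0)

# Hypothetical GWD values in metres: 0 m -> 1.0, 10 m or deeper -> 0.0
gwd = np.array([0.0, 2.0, 10.0, 15.0])
m = linear_membership(gwd, low=10.0, high=0.0)
```

For the example, `m` is [1.0, 0.8, 0.0, 0.0]: a 2 m depth sits 80% of the way towards the fully exposed end of the range.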

2.1.4. Weights of Layers

The weight of each factor (Table 1) was determined using the AHP method, in accordance with Saaty’s principles [107,108]. This approach enables pairwise comparisons of the thematic layers [109,110]. For this study, a 16 × 16 matrix was constructed to facilitate these comparisons.

The relative significance of each criterion was assessed against every other criterion, with values assigned according to Saaty’s scale, where higher values indicate greater importance [111]. The diagonal elements of the matrix were set to 1, since each criterion is equally important as itself [112], and each entry below the diagonal is the reciprocal of the corresponding entry above it (a_{ji} = 1/a_{ij}).

The subsequent step involved computing the eigenvalues and eigenvectors of the comparison matrix to derive the relative weight of each criterion [113]. The principal eigenvector was normalised so that the weights of all layers sum to 1 [111]. This approach enhances the precision of the evaluation and helps reduce potential bias in the weight assignment process [114].
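The eigenvector weighting step can be illustrated with NumPy. This is a sketch assuming a reciprocal pairwise matrix as input; `ahp_weights` is a hypothetical helper name, and the 3 × 3 example is a perfectly consistent toy matrix rather than the study's 16 × 16 matrix.

```python
import numpy as np

def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise comparison matrix.

    Uses the principal (largest real) eigenvalue and its eigenvector,
    normalised to sum to 1. Also returns that eigenvalue (L_max),
    which is needed for the consistency check.
    """
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n):
        raise ValueError("matrix must be square")
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)       # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)    # Perron vector has a single sign
    return w / w.sum(), float(eigvals[k].real)

# Criterion 1 is twice as important as 2 and four times as important as 3
A = [[1, 2, 4],
     [1 / 2, 1, 2],
     [1 / 4, 1 / 2, 1]]
w, l_max = ahp_weights(A)
```

For a perfectly consistent matrix such as this one, L_max equals n (here 3) and the weights reproduce the 4:2:1 ratios encoded in the judgments, i.e. [4/7, 2/7, 1/7].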

The final step consisted of assessing the hierarchical ranking and conducting CR (consistency ratio) checks through the evaluation matrix [115]. AHP provides a mathematical framework for evaluating the consistency of judgments, as outlined below:

CI = \frac{L_{max} - n}{n - 1} \tag{1}

CR = \frac{CI}{RI} \tag{2}

where CI denotes the consistency index proposed by [116], which measures the degree of inconsistency in the pairwise comparisons; L_{max} is the largest eigenvalue of the pairwise comparison matrix; n is the total number of factors assessed; RI is the random index, i.e. the average CI of randomly generated reciprocal matrices of the same size; and CR is the consistency ratio, which is used to appraise the consistency of the pairwise comparison matrix.

A CR below 0.1 is required; a value above this limit indicates inconsistent judgments [115], and the pairwise comparison matrix should then be revised to improve consistency.
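Equations (1) and (2) translate directly into code. In this sketch the random index is passed in rather than hard-coded, because published RI tables differ slightly for larger n and the section does not state which value was used for n = 16. The example uses n = 3 with Saaty's RI of 0.58 and an illustrative L_max of 3.1; `consistency_ratio` is a hypothetical helper name.

```python
def consistency_ratio(l_max, n, ri):
    """Consistency index and ratio from Eqs. (1)-(2).

    l_max: principal eigenvalue of the pairwise comparison matrix
    n:     number of criteria
    ri:    random index for an n x n matrix (from the chosen RI table)
    """
    if n < 3:
        # RI is 0 for n <= 2, so CR is undefined there
        raise ValueError("CR is only meaningful for n >= 3")
    ci = (l_max - n) / (n - 1)
    return ci, ci / ri

ci, cr = consistency_ratio(3.1, 3, 0.58)
acceptable = cr < 0.1
```

Here CI = 0.05 and CR ≈ 0.086, which falls below the 0.1 threshold, so the judgments would be accepted.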

2.1.5. Combination of Layers

Fuzzified layers were integrated according to the weights obtained from the AHP (Table 1) using the following formula [117]:

\text{GWI probability score} = \sum_{i=1}^{n} w_{i} \cdot x_{i} \tag{3}

where w_{i} is the weight assigned to factor i, x_{i} is the normalised value of that factor at a given location, and n is the total number of factors (16 in this study). All factors and their corresponding weights are presented in Table 1.
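Equation (3) amounts to a per-cell weighted sum of the fuzzified rasters. The sketch below assumes the layers are stacked into a NumPy array of shape (layers, rows, cols); `gwi_probability` and the two-layer example are hypothetical. Because the AHP weights sum to 1 and each fuzzified value lies in [0, 1], the result also lies in [0, 1].

```python
import numpy as np

def gwi_probability(fuzzified_layers, weights):
    """Weighted linear combination of fuzzified layers (Eq. 3).

    fuzzified_layers: array (n_layers, rows, cols) with values in [0, 1]
    weights: n_layers AHP weights summing to 1
    """
    layers = np.asarray(fuzzified_layers, dtype=float)
    w = np.asarray(weights, dtype=float)
    if layers.shape[0] != w.size:
        raise ValueError("one weight per layer is required")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights should sum to 1")
    # Contract over the layer axis: sum_i w_i * x_i at every cell
    return np.tensordot(w, layers, axes=1)

# Two hypothetical 1 x 2 layers with weights 0.7 and 0.3
L = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
p = gwi_probability(L, [0.7, 0.3])
```

The output keeps the raster shape, here [[0.7, 0.3]], so it can be written straight back to a 1 m grid.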

2.1.6. K-Means Clustering

A digital twin is a dynamic virtual representation of a physical system that integrates real-time data, computational models, and analytical tools to simulate, monitor, and optimise system performance. In the water sector, digital twins can combine spatial data, sensor measurements, and predictive modelling to support early-warning systems and adaptive management of GWI. Our methodology represents a step towards this integration by combining GIS-based analyses, AHP-derived weights, and clustering outputs to identify distinct GWI risk levels.

In this study, we used K-means cluster analysis to represent the distinct risk levels of GWI in sewer networks, with 16 thematic layers serving as features (inputs) for the clustering analysis.

The K-means algorithm is a widely used clustering technique that partitions data into a predefined number of disjoint clusters (K) based on their randomly initialised centres. It minimises the squared error function between each cluster’s centre and its associated data points [23,118,119]. K-means is favoured in scientific research for its simplicity, ease of implementation, and computational efficiency [120].

K-means cluster analysis is a statistical technique designed to group objects in a multivariate data set based on their inherent similarities, which are measured using a distance metric [21,121,122,123]. Euclidean distance is commonly employed to group data points based on their proximity in Euclidean space [22]. The process is iterative, beginning with the random assignment of clusters to data points, followed by a reallocation of data to the nearest cluster centre [124]. The K-means algorithm aims to minimise the within-cluster sum of squares by ensuring that data points within the same cluster are more similar to each other than to those in different clusters [125]. It assigns inputs to clusters in a way that minimises the distance between each data point and its respective cluster centroid [126]. The algorithm seeks to optimise an objective function, defined as follows [72]:

J = \sum_{i=1}^{N} \sum_{j=1}^{K} r_{ij} \, \lVert x_{i} - v_{j} \rVert^{2} \tag{4}

Here, N represents the total number of data points, K denotes the number of clusters, x represents the set of all data points, and v denotes the set of cluster centres. The term ‖x_{i} - v_{j}‖ refers to the Euclidean distance between a data point x_{i} and a cluster centre v_{j}. The indicator variable r_{ij} ∈ {0, 1} specifies which of the K clusters a given data point x_{i} is assigned to: r_{ij} = 1 if x_{i} belongs to cluster j, and 0 otherwise.
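A compact version of Lloyd's algorithm, which alternates assignment and update steps until the centres stop moving, shows how Eq. (4) is minimised. This is an illustrative NumPy sketch, not the implementation used in the study. The optional `init` argument and the `risk_zones` helper are assumptions: `risk_zones` seeds the three centres at score quantiles so that the high, intermediate and low labels are reproducible, whereas the text describes random initialisation.

```python
import numpy as np

def kmeans(X, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm minimising J = sum_i sum_j r_ij ||x_i - v_j||^2.

    X: (N, d) standardised features, one row per cell.
    init: optional (k, d) starting centres; otherwise k random data points.
    Returns (labels, centres, J).
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centres = np.asarray(init, dtype=float).reshape(k, X.shape[1]).copy()
    for _ in range(n_iter):
        # Assignment step: r_ij = 1 for the nearest centre, 0 otherwise
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centre moves to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        converged = np.allclose(new, centres)
        centres = new
        if converged:
            break
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    J = float(d2[np.arange(len(X)), labels].sum())
    return labels, centres, J

def risk_zones(scores):
    """Hypothetical helper: label 1-D GWI scores low/intermediate/high (K = 3)."""
    s = np.asarray(scores, dtype=float).reshape(-1, 1)
    init = np.quantile(s[:, 0], [1 / 6, 1 / 2, 5 / 6]).reshape(3, 1)
    labels, centres, _ = kmeans(s, 3, init=init)
    names = np.empty(3, dtype=object)
    names[np.argsort(centres[:, 0])] = ["low", "intermediate", "high"]
    return names[labels]

labels, centres, J = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2,
                            init=[[0, 0], [10, 10]])
zones = risk_zones([0.05, 0.1, 0.08, 0.5, 0.52, 0.48, 0.9, 0.95, 0.92])
```

Because K-means only reaches a local minimum of J, random initialisation is usually repeated several times and the run with the lowest J is kept.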

Before K-means clustering is applied, the data must be standardised, because the input variables are multivariate and have different units and scales. Standardisation ensures that all features are on the same scale, preventing any single attribute with a larger magnitude from disproportionately influencing the clustering process [22].
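Column-wise z-score standardisation, one common choice, followed by an optional weighting step might look like this. The weighting shown, multiplying each standardised feature by its AHP weight, is an assumption about how AHP-weighted layers enter the clustering. Note that it scales each feature's contribution to the squared Euclidean distance by w_i²; multiplying by √w_i instead would scale it by w_i. Both helper names are hypothetical.

```python
import numpy as np

def standardise(X):
    """Column-wise z-scores: (x - mean) / std; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant layers
    return (X - mu) / sd

def apply_weights(X, weights):
    """Scale each feature column by its (hypothetically AHP-derived) weight."""
    return np.asarray(X, dtype=float) * np.asarray(weights, dtype=float)

# Rows are cells, columns are layers; the second layer is constant here
Z = standardise([[1.0, 10.0], [3.0, 10.0]])
Zw = apply_weights(Z, [0.5, 0.25])
```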

2.1.7. Verification and Comparison of Models

The proposed methodology was validated using multiple approaches to ensure robustness and reliability. First, the CR value was employed to validate the AHP method, serving as a measure of consistency in pairwise comparisons. A CR below 0.10 signifies a high level of consistency, whereas values exceeding this threshold indicate potential inconsistencies, necessitating a reassessment of the judgments [127].

Second, the environmental relevance of storm overflows was considered. Storm overflows represent critical points where sewer networks exceed their capacity, presenting considerable risks to infrastructure [128,129]. These hazards compromise the efficiency of sewer systems, leading to a deterioration in their overall performance [130]. Because storm overflow discharges are symptomatic of networks exceeding capacity, their locations, identified from Department for Environment, Food and Rural Affairs (DEFRA) data, were used as an indicator of potential GWI when evaluating the final prioritisation maps.
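The storm overflow check can be expressed as counting how many SO locations fall in each risk zone. This sketch assumes the SO points have already been converted to row and column indices of the 1 m zone raster and that zones are coded 0, 1 and 2 for low, intermediate and high; `so_zone_counts` is a hypothetical helper.

```python
import numpy as np

def so_zone_counts(zone_raster, so_rows, so_cols,
                   zone_names=("low", "intermediate", "high")):
    """Count SO locations per risk zone and their share of all SOs.

    zone_raster: 2-D integer array of zone codes 0..len(zone_names)-1
    so_rows, so_cols: raster indices of the SO points
    Returns {zone_name: (count, share)}.
    """
    zones = np.asarray(zone_raster)[np.asarray(so_rows), np.asarray(so_cols)]
    counts = np.bincount(zones, minlength=len(zone_names))
    total = counts.sum()
    return {name: (int(c), float(c / total) if total else 0.0)
            for name, c in zip(zone_names, counts)}

# 2 x 2 toy raster and three hypothetical SO points
result = so_zone_counts(np.array([[0, 2], [2, 1]]), [0, 1, 1], [1, 0, 1])
```

In the toy case two of the three SOs fall in the high zone, so the high-zone share is 2/3.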

Finally, the agreement between outputs from the employed approaches was quantified using the Kappa statistic [131]. It has been utilised in distributed environmental modelling for map comparison [132], including applications in spatial model validation [133,134,135] and the analysis of LC/LU change dynamics [136,137,138].

Cohen’s Kappa quantifies the level of agreement between two raster data sets by comparing their assignment of items to distinct, non-overlapping categories [139]. To enable meaningful comparison, numerical maps must first be classified into discrete categories [140]. Building on Cohen’s [131] original concept for assessing nominal scales, the Kappa statistic provides a comprehensive measure of similarity between two categorised maps. Its calculation involves constructing a confusion matrix that captures the class frequencies of both maps, as outlined by [131]:

\kappa = \frac{P_{o} - P_{e}}{1 - P_{e}} \tag{5}

The observed proportion of agreement, P_{o}, is the fraction of cells that the two rasters assign to the same category, i.e., the number of matching cells normalised by the total sample size [141]. However, reference [131] argues that this measure alone is insufficient, as it does not account for agreement occurring by chance, denoted as P_{e}. The key advantage of the Kappa statistic is that it provides a more reliable assessment than simple percentage agreement by factoring in chance agreement [139]. In essence, it quantifies the agreement between rasters beyond what would be expected by chance [141].
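For reference, the standard expressions for the two terms of Equation (5) are given below. They are written for an r × r confusion matrix with cell counts n_{ij}, where row i is the class in the first map, column j is the class in the second map, and N is the total number of classified cells. This notation is introduced here for illustration and is not taken from the original text.

```latex
P_{o} = \frac{1}{N}\sum_{i=1}^{r} n_{ii}, \qquad
P_{e} = \frac{1}{N^{2}}\sum_{i=1}^{r} n_{i+}\, n_{+i}, \qquad
n_{i+} = \sum_{j=1}^{r} n_{ij}, \quad n_{+i} = \sum_{j=1}^{r} n_{ji}
```

P_{e} therefore depends only on the marginal class totals of the two maps, while P_{o} depends only on the diagonal of the confusion matrix.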

The computed Kappa value is interpreted using the agreement levels listed in Table 2. Values range from 1, indicating perfect agreement, to below 0, where the observed agreement is poorer than would be expected by chance [140].
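Equation (5) can be computed directly from two classified rasters. The minimal sketch below assumes that both maps share the same grid and integer class codes and that no-data cells have already been removed. `kappa_and_match` is a hypothetical helper name, and the example maps are invented for illustration.

```python
import numpy as np


def kappa_and_match(map_a, map_b):
    """Cohen's Kappa (Equation 5) and percentage match for two classified maps.

    Both inputs are integer class rasters on the same grid; no-data cells are
    assumed to have been removed (or masked) beforehand.
    """
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must contain the same number of cells")
    classes = np.union1d(a, b)  # sorted union of class labels
    ia = np.searchsorted(classes, a)
    ib = np.searchsorted(classes, b)
    r = classes.size
    # Confusion matrix: rows = classes in map A, columns = classes in map B.
    cm = np.zeros((r, r), dtype=np.int64)
    np.add.at(cm, (ia, ib), 1)
    n = float(cm.sum())
    p_o = np.trace(cm) / n                                 # observed agreement
    p_e = float(cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else float("nan")
    return kappa, 100.0 * p_o, cm


# Usage with a small hypothetical pair of three-class maps
# (0 = low, 1 = intermediate, 2 = high risk).
ml_map = np.array([[0, 0, 1], [1, 2, 2]])
gis_map = np.array([[0, 1, 1], [1, 2, 2]])
kappa, match, cm = kappa_and_match(ml_map, gis_map)
print(f"Kappa = {kappa:.2f}, match = {match:.2f}%")
```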

2.1.8. Sensitivity Analysis and Key Influencing Factors

Sensitivity analysis is primarily conducted to evaluate how input factors influence the model’s output [142] and to identify the most impactful layers [143]. In this study, sensitivity analysis was carried out by formulating alternative scenarios for estimating GWI prioritisation scores in sewer networks, including the following:

- A combination of all layers assigned equal weights.
- A combination of only high-weighted layers.

The outcomes were then compared to the final layer combination obtained through the AHP analysis, facilitating the identification of key contributing factors.
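The scenario comparison can be sketched as a weighted linear combination of fuzzified layers. This is a minimal illustration under stated assumptions, not the study's GIS workflow. The function names are hypothetical. Each layer is assumed to be a NumPy array already fuzzified to [0, 1] on a common grid, and weights are renormalised to sum to 1. The top-k scenarios assign equal weights to the k layers with the largest AHP weights, and differences are taken as the AHP-weighted score minus the scenario score.

```python
import numpy as np


def weighted_overlay(layers, weights):
    """Weighted linear combination of fuzzified layers (values in [0, 1]).

    layers: array of shape (n_layers, rows, cols); weights: length n_layers.
    Weights are renormalised to sum to 1 so the combined score stays in [0, 1].
    """
    stack = np.asarray(layers, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, stack, axes=1)  # sum_i w_i * layer_i


def sensitivity_scenarios(layers, ahp_weights, top_k=(5, 2)):
    """Compare the AHP-weighted score with equal-weight and top-k scenarios.

    Returns {scenario: (score_map, difference_map)}, where the difference is
    the AHP-weighted score minus the scenario score.
    """
    ahp = np.asarray(ahp_weights, dtype=float)
    n = ahp.size
    base = weighted_overlay(layers, ahp)
    scenarios = {"equal_weights": weighted_overlay(layers, np.ones(n))}
    order = np.argsort(ahp)[::-1]  # layer indices, highest AHP weight first
    for k in top_k:
        w = np.zeros(n)
        w[order[:k]] = 1.0  # equal weights on the k top-weighted layers only
        scenarios[f"top_{k}"] = weighted_overlay(layers, w)
    return {name: (score, base - score) for name, score in scenarios.items()}


# Usage with hypothetical data: 16 random "fuzzified" layers and random weights.
rng = np.random.default_rng(42)
layers = rng.random((16, 50, 50))
ahp_weights = rng.random(16)
for name, (score, diff) in sensitivity_scenarios(layers, ahp_weights).items():
    print(f"{name}: score {score.min():.2f}-{score.max():.2f}, "
          f"difference {diff.min():.2f} to {diff.max():.2f}")
```

Using fewer layers with equal weights lets individual layers dominate the combined score, so the score range and the difference maps tend to widen as k decreases.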

2.2. Location Description

The probability scores of GWI in sewer networks are assessed using a case study in the Dawlish region, located in the southwest of the United Kingdom (UK) and covering approximately 138 km². This area lies in the southern section of the groundwater operational catchment of the Permian Aquifers in Central Devon (Figure 2). The catchment was selected because it is representative of typical combined sewer infrastructure in a mixed rural and urban setting. Sewer networks and sewersheds are concentrated mainly in the southeastern, southern, and northern parts of the study area (Figure 2).

Geological data from the BGS indicate that Quaternary deposits, consisting mainly of clay, silt, sand, and gravel, occur predominantly but not exclusively along riverbanks (Figure 2). The bedrock formations include gravel, sandstone, breccia, limestone, chert, slate, mudstone, basalt, and tuff, with breccia and sandstone being the most extensively outcropping rocks (Figure 2).

The dominant soil type across the study area ranges from loam to sandy loam, with various combinations of sand, loam, silt, clay, and peat present (Figure 3). LC/LU is predominantly composed of grassland, woodland, arable and horticultural land, suburban and urban areas, and heather. Additionally, a small portion of the area is characterised by inland rock, littoral and supralittoral rock and sediment, saltmarsh, and freshwater habitats (Figure 3).

Faults in the study area are predominantly concentrated in the northern and eastern parts of the model domain (Figure 2). Made ground is primarily located in the northern section (Figure 3), and a few instances of mass movement have been recorded across the area (Figure 3).

The region is intersected by rivers with a total length of 136.9 km (Figure 2). Additionally, flood potential, classified into high and low categories, is mainly concentrated in the northeastern part of the Dawlish region (Figure 3).

Climate data from the Met Office climate data portal, based on the HadUK-Grid gridded data set at a 1 km × 1 km resolution for 2023—the most recent year available—indicate that March (winter) recorded the highest precipitation, ranging from 118 mm to 238 mm, while April (spring) had the lowest, ranging from 65.5 mm to 113 mm [82]. Overall, precipitation tends to decrease from the southern and southwestern parts of the study area towards the northern and northeastern regions (Figure 4).

A total of seven observation wells (Figure 2) are present in the region, providing hydraulic head time series sourced from DEFRA (https://environment.data.gov.uk/hydrology/explore, accessed on 20 July 2025). In 2023, GWD recorded in these wells ranged from 0.4 to 35.2 m, with the deepest water table occurring in summer and the shallowest in autumn and winter (Figure 4). Across all seasons, the water table deepened from the northeastern part of the study area towards the southwestern portion (Figure 4), following the pattern of topographic elevation (Figure 3).

3. Results

3.1. Thematic Layers

The fuzzified maps of the thematic layers were used to calculate spatial GWI probability scores, with GWD and precipitation accounting for spatio-temporal effects (Figure 5 and Figure 6). In general, fuzzified values are higher in regions with greater permeability, closer proximity to faults and rivers, lower elevations, and gentler slopes, as well as in areas characterised by made ground and mass movements. Flood potential is more pronounced in the northeastern part of the region. Elevation decreases towards the east, suggesting a higher likelihood of GWI affecting infrastructure in that direction. Slope, drainage order, and TWI show greater spatial variability and similar patterns, because they capture localised differences in topography and gradient and reflect the fill-and-sink processing applied during their derivation (Figure 5 and Figure 6).
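For reference, the topographic wetness index is commonly defined as shown below, where a is the upslope contributing area per unit contour length and β is the local slope angle. This standard form is given as an assumption, and the exact variant and treatment of flat cells used in this study may differ.

```latex
\mathrm{TWI} = \ln\left(\frac{a}{\tan\beta}\right)
```

Because tan β appears in the denominator, and both a and β are derived from the sink-filled elevation model, TWI tends to mirror slope patterns. This is consistent with the similarity between these layers noted above.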

For spatio-temporal assessments, seasonal fuzzified layers were generated for precipitation and GWD. Precipitation-related vulnerability is highest in winter (fuzzified values from 0.5 to 1) and lowest in spring (from 0.3 to 0.5). The northeastern region, which generally receives the least precipitation, therefore has the lowest precipitation-driven contribution to GWI probability. The southwestern area, which receives the most precipitation, has the highest (Figure 6).

The fuzzified GWD layer ranges from 0 to 1. Groundwater tends to be deeper in the southwestern part of the area, where GWI probability is lower, and shallower towards the northeast, where GWI probability is higher. GWD shows minimal seasonal variation across the region, resulting in negligible changes in the GWI probability maps. However, in summer the water table is slightly deeper, corresponding to lower GWI probability scores (Figure 6).
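A decreasing linear membership function is one simple way to fuzzify GWD so that shallow groundwater maps to values near 1 and deep groundwater to values near 0. The sketch below is an assumption for illustration only. The membership functions actually used in this study are defined in its methods and may be nonlinear. `linear_membership` is a hypothetical helper. The bounds in the usage example, 0.4 m and 35.2 m, are simply the range of GWD recorded in the seven observation wells in 2023.

```python
import numpy as np


def linear_membership(values, low, high, increasing=True):
    """Linear fuzzy membership, clipped to [0, 1].

    increasing=True : 0 at `low`, rising linearly to 1 at `high`.
    increasing=False: 1 at `low`, falling linearly to 0 at `high`
                      (e.g. deeper groundwater -> lower GWI membership).
    NaN (no-data) inputs stay NaN.
    """
    if high <= low:
        raise ValueError("`high` must be greater than `low`")
    x = np.asarray(values, dtype=float)
    m = np.clip((x - low) / (high - low), 0.0, 1.0)
    return m if increasing else 1.0 - m


# Usage: hypothetical GWD values (m below ground), bounded by the 2023 well range.
gwd_m = np.array([0.4, 5.0, 17.8, 35.2, 40.0])
print(linear_membership(gwd_m, low=0.4, high=35.2, increasing=False))
```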

3.2. Geospatial Technology

With a CR value of 0.02, the pairwise comparisons used to derive the weights of the thematic layers were considered consistent (Table 1). Integrating all layers with the F-AHP GIS approach yielded a GWI probability score, ranging from 0 to 1, for each 1 m × 1 m cell in each season. Spatially, GWI probability scores generally increase from the southwest to the northeast of the study area, whereas temporally they remain nearly constant across seasons (Figure 7). Depending on the season, the minimum GWI score is 0.07 or 0.08 and the mean is 0.28 or 0.29. The maximum (0.78) and standard deviation (0.09) are the same in all seasons (Table 3).

Locations of storm overflow discharges closely align with areas of higher GWI probability in all seasons. They occur predominantly in low-elevation regions with gentle slopes and near rivers with high flood potential (Figure 7). In winter, a season with higher precipitation, GWI scores at storm overflow discharge locations range from 0.17 to 0.64, with a mean of 0.40 (Table 4). This is well above the domain-wide seasonal mean of 0.28 or 0.29 (Table 3).
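Extracting GWI scores at storm overflow discharge locations and comparing them with the domain-wide mean can be sketched as follows. This minimal example assumes a north-up raster with a known upper-left origin and square cells, point coordinates in the same projected coordinate system, and NaN for no-data cells. `sample_scores_at_points` is a hypothetical helper, not part of any GIS package, and the grid and points in the usage example are invented.

```python
import numpy as np


def sample_scores_at_points(score_raster, xs, ys, x_origin, y_origin, cell_size):
    """Summarise raster scores in the cells containing each point.

    Assumes a north-up grid whose upper-left corner is (x_origin, y_origin) and
    square cells of `cell_size` map units (e.g. 1 m). Points falling outside the
    grid or on no-data (NaN) cells are ignored.
    """
    grid = np.asarray(score_raster, dtype=float)
    cols = np.floor((np.asarray(xs, dtype=float) - x_origin) / cell_size).astype(int)
    rows = np.floor((y_origin - np.asarray(ys, dtype=float)) / cell_size).astype(int)
    inside = (rows >= 0) & (rows < grid.shape[0]) & (cols >= 0) & (cols < grid.shape[1])
    vals = grid[rows[inside], cols[inside]]
    vals = vals[~np.isnan(vals)]
    summary = {"n": int(vals.size), "domain_mean": float(np.nanmean(grid))}
    if vals.size:
        summary.update(min=float(vals.min()), max=float(vals.max()),
                       mean=float(vals.mean()))
    return summary


# Usage with a hypothetical 3 x 4 score grid (1 m cells, upper-left at (0, 3));
# the third point lies outside the grid and is ignored.
grid = np.arange(12, dtype=float).reshape(3, 4) / 10
print(sample_scores_at_points(grid, [0.5, 3.5, 10.0], [2.5, 0.5, 1.0], 0.0, 3.0, 1.0))
```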

To perform a sensitivity analysis, in addition to generating an AHP-weighted GIS output, GWI probability scores were calculated using three alternative approaches: assigning equal weights to all layers, considering only the top-five highest-weighted layers, and using only the top-two highest-weighted layers. GWI probability scores range from 0.1 to 0.72 when all layers are equally weighted, from 0 to 0.93 when using the five highest-weighted layers (GWD, river proximity, flood potential, rock, and alluvium), and from 0 to 1 when considering only the top-two highest-weighted layers (GWD and river proximity). The difference between the AHP-weighted GIS output using all layers and the output with equal-weighted layers ranges from −0.11 to 0.11. When comparing the AHP-weighted GIS output with the five highest-weighted layers, the difference varies from −0.34 to 0.30. The widest variation is observed when using only the top-two highest-weighted layers, with differences ranging from −0.64 to 0.47 (Figure 8; Table 3).

GWI probability scores derived from the AHP-weighted GIS approach are higher than those from the equal-weight method in the northeastern parts of the study area and lower in the southwestern parts. When using the top-five or top-two highest-weighted layers, GWI probability scores tend to be higher in the northeastern region and lower in the southwestern region compared to the AHP-weighted GIS approach. This is because the calculations are based solely on a limited number of layers, leading to variations driven by the spatial distribution of those specific factors (Figure 8). Among the highest-weighted layers, GWD and flood potential increase from southwest to northeast, contributing to these observed patterns (Figure 5 and Figure 6).

The sensitivity analysis shows that as fewer thematic layers are considered, GWI probability scores display increased variation, resulting in a broader range. The difference between the AHP-weighted GIS output using all layers and the outputs with fewer layers—when assigned equal (non-AHP) weights—increases as the number of included layers decreases (Figure 8). This highlights the importance of incorporating all layers, even though certain factors, such as proximity to rivers and GWD, may be more influential. These findings align with the conclusions of [65].

3.3. Machine Learning to Classify Risk Regions

The fuzzified, AHP-weighted layers were then used as inputs to the K-means clustering approach. Three clusters were specified, corresponding to high-, intermediate-, and low-GWI-risk levels. Overall, when the AHP weights are applied to all layers, K-means clustering reveals a high-risk zone primarily in the northeastern section of the area, a low-risk zone in the southwestern part, and an intermediate-risk zone between them. However, cells assigned to the high-risk cluster also occur within the predominantly low- and intermediate-risk zones, particularly near rivers (Figure 9a). Additionally, locations of storm overflow discharges generally coincide with the high-risk GWI cluster (Figure 9a).
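The multivariate clustering step can be sketched as follows. Each cell's vector of AHP-weighted fuzzified values is clustered with Lloyd's K-means, and the clusters are then ordered into low, intermediate, and high risk. Several choices here are assumptions for illustration, not the authors' implementation: the ordering rule (mean weighted-sum score of each cluster), the random-restart initialisation, and all function names. The pairwise distance array has cells × k × layers entries, so for a 1 m grid of this size one would subsample cells or use a library implementation such as scikit-learn's `MiniBatchKMeans`.

```python
import numpy as np


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-means with random restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Random initial centroids drawn from the data; restarts reduce the
        # risk of ending in a poor local minimum.
        c = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean
            labels = d.argmin(axis=1)
            new_c = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                              for j in range(k)])
            converged = np.allclose(new_c, c)
            c = new_c
            if converged:
                break
        # Final assignment to the final centroids, then within-cluster inertia.
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        inertia = float(((X - c[labels]) ** 2).sum())
        if best is None or inertia < best[0]:
            best = (inertia, labels, c)
    return best[1], best[2]


def gwi_risk_clusters(layers, weights, k=3, **kmeans_kwargs):
    """Cluster cells on AHP-weighted fuzzified layers; rank clusters 0 (low) .. k-1 (high).

    layers: (n_layers, rows, cols) fuzzified values; weights: AHP weights.
    Clusters are ranked by the mean weighted-sum score of their member cells.
    """
    L = np.asarray(layers, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, rows, cols = L.shape
    X = (L * w[:, None, None]).reshape(n, -1).T  # one feature vector per cell
    labels, _ = kmeans(X, k, **kmeans_kwargs)
    score = X.sum(axis=1)  # weighted linear combination per cell
    means = np.array([score[labels == j].mean() if np.any(labels == j) else -np.inf
                      for j in range(k)])
    rank = np.empty(k, dtype=int)
    rank[np.argsort(means)] = np.arange(k)  # cluster id -> risk rank
    return rank[labels].reshape(rows, cols)


# Usage with hypothetical data: two layers, nine cells in three obvious groups.
layers = np.array([[[0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]],
                   [[0.0, 0.01, 0.0, 0.5, 0.51, 0.5, 1.0, 0.99, 1.0]]])
print(gwi_risk_clusters(layers, [0.6, 0.4], n_init=50))
```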

For the sensitivity analysis, K-means clustering was performed using both all layers with equal weights and the high-weighted layers (i.e., GWD, river proximity, flood potential, rock, and alluvium) as inputs, with the model domain classified into three risk levels (Figure 9b,c). The K-means clustering approach, when incorporating AHP weights for all layers (Figure 9a), results in a smoother and more spatially consistent classification of risk levels across the model domain, exhibiting greater consistency with the overall trend of GWI probability score variations derived from the GIS-based approach (Figure 7a). This is in contrast to the clustering results using equal weights for all layers (Figure 9b) or considering only key factors (Figure 9c) influencing GWI in sewer networks.

3.4. Evaluating Agreement Between Machine Learning and Geospatial Approaches Using Cohen’s Kappa

The Kappa statistic was used to compare the GIS and ML approaches. For this comparison, the final F-AHP-weighted outputs incorporating all layers were considered for both models, i.e., Figure 9a for K-means clustering and Figure 7a for the GIS. The F-AHP-weighted GIS outputs were classified into three risk levels using the K-means algorithm (Figure 10a). Additionally, the probability scores were divided using equal-interval classification (Figure 10b) and quantile classification (Figure 10c).
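The three ways of classifying the single GIS output layer into three risk levels (Figure 10) can be sketched as below. These are minimal illustrations that assume a NumPy array of scores without no-data cells. The helper names are hypothetical. The 1-D K-means uses a deterministic quantile-based start, which may differ from the initialisation used in the study.

```python
import numpy as np


def equal_interval(scores, k=3):
    """Split the score range into k equal-width classes (0 = lowest)."""
    s = np.asarray(scores, dtype=float)
    edges = np.linspace(s.min(), s.max(), k + 1)[1:-1]  # interior class breaks
    return np.digitize(s, edges)


def quantile_classes(scores, k=3):
    """Split scores into k classes holding roughly equal numbers of cells."""
    s = np.asarray(scores, dtype=float)
    edges = np.quantile(s, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(s, edges)


def kmeans_1d(scores, k=3, max_iter=100):
    """1-D K-means on the single GIS output layer; classes ordered by centroid."""
    s = np.asarray(scores, dtype=float).ravel()
    c = np.quantile(s, (np.arange(k) + 0.5) / k)  # deterministic quantile start
    for _ in range(max_iter):
        labels = np.abs(s[:, None] - c[None, :]).argmin(axis=1)
        new_c = np.array([s[labels == j].mean() if np.any(labels == j) else c[j]
                          for j in range(k)])
        converged = np.allclose(new_c, c)
        c = new_c
        if converged:
            break
    labels = np.abs(s[:, None] - c[None, :]).argmin(axis=1)
    rank = np.empty(k, dtype=int)
    rank[np.argsort(c)] = np.arange(k)  # lowest centroid -> class 0
    return rank[labels].reshape(np.shape(scores))


# Usage with a hypothetical, right-skewed set of GWI scores.
scores = np.array([0.1, 0.12, 0.15, 0.2, 0.25, 0.3, 0.9])
print(equal_interval(scores), quantile_classes(scores), kmeans_1d(scores))
```

On skewed score distributions, the three schemes can assign the same cells to different classes. This is why the choice of scheme affects the agreement measured against the multivariate clustering.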

Kappa values of 0.35, 0.50, and 0.70 were obtained, using the AHP weights for all layers, when comparing the ML approach with the GIS approach classified by (1) equal-interval classification, (2) quantile classification, and (3) K-means clustering, respectively. The corresponding percentages of matching classifications were 63.82%, 66.78%, and 81.44%. According to Table 2, the Kappa value of 0.70 indicates substantial agreement between the ML and geospatial approaches when the GIS output is classified using K-means clustering (Table 5).

In the K-means clustering approach, the algorithm operates on all AHP-weighted layers jointly. Each cell is represented by its vector of weighted layer values, and cells with similar combinations of values are grouped into risk levels. In contrast, when K-means clustering is applied to the final GIS output, it clusters the data solely on the basis of that single-layer output. This distinction is significant. Clustering the individual weighted layers may allow a more nuanced classification that captures the complexity and variability of the multiple factors influencing GWI risk, whereas clustering the single-layer output may overlook these underlying interactions.

4. Discussion

4.1. Integrating F-AHP-Based Geospatial Approach with ML to Efficiently Identify High-Risk Areas

Given the constraints of budget, time, and available assessment tools, conducting comprehensive monitoring and inspections across all sewer pipelines is impractical [144]. Moreover, applying traditional methodologies across an entire sewer network demands significant financial resources and time [49]. Modelling, forecasting, and optimising intricate and nonlinear processes in waste management remain challenging when relying on conventional techniques [145]. The underground nature of sewer infrastructure, coupled with the high costs of closed-circuit television (CCTV) inspections and the severe consequences of pipeline failures, underscores the need for advanced techniques to evaluate failure risks and optimise inspection priorities [49]. Consequently, greater focus should be placed on developing innovative models capable of spatio-temporal prioritisation of infrastructure, particularly in response to stressors such as GWI.

The current research introduces a robust framework for performing a reliable spatio-temporal analysis of GWI in sewer networks by integrating F-AHP-based geospatial analysis with ML techniques, effectively tackling the challenges associated with data limitations. The AHP framework was employed to assign weights to different criteria, followed by the classification and reclassification of thematic layers, which were then fuzzified to generate seasonal GWI probability scores. The GIS was utilised to analyse and integrate the contributing factors, while K-means clustering was applied to categorise the final scores into distinct risk levels. Leveraging the outputs, inspection schedules can be developed to prioritise the model domain according to the likelihood of GWI risk, ensuring that the most critical sections are inspected promptly and more frequently.

AHP is particularly useful for assigning appropriate weights to various factors that influence site suitability for urban development [146]. Additionally, it allows for the structured integration of field observations and prior knowledge, ensuring that the relative importance of factors is systematically considered within the model [146].

According to [32], integrating F-AHP with the GIS is a powerful and efficient approach for conducting environmental multi-criteria decision analysis (MCDA). This method has demonstrated reliability and cost-effectiveness, making it suitable for various applications. It is particularly suited to ungauged regions, because it does not require historical hydrological data. Additionally, F-AHP has been recognised for its ability to handle uncertainties associated with linguistic imprecision and the inherent subjectivity of human judgement [147].

The integration of AHP with the GIS offers numerous advantages, including efficient data management, incorporation of expert knowledge, cost and time savings in surveys, identification of key factors influencing decisions, and enhanced visualisation that strengthens decision-making processes [148]. The combination of these methods represents a significant advancement in spatial decision-making, enabling decision-makers to systematically assess and prioritise complex alternatives, ultimately leading to well-informed choices across diverse applications [148].

In recent years, various statistical and AI-driven models have been introduced to assess the condition of sewer pipelines [144]. With the rapid evolution of AI technologies and the constraints of traditional computational methods, AI-based approaches have become widely adopted across numerous disciplines, including medicine, linguistics, and engineering [149]. AI has demonstrated its capability to tackle complex and poorly defined problems, adapt through experience, and handle uncertainties as well as incomplete data sets [145].

By leveraging AI algorithms and big data analytics, water utilities can enhance decision-making, improve service efficiency, and minimise costs, as highlighted by [150]. Unsupervised learning algorithms are essential for deriving valuable insights from unlabelled data, allowing researchers and professionals to discover patterns, formulate hypotheses, and make informed decisions based on data [151].

4.2. Model Validation and Sensitivity Analysis for Robust Variable Selection in F-AHP GIS and ML Approaches

Many previous studies have focused solely on identifying significant variables within models without incorporating any validation methods [17]. In this research, we evaluated the ML and GIS approaches by comparing their outputs, using the Kappa statistic and the percentage of matching classifications as validation criteria. Additionally, after developing the model, a sensitivity analysis was carried out.

The sensitivity analysis applied in the F-AHP-GIS framework reinforced the reliability of the findings, demonstrating the effectiveness of the approach as a valuable decision-making tool, as also highlighted by [32]. Our results further emphasised the importance of incorporating all thematic layers with AHP-assigned weights while also identifying the most influential factors.

4.3. K-Means Clustering and Alternative Methods as Pathways for Future Groundwater and Sewer Network Research

Various unsupervised clustering techniques exist, including SOM. According to [152], some studies have shown that SOM outperforms K-means [153,154,155], while other research suggests that K-means delivers comparable results [156] or surpasses SOM in performance [157,158,159]. In their comparison, reference [160] found that K-means clustering provided superior cluster separation when compared to hierarchical clustering and Gaussian mixture models.

We chose the K-means clustering method due to its widespread use as an unsupervised clustering technique. This choice was made largely owing to the algorithm’s straightforward implementation and lower computational demands compared to more complex approaches [161,162,163,164,165,166,167,168,169]. These characteristics are particularly advantageous for developing an efficient screening tool, where ease of application is essential to handle large data sets and facilitate timely decision-making. Additionally, K-means clustering is particularly effective in capturing the distribution patterns of GWL dynamics, efficiently identifying key patterns in the data [160]. The use of K-means clustering has been shown to significantly decrease computational time when assessing aquifer response, while also factoring in both surface and subsurface hydraulic properties [170]. However, few studies have explored the use of unsupervised clustering techniques to analyse time-series hydraulic–hydrologic data, such as those from stormwater urban drainage systems [171].

We successfully applied K-means clustering to group GWI probability scores in sewer networks, an application that, to our knowledge, has not previously been explored. Previous studies involving clustering techniques have primarily focused on identifying issues such as pressure, demand, pipe bursts, infrastructure damage, and illicit intrusions within water distribution systems [172,173,174]. K-means clustering has also been employed in other areas, such as grouping water depths for flood detection [171], evaluating GWLs [175], optimising the number and locations of monitoring wells [176], planning groundwater resources [177], and clustering climatic variables [178]. Furthermore, it has been used to enhance the management and understanding of water resources, water distribution systems, and water consumption [179,180]. Recently, multivariate K-means clustering has been applied to well-logging data for rock typing and mapping lithological variations in complex environments [181,182,183]. In earlier work, K-means clustering was employed to divide an aquifer into five distinct regions for regional GWL simulation, considering factors such as precipitation, water recharge, water discharge, transmissivity, earth level, and water table [184].

4.4. Limitations of the F-AHP-Based Geospatial and ML Approach

4.4.1. How to Move Research from GWD to GWI Probability

Assessing the condition of sewer pipes involves uncertainties that may not be fully addressed in studies evaluating GWI in sewer networks. For example, assumptions regarding the construction and rehabilitation dates of sewer networks can introduce errors into the model developed by [63]. Additionally, models predicting pipe conditions are often based on historical data. Their results therefore apply only to the specific sewer network for which the data were collected and cannot be generalised to other networks [185].

Assessing the condition of sewer pipelines is a crucial aspect of water management, and an accurate condition model can greatly aid in decision-making and the development of maintenance strategies [51]. This is essential because the deterioration of sewer pipes is indeed a multifaceted process influenced by numerous factors, not just one [186,187,188]. As a result, incorporating more data from pipe inspections and additional factors is necessary to enhance predictive accuracy [51].

Various factors influencing sewer pipe conditions have been identified in previous studies. In the USA, Dallas Water Utilities assessed sewer pipe conditions based on diameter, age, material, length, and soil type as an environmental factor [189]. Reference [185] highlighted several key variables, listed in order of frequency, such as age, diameter, length, material, slope, location, type, depth of cover, soil type, number of trees, groundwater, maintenance history, shape, water breaks, changes in orientation and slope, elevation, bedding condition, serviceability, exposure, capacity, and category. Some variables show consistent patterns in their effect on pipe deterioration, with the material type being a prime example, as different materials exhibit distinct deterioration patterns [185]. However, reference [185] also pointed out that contradictions about the impact of certain factors on pipe deterioration are a common issue.

Waste management systems typically involve a wide range of technical, climatic, environmental, demographic, socio-economic, and legislative considerations [145]. The inclusion of various factors from multiple perspectives is critical for accurate analysis. I&I is a multifaceted issue that can be influenced by numerous factors, including GWD, pipe material quality, construction standards, proximity to other underground structures, soil properties, sewer type, and the overall structural condition [63]. This may serve as a foundational approach for shifting from GWD-based risk to GWI probability modelling via thematic layers. However, many studies focus on only a subset of these factors in their models.

4.4.2. Expanding Thematic Layers to Better Capture GWI

While much research has focused on sewer pipe conditions, fewer studies have been conducted on groundwater infiltration (GWI) in sewer networks [7]. Reference [146] combined geological, environmental, geomorphological, and geophysical data within a GIS framework using the AHP method, successfully creating a hazard/suitability model for civil engineering applications. The various elements that affect the presence and movement of groundwater may include geomorphology, lithology, weathering, geological formations, porosity, slope, drainage patterns, topography, LC/LU, and climate, as emphasised by [190]. Reference [191] identified factors such as LC/LU, soil properties, climate, hydrology, and drainage system type (stormwater, sanitary, or combined) in determining optimal locations for green infrastructure (GI) systems, noting that the distance between GI systems and the water table, especially in areas with a shallow table, is crucial. Similarly, reference [192] suggested incorporating environmental and operational factors, such as soil type, bedding material, flow rate, and soil corrosivity, alongside advanced data mining techniques to develop more comprehensive and accurate condition prediction models.

In this study, we used 16 thematic layers, including geological, geomorphological, hydrological, hydrogeological, climatic, and topographical features, to calculate GWI probability scores across the model area. The proposed approach utilises readily available data. These thematic layers were recommended by [65] for assessing GWI probability in sewer networks within the Lower River Otter Water Body, UK. Consistent with [62], our model does not perform a network analysis of sewer system vulnerability to GWI. Instead, it classifies the model area according to seasonal GWI probability scores.

4.4.3. Limitations and Consistency Considerations in AHP-GIS Approach

The integration of AHP with the GIS may present challenges, including the complexity of analysis, high demands for data quality, difficulties in modelling intricate scenarios, and limitations in defining appropriate criteria [148]. During the pairwise comparison of criteria in the AHP method, some degree of inconsistency may arise [193]. Although AHP offers numerous advantages, it has faced criticism for relying on a limited 1–9 scale, which some argue is inadequate for fully capturing ambiguity and uncertainty [194]. One of the main challenges in merging different types of knowledge is determining the relative importance or weight of each criterion in the MCDA process [195]. Multiple criteria must be evaluated at the same time, but their impact on GWI in sewer networks varies significantly. Indeed, in AHP, uncertainties may stem from various factors, such as the choice of criteria, weighting of those criteria, data collection errors, lack of sufficient information, as well as the ambiguity of linguistic terms or the subjective nature of the experts’ knowledge and experience [32].

In this study, we addressed these risks by applying consistency ratio (CR) scoring to ensure logical coherence in the pairwise comparisons. The assessment of logical consistency, typically performed using the CR with an acceptable threshold of 0.1, confirmed that the assignment of criterion weights was robust, as demonstrated by the CR values calculated via the AHP–matter element approach. Furthermore, we analysed the relative importance of thematic layers to enhance the robustness and transparency of the weighting process. To validate and cross-check the reliability of our findings, we also employed the GIS and K-means clustering methods to compare model outputs and incorporated verification criteria, such as the locations of storm overflow discharges.

4.4.4. Critical Challenges in Using ML and K-Means Clustering for GWI Risk Assessment

General Considerations

ML techniques present valuable solutions for water management, yet they encounter several challenges, including issues with data quality, availability, interpretability, explainability, and generalisation, and the integration of domain-specific knowledge [151]. According to [12], five key research challenges—data privacy, advancements in algorithms, interpretability and trust, multi-agent systems, and the use of digital twins—are essential for promoting the adoption and application of ML in urban water management. Unfortunately, many cities and organisations lack comprehensive, integrated databases containing all necessary data, and the existing databases often contain uncertainties and missing data [196].

The limitations may be associated with the quality of the input data and the selected criteria [197], together with the way the criteria are processed [198]. One limitation could be the potential bias and lack of comprehensive condition data [199]. Inaccurate or incomplete data can result in unreliable predictions, ultimately undermining decision-making processes in water management [151]. The effectiveness of DL models is greatly influenced by the choice of training data [200]. As a result, it is crucial to select key indicators for training while eliminating input features that have a negligible effect on the model’s predictive accuracy, without severely affecting its overall performance [201]. Despite this, there has been limited research dedicated to assessing and pinpointing the critical indicators within training data sets that impact model performance [2].

Sewer-Related Considerations

Data imperfections, such as gaps, outliers, and imbalanced data sets, are significant challenges in studies assessing sewer network failure risks [202]. While surveys of infrastructure focusing on vulnerable sewer mains are often carried out for risk management purposes, an ideal survey for modelling and predicting conditions would involve a representative sample from all types of assets [199].

In our study, K-means clustering was employed as an effective ML approach to delineate spatial patterns of GWI probability. However, K-means tends to produce hyperspherical clusters of equal volume, potentially leading to unnecessary divisions of the actual data classes [167]. The algorithm assumes clusters are spherical and is highly sensitive to how the initial centroids are selected [203]. Beyond the choice of the cluster count, the structure of the data can significantly influence the performance of the clustering model [171]. Because the algorithm minimises squared Euclidean distances to cluster means, it implicitly favours compact, roughly Gaussian clusters and is highly sensitive to outliers [21,204].

K-means can become trapped in local minima, particularly if the data contain noise or if the centroids are not initialised properly [170]. Moreover, depending on data set characteristics, alternatives such as hierarchical clustering may be more adaptable to varying cluster shapes and sizes [160]. Furthermore, complex, multidimensional data sets with nonlinear relationships may lack clearly defined, compact clusters [205]. K-means also requires the user to specify the number of clusters in advance, which is often a subjective decision [167], and a larger number of clusters can frequently introduce more errors [171]. In our analysis, we selected three clusters to represent high, intermediate, and low levels of GWI risk. Although this choice was based on practical management thresholds, alternative classifications may produce different spatial patterns.

4.4.5. Assessment of Model Agreement and the Influence of Clustering Methods

Another factor that could introduce uncertainty is the method used to compare the results of the two models. We employed Kappa statistics to assess agreement; however, Kappa is sensitive to class imbalance. In particular, the chance-agreement term P_e in Equation (5) depends on the relative area of each class in the two maps [139]. As a result, the same observed agreement P_o can yield misleadingly low or high Kappa values when class distributions are skewed. Additionally, comparing the two models first requires the continuous GIS output to be classified. The choice of classification method, such as quantile, equal interval, or K-means clustering of the final GIS output, can substantially affect the spatial pattern and extent of the predicted classes. These methodological choices, in turn, influence the resulting Kappa value and thus the interpretation of model agreement. As illustrated in Figure 10, different classification schemes applied to the same GIS model output can yield markedly different spatial patterns and levels of overlap with the ML model predictions.
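A hypothetical two-class example with N = 100 cells illustrates the effect of class imbalance. The counts are invented for illustration. Skewed class proportions change K through P_e, even when the observed agreement P_o is identical.

```latex
\text{Balanced: } \begin{pmatrix} 45 & 5 \\ 5 & 45 \end{pmatrix}, \quad
P_{o} = \frac{45 + 45}{100} = 0.90, \quad
P_{e} = \frac{50 \cdot 50 + 50 \cdot 50}{100^{2}} = 0.50, \quad
K = \frac{0.90 - 0.50}{1 - 0.50} = 0.80

\text{Skewed: } \begin{pmatrix} 85 & 5 \\ 5 & 5 \end{pmatrix}, \quad
P_{o} = \frac{85 + 5}{100} = 0.90, \quad
P_{e} = \frac{90 \cdot 90 + 10 \cdot 10}{100^{2}} = 0.82, \quad
K = \frac{0.90 - 0.82}{1 - 0.82} \approx 0.44
```

Both maps agree on 90% of cells, yet K falls from 0.80 to about 0.44 when one class dominates. For this reason, percentage match and Kappa are best reported together.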

4.4.6. Hydrological Impacts of Urbanisation on Groundwater–Sewer Interactions

It is important to recognise that GWI in sewer systems represents just one potential pathway for groundwater in urban settings. Beyond infiltrating sewer networks, groundwater can interact with various urban processes. For example, urbanisation can enhance groundwater recharge, causing a rise in the water table, as noted by [206]. Additionally, sewer systems may experience exfiltration, which necessitates thorough assessment to ensure effective management, especially in ageing infrastructure [15].

5. Conclusions

- The consistency ratio (CR) of 0.02 confirmed the reliability of the pairwise comparisons. Additionally, locations of storm overflow discharges generally aligned with areas of elevated GWI probability, indicating consistency between observed overflows and modelled infiltration probabilities.
- The AHP identified five major contributors to GWI in sewers: GWD, proximity to rivers, flood potential, rock type, and alluvial deposits. However, sensitivity analysis revealed the importance of incorporating all 16 thematic layers, as excluding some of them led to greater discrepancies between individual-layer outputs and the final map generated through the AHP method.
- The final results derived from the AHP-based GIS model indicated minimal seasonal variation in GWI probability scores, with winter exhibiting the highest values. Spatially, GWI probabilities increased gradually from the southwest to the northeast of the study area.
- By combining fuzzified thematic layers weighted through AHP with K-means clustering, we generated a spatial representation of the study area categorised into three GWI risk levels: high, intermediate, and low. Compared with maps produced using either equal weights for all layers or only the five dominant factors, this approach yielded more cohesive cluster boundaries.
- A comparison between the F-AHP-based K-means clustering results and the F-AHP-based GIS-derived outputs that incorporated all thematic layers revealed strong consistency between the two approaches, as evidenced by a Kappa coefficient of 0.70 and an 81.44% match in classification outcomes.
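The weighting and consistency steps summarised in this list can be sketched with NumPy under stated assumptions: AHP weights are taken as the normalised principal eigenvector of the pairwise comparison matrix, the consistency ratio is CR = CI/RI with CI = (λ_max − n)/(n − 1), and the index for each cell is the weighted sum of fuzzified layers scaled to [0, 1]. The three-layer comparison matrix, the linear membership functions, and the synthetic 4 × 4 rasters are hypothetical and do not reproduce the study's 16-layer matrix or its fuzzy membership functions. Saaty's random index (RI) is tabulated here only up to n = 10, because published RI values for larger matrices differ slightly between sources; for a 16-layer matrix, RI should be passed explicitly via `ri`.

```python
import numpy as np

# Saaty's random consistency index (RI) for n = 1..10; supply `ri` for larger n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(A, ri=None):
    """Principal-eigenvector weights and consistency ratio of a pairwise matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    i = np.argmax(eigvals.real)              # Perron (largest) eigenvalue
    w = np.abs(eigvecs[:, i].real)
    w /= w.sum()                             # weights sum to 1
    lam_max = eigvals[i].real
    ci = (lam_max - n) / (n - 1)             # consistency index
    ri = RANDOM_INDEX[n] if ri is None else ri
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


def fuzzy_linear(x, lo, hi, increasing=True):
    """Hypothetical linear fuzzy membership in [0, 1] between lo and hi."""
    m = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return m if increasing else 1.0 - m


def weighted_overlay(layers, weights):
    """Cell-wise weighted sum of fuzzified layers (each in [0, 1])."""
    stack = np.stack(layers)                 # shape (n_layers, rows, cols)
    return np.tensordot(weights, stack, axes=1)


# Hypothetical, perfectly consistent judgements for three layers (target weights 0.6, 0.3, 0.1)
A = [[1, 2, 6],
     [1 / 2, 1, 3],
     [1 / 6, 1 / 3, 1]]
w, cr = ahp_weights(A)

rng = np.random.default_rng(1)
gwd = rng.uniform(0, 20, (4, 4))             # hypothetical groundwater depth (m)
river = rng.uniform(0, 500, (4, 4))          # hypothetical distance to river (m)
flood = rng.uniform(0, 1, (4, 4))            # hypothetical flood-potential score
layers = [fuzzy_linear(gwd, 0, 20, increasing=False),    # shallower -> higher membership
          fuzzy_linear(river, 0, 500, increasing=False), # closer -> higher membership
          fuzzy_linear(flood, 0, 1)]
gwi_index = weighted_overlay(layers, w)
```

Because the example matrix is perfectly consistent (each entry equals the ratio of the target weights), λ_max equals n and CR is zero; real expert judgements give CR > 0, and values below 0.1 are conventionally accepted.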

Overall, this research demonstrates the value of the proposed F-AHP-driven integration of GIS and K-means clustering for assessing GWI within sewer networks. The resulting framework offers utility managers a tool for prioritising engineering interventions aimed at promoting long-term asset sustainability. It provides GWI probability scores at a high spatial resolution (1 m × 1 m), together with a classification into three risk levels, across the entire study area; these outputs are not limited to existing sewer lines but also apply to areas where sewer infrastructure may be developed in the future. The significance of this study lies in offering a transferable, data-driven methodology that enhances GWI estimation accuracy and supports proactive decision-making in sewer asset management, ultimately strengthening the resilience and sustainability of urban water infrastructure.

6. Recommendations

Given the outlined limitations and uncertainties, incorporating additional data sets is essential to improve the precision of predictive models. Strengthening data availability for GWI assessments in sewer systems is strongly encouraged—for instance, through the expanded deployment of high-frequency in situ sensors, as emphasised by [32]. The integration of such sensor-derived data can provide high-resolution temporal coverage to support model development, improve the detection of short-term infiltration dynamics, and enable more rigorous validation of predictive outputs. In addition, real-time sensor networks offer opportunities for early-warning systems and adaptive management of GWI events, thereby enhancing the operational resilience of sewer networks.

For the same study area, it is advisable to construct a physics-based model and evaluate its outcomes against the findings of this research. Furthermore, future investigations can extend the application of this integrated approach—merging an F-AHP-based GIS with ML—to other regions, not only to estimate GWI probability in sewer networks but also to categorise them into distinct risk levels. This combination of advanced methodologies holds significant potential for tackling complex urban infrastructure challenges more efficiently.

Further empirical studies are necessary to apply F-AHP within a wider range of hydrogeological settings, enhancing the method’s applicability and robustness under diverse geological and environmental scenarios [207]. To date, however, such applications remain limited owing to inherent methodological challenges [208]. Significant opportunities also remain for AI to transform urban water management practices [209].

Ongoing efforts should focus on advancing the integration of AI into GIS frameworks to unlock their full potential in shaping more intelligent and sustainable urban systems. Additionally, the development of hybrid ML models—incorporating multiple algorithms—offers promise for more accurate predictions in water resource management [151].

Finally, additional research is needed to incorporate social and economic dimensions, including community behaviours, public perceptions, and economic incentives [210]. The impact of physical and environmental factors on sewer pipe deterioration should also be investigated further [17]. Moreover, it is crucial to examine sustainable water resource governance frameworks and strategies for adapting to climate change [151].

Author Contributions

Conceptualization, N.Z., A.A.J., M.J., D.B. and J.L.W.; methodology, N.Z., A.A.J., M.J., D.B. and J.L.W.; software, N.Z.; validation, N.Z., A.A.J., M.J., D.B. and J.L.W.; formal analysis, N.Z., A.A.J., M.J., D.B. and J.L.W.; investigation, N.Z., A.A.J., M.J., D.B. and J.L.W.; resources, N.Z., A.A.J., M.J., D.B. and J.L.W.; data curation, N.Z., A.A.J., M.J., D.B. and J.L.W.; writing—original draft preparation, N.Z.; writing—review and editing, A.A.J., M.J., D.B. and J.L.W.; visualisation, N.Z.; supervision, A.A.J. and J.L.W.; project administration, N.Z., A.A.J., M.J., D.B. and J.L.W.; funding acquisition, N.Z., A.A.J. and J.L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by South West Water, grant number 1510081. The APC was funded by the University of Exeter.

Data Availability Statement

The data that support the findings of this study may be available from the corresponding author upon reasonable request.

Acknowledgments

We would like to express our gratitude to the Centre for Resilience in Environment, Water and Waste (CREWW) and South West Water (SWW) for providing the data and financial support necessary to investigate groundwater infiltration within sewer networks. Special thanks go to Richard Brazier, Josie Butcher, and Simon Arthur for their invaluable support throughout the research. Additionally, we sincerely appreciate the British Geological Survey, the Met Office, UK Centre for Ecology & Hydrology (UKCEH), and the Department for Environment, Food & Rural Affairs (DEFRA) for granting access to their valuable data sets, which were instrumental in completing this study.

Conflicts of Interest

Authors Mark Jacob and David Baldock were employed by the company South West Water. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Hughes, J.; Cowper-Heays, K.; Olesson, E.; Bell, R.; Stroombergen, A. Impacts and Implications of Climate Change on Wastewater Systems: A New Zealand Perspective. Clim. Risk Manag. 2021, 31, 100262.
Jiang, Y.; Li, C.; Zhang, Y.; Zhao, R.; Yan, K.; Wang, W. Data-Driven Method Based on Deep Learning Algorithm for Detecting Fat, Oil, and Grease (FOG) of Sewer Networks in Urban Commercial Areas. Water Res. 2021, 207, 117797.
Anand, U.; Li, X.; Sunita, K.; Lokhandwala, S.; Gautam, P.; Suresh, S.; Sarma, H.; Vellingiri, B.; Dey, A.; Bontempi, E.; et al. SARS-CoV-2 and Other Pathogens in Municipal Wastewater, Landfill Leachate, and Solid Waste: A Review about Virus Surveillance, Infectivity, and Inactivation. Environ. Res. 2022, 203, 111839.
Liu, J.; Xiong, J.; Chen, Y.; Sun, H.; Zhao, X.; Tu, F.; Gu, Y. An Integrated Model Chain for Future Flood Risk Prediction under Land-Use Changes. J. Environ. Manag. 2023, 342, 118125.
Mondal, K.; Bandyopadhyay, S.; Karmakar, S. Framework for Global Sensitivity Analysis in a Complex 1D-2D Coupled Hydrodynamic Model: Highlighting Its Importance on Flood Management over Large Data-Scarce Regions. J. Environ. Manag. 2023, 332, 117312.
Hoseingholi, P.; Moeini, R. Pipe Failure Prediction of Wastewater Network Using Genetic Programming: Proposing Three Approaches. Ain Shams Eng. J. 2023, 14, 101958.
Zeydalinejad, N.; Javadi, A.A.; Webber, J.L. Global Perspectives on Groundwater Infiltration to Sewer Networks: A Threat to Urban Sustainability. Water Res. 2024, 262, 122098.
Ohlin Saletti, A.; Lindhe, A.; Söderqvist, T.; Rosén, L. Cost to Society from Infiltration and Inflow to Wastewater Systems. Water Res. 2023, 229, 119505.
Saurav, K.C.; Shrestha, S.; Ninsawat, S.; Chonwattana, S. Predicting Flood Events in Kathmandu Metropolitan City under Climate Change and Urbanisation. J. Environ. Manag. 2021, 281, 111894.
Nguyen, L.V. Integrating Machine Learning and GIS for Sewer Condition Assessment and Visualization. Ph.D. Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2024. Available online: https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/3150381 (accessed on 20 July 2025).
Saddiqi, M.M.; Zhao, W.; Cotterill, S.; Dereli, R.K. Smart Management of Combined Sewer Overflows: From an Ancient Technology to Artificial Intelligence. Wiley Interdiscip. Rev. Water 2023, 10, e1635.
Fu, G.; Jin, Y.; Sun, S.; Yuan, Z.; Butler, D. The Role of Deep Learning in Urban Water Management: A Critical Review. Water Res. 2022, 223, 118973.
Royal Society Working Group. Machine Learning: The Power and Promise of Computers That Learn by Example. In Technical Report; The Royal Society: London, UK, 2017.
Cheng, J.C.P.; Wang, M. Automated Detection of Sewer Pipe Defects in Closed-Circuit Television Images Using Deep Learning Techniques. Autom. Constr. 2018, 95, 155–171.
Ma, S.; Elshaboury, N.; Ali, E.; Zayed, T. Proactive Exfiltration Severity Management in Sewer Networks: A Hyperparameter Optimization for Two-Tiered Machine Learning Prediction. Tunn. Undergr. Space Technol. 2024, 144, 105532.
Wang, M.; Luo, H.; Cheng, J.C.P. Towards an Automated Condition Assessment Framework of Underground Sewer Pipes Based on Closed-Circuit Television (CCTV) Images. Tunn. Undergr. Space Technol. 2021, 110, 103840.
Mohammadi, M.M.; Najafi, M.; Kaushal, V.; Serajiantehrani, R.; Salehabadi, N.; Ashoori, T. Sewer Pipes Condition Prediction Models: A State-of-the-Art Review. Infrastructures 2019, 4, 64.
Kwon, S.H.; Kim, J.H. Machine Learning and Urban Drainage Systems: State-of-the-Art Review. Water 2021, 13, 3545.
Längkvist, M.; Karlsson, L.; Loutfi, A. A Review of Unsupervised Feature Learning and Deep Learning for Time-Series Modeling. Pattern Recognit. Lett. 2014, 42, 11–24.
Abdideh, M.; Ameri, A. Cluster Analysis of Petrophysical and Geological Parameters for Separating the Electrofacies of a Gas Carbonate Reservoir Sequence. Nat. Resour. Res. 2020, 29, 1843–1856.
Szabó, N.P.; Braun, B.A.; Abdelrahman, M.M.G.; Dobróka, M. Improved Well Logs Clustering Algorithm for Shale Gas Identification and Formation Evaluation. Acta Geod. Geophys. 2021, 56, 711–729.
Mohammed, M.A.A.; Szabó, N.P.; Flores, Y.G.; Szűcs, P. Multi-Well Clustering and Inverse Modeling-Based Approaches for Exploring Geometry, Petrophysical, and Hydrogeological Parameters of the Quaternary Aquifer System around Debrecen Area, Hungary. Groundw. Sustain. Dev. 2024, 24, 101086.
Jain, A.K. Data Clustering: 50 Years beyond K-Means. Pattern Recognit. Lett. 2010, 31, 651–666.
Eid, M.H.; Eissa, M.; Mohamed, E.A.; Ramadan, H.S.; Czuppon, G.; Kovács, A.; Szűcs, P. Application of Stable Isotopes, Mixing Models, and K-Means Cluster Analysis to Detect Recharge and Salinity Origins in Siwa Oasis, Egypt. Groundw. Sustain. Dev. 2024, 25, 101124.
Subbarayan, S.; Thiyagarajan, S.; Gangolu, S.; Devanantham, A.; Nagireddy Masthan, R. Assessment of Groundwater Vulnerable Zones Using Conventional and Fuzzy-AHP DRASTIC for Visakhapatnam District, India. Groundw. Sustain. Dev. 2024, 24, 101054.
Das, B.; Pal, S.C.; Malik, S.; Chakrabortty, R. Modeling Groundwater Potential Zones of Puruliya District, West Bengal, India Using Remote Sensing and GIS Techniques. Geol. Ecol. Landsc. 2019, 3, 223–237.
Soyaslan, İ.İ. Assessment of Groundwater Vulnerability Using Modified DRASTIC-Analytical Hierarchy Process Model in Bucak Basin, Turkey. Arab. J. Geosci. 2020, 13, 1127.
Saaty, T.L. How to Make a Decision: The Analytic Hierarchy Process. Eur. J. Oper. Res. 1990, 48, 9–26.
Zadeh, L.A. Fuzzy Sets. Inf. Control 1965, 8, 338–353.
Coffey, L.; Claudio, D. In Defense of Group Fuzzy AHP: A Comparison of Group Fuzzy AHP and Group AHP with Confidence Intervals. Expert. Syst. Appl. 2021, 178, 114970.
Emrouznejad, A.; Marra, M. The State of the Art Development of AHP (1979–2017): A Literature Review with a Social Network Analysis. Int. J. Prod. Res. 2017, 55, 6653–6675.
Lagogiannis, S.; Papadopoulos, A.; Dimitriou, E. Development of an Automatic Water Monitoring Network by Using Multi-Criteria Analysis and a GIS-Based Fuzzy Process. Environ. Process. 2024, 11, 36.
Liu, Y.; Eckert, C.M.; Earl, C. A Review of Fuzzy AHP Methods for Decision-Making with Subjective Judgements. Expert. Syst. Appl. 2020, 161, 113738.
Saltelli, A. Sensitivity Analysis for Importance Assessment. Risk Anal. 2002, 22, 579–590.
Goodarzi, M.R.; Vazirian, M. A Machine Learning Approach for Predicting and Localizing the Failure and Damage Point in Sewer Networks Due to Pipe Properties. J. Water Health 2024, 22, 487–509.
Kizilöz, B. Prediction of Failures in Sewer Networks Using Various Machine Learning Classifiers. Urban Water J. 2024, 21, 877–893.
Pokharel, A. Application of Supervised Machine Learning Algorithms for Developing Service Life Prediction Model of Sewer Pipes. Ph.D. Thesis, The University of Texas at Arlington, Arlington, TX, USA, 2021.
Seng, V. Enhancing Sewer Asset Management Using Machine Learning Algorithms. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 2024.
Sousa, V.; Matos, J.P.; Matias, N. Evaluation of Artificial Intelligence Tool Performance and Uncertainty for Predicting Sewer Structural Condition. Autom. Constr. 2014, 44, 84–91.
Zhang, D.; Martinez, N.; Lindholm, G.; Ratnaweera, H. Manage Sewer In-Line Storage Control Using Hydraulic Model and Recurrent Neural Network. Water Resour. Manag. 2018, 32, 2079–2098.
Zhang, D.; Hølland, E.S.; Lindholm, G.; Ratnaweera, H. Hydraulic Modeling and Deep Learning Based Flow Forecasting for Optimizing Inter Catchment Wastewater Transfer. J. Hydrol. 2018, 567, 792–802.
Ki, S.J.; Lee, C.S.; Jung, W.H.; Park, H.G. Comparison of Classification and Supervised Learning Algorithms in Assessing the Hydraulic Conditions of Sewer Collection Systems: A Case Study of Local Sewer Networks in Jinju City, Korea. Desalination Water Treat. 2018, 124, 202–210.
Wong, L.S.; Marani, A.; Nehdi, M.L. Gradient Boosting Coupled with Oversampling Model for Prediction of Concrete Pipe-Joint Infiltration Using Designwise Data Set. J. Pipeline Syst. Eng. Pract. 2021, 12, 04021015.
Qiu, C.; Shao, G.; Zhang, Z.; Zhou, C.; Hou, Y.; Zhao, E.; Guo, X.; Guan, X. Unsupervised Real Time and Early Anomalies Detection Method for Sewer Networks Systems. IEEE Access 2024, 12, 21698–21709.
Jato-Espino, D.; Sillanpää, N.; Pathak, S. Flood Modelling in Sewer Networks Using Dependence Measures and Learning Classifier Systems. J. Hydrol. 2019, 578, 124013.
Zhang, Z.; Laakso, T.; Wang, Z.; Pulkkinen, S.; Ahopelto, S.; Virrantaus, K.; Li, Y.; Cai, X.; Zhang, C.; Vahala, R.; et al. Comparative Study of AI-Based Methods—Application of Analyzing Inflow and Infiltration in Sanitary Sewer Subcatchments. Sustainability 2020, 12, 6254.
Liu, T.; Ramirez-Marquez, J.E.; Jagupilla, S.C.; Prigiobbe, V. Combining a Statistical Model with Machine Learning to Predict Groundwater Flooding (or Infiltration) into Sewer Networks. J. Hydrol. 2021, 603, 126916.
Abebe, Y.; Tesfamariam, S. Underground Sewer Networks Renewal Complexity Assessment and Trenchless Technology: A Bayesian Belief Network and GIS Framework. J. Pipeline Syst. Eng. Pract. 2020, 11, 04019058.
Ghavami, S.M.; Borzooei, Z.; Maleki, J. An Effective Approach for Assessing Risk of Failure in Urban Sewer Pipelines Using a Combination of GIS and AHP-DEA. Process Saf. Environ. Prot. 2020, 133, 275–285.
Van Nguyen, L.; Seidu, R. Application of Regression-Based Machine Learning Algorithms in Sewer Condition Assessment for Ålesund City, Norway. Water 2022, 14, 3993.
Van Nguyen, L.; Bui, D.T.; Seidu, R. Comparison of Machine Learning Techniques for Condition Assessment of Sewer Network. IEEE Access 2022, 10, 124238–124258.
Roghani, B.; Tabesh, M.; Cherqui, F. A Fuzzy Multidimensional Risk Assessment Method for Sewer Asset Management. Int. J. Civ. Eng. 2024, 22, 1–17.
Kazuva, E.; Zhang, J.; Tong, Z.; Liu, X.P.; Memon, S.; Mhache, E. GIS- and MCD-Based Suitability Assessment for Optimized Location of Solid Waste Landfills in Dar Es Salaam, Tanzania. Environ. Sci. Pollut. Res. 2021, 28, 11259–11278.
Martin, C.; Kamara, O.; Berzosa, I.; Badiola, J.L. Smart GIS Platform That Facilitates the Digitalization of the Integrated Urban Drainage System. Environ. Model. Softw. 2020, 123, 104568.
Nigusse, A.G.M.; Adhaneom, U.G.; Kahsay, G.H.; Abrha, A.M.; Gebre, D.N.; Weldearegay, A.G. GIS Application for Urban Domestic Wastewater Treatment Site Selection in the Northern Ethiopia, Tigray Regional State: A Case Study in Mekelle City. Arab. J. Geosci. 2020, 13, 311.
de Oliveira Silva, M.C.; Vasconcelos, R.S.; Cirilo, J.A. Risk Mapping of Water Supply and Sanitary Sewage Systems in a City in the Brazilian Semi-Arid Region Using GIS-MCDA. Water 2022, 14, 3251.
Wu, Z.; Abdul-Nour, G. Comparison of Multi-Criteria Group Decision-Making Methods for Urban Sewer Network Plan Selection. CivilEng 2020, 1, 26–48.
Lameche, E.K.; Boutaghane, H.; Saber, M.; Abdrabo, K.I.; Bermad, A.M.; Djeddou, M.; Boulmaiz, T.; Kantoush, S.A.; Sumi, T. Urban Flood Numerical Modeling and Hydraulic Performance of a Drainage Network: A Case Study in Algiers, Algeria. Water Sci. Technol. 2023, 88, 1635–1656.
Abd-Elaty, I.; Negm, A.; Hamdan, A.M.; Nour-Eldeen, A.S.; Zeleňáková, M.; Hossen, H. Assessing the Hazards of Groundwater Logging in Tourism Aswan City, Egypt. Water 2022, 14, 1233.
Attwa, M.; Zamzam, S. An Integrated Approach of GIS and Geoelectrical Techniques for Wastewater Leakage Investigations: Active Constraint Balancing and Genetic Algorithms Application. J. Appl. Geophys. 2020, 175, 103992.
Rojas-Gómez, K.L.; Binder, M.; Walther, M.; Engelmann, C. A Parsimonious Approach to Predict Regions Affected by Sewer-Borne Contaminants in Urban Aquifers. Environ. Monit. Assess. 2023, 195, 1517.
Williams, V.A. Evaluating the Potential of a Geospatial/Geostatistical Methodology for Locating Rain-Derived Infiltration and Inflow into Wastewater Treatment Systems in the Minneapolis/St. Paul Metropolitan Area, Minnesota, USA. Master’s Thesis, Minnesota State University, Mankato, MN, USA, 2017.
Thapa, J.B.; Jung, J.K.; Yovichin, R.D. A Qualitative Approach to Determine the Areas of Highest Inflow and Infiltration in Underground Infrastructure for Urban Area. Adv. Civ. Eng. 2019, 2019, 2620459.
Rossi, R.J.; Toran, L. Exploring the Potential for Groundwater Inundation in Coastal US Cities Due to Interactions between Sewer Infrastructure and Global Change. Environ. Earth Sci. 2019, 78, 258.
Zeydalinejad, N.; Javadi, A.A.; Baldock, D.; Webber, J.L. An Integrated Hydrological-Hydrogeological Model for Analysing Spatio-Temporal Probability of Groundwater Infiltration in Urban Infrastructure. Sustain. Cities Soc. 2024, 116, 105891.
Sangsefidi, Y.; Bagheri, K.; Davani, H.; Merrifield, M. Data Analysis and Integrated Modeling of Compound Flooding Impacts on Coastal Drainage Infrastructure under a Changing Climate. J. Hydrol. 2023, 616, 128823.
Sangsefidi, Y.; Barnes, A.; Merrifield, M.; Davani, H. Data-Driven Analysis and Integrated Modeling of Climate Change Impacts on Coastal Groundwater and Sanitary Sewer Infrastructure. Sustain. Cities Soc. 2023, 99, 104914.
Halfawy, M.R.; Dridi, L.; Baker, S. Integrated Decision Support System for Optimal Renewal Planning of Sewer Networks. J. Comput. Civ. Eng. 2008, 22, 360–372.
Nguyen, L.V.; Razak, S. Predicting Sewer Structural Condition Using Hybrid Machine Learning Algorithms. Urban Water J. 2023, 20, 882–896.
Abebe, Y.; Tesfamariam, S. Storm Sewer Pipe Renewal Planning Considering Deterioration, Climate Change, and Urbanization: A Dynamic Bayesian Network and GIS Framework. Sustain. Resilient Infrastruct. 2023, 8, 70–85.
Saranya, A.; Al Mazroa, A.; Maashi, M.; Nithya, T.M.; Priya, V. Remote Sensing and Machine Learning Approach for Zoning of Wastewater Drainage System. Desalination Water Treat. 2024, 319, 100549.
Nourani, V.; Ghaneei, P.; Kantoush, S.A. Robust Clustering for Assessing the Spatiotemporal Variability of Groundwater Quantity and Quality. J. Hydrol. 2022, 604, 127272.
Saltelli, A.; Aleksankina, K.; Becker, W.; Fennell, P.; Ferretti, F.; Holst, N.; Li, S.; Wu, Q. Why so Many Published Sensitivity Analyses Are False: A Systematic Review of Sensitivity Analysis Practices. Environ. Model. Softw. 2019, 114, 29–39.
DEFRA. LIDAR Composite Digital Terrain Model (DTM)—1m. Department for Environment, Food and Rural Affairs. 2022. Available online: https://environment.data.gov.uk/dataset/13787b9a-26a4-4775-8523-806d13af58fc (accessed on 20 July 2025).
OS Open Rivers. Ordnance Survey. 2025. Available online: https://www.data.gov.uk/dataset/dc29160b-b163-4c6e-8817-f313229bcc23/os-open-rivers1 (accessed on 20 July 2025).
DEFRA. England|Catchment Data Explorer. Department for Environment, Food and Rural Affairs. 2025. Available online: https://environment.data.gov.uk/catchment-planning/ (accessed on 20 July 2025).
Booth, K.A.; Linley, K.A. Geological Indicators of Flooding: User Guidance Notes; British Geological Survey: Nottingham, UK, 2010.
Morton, R.D.; Marston, C.G.; O’Neil, A.W.; Rowland, C.S. Land Cover Map 2023 (land parcels, GB). NERC EDS Environmental Information Data Centre. 2024. Available online: https://catalogue.ceh.ac.uk/documents/50b344eb-8343-423b-8b2f-0e9800e34bbd (accessed on 20 July 2025).
Lawley, R. User Guide: Soil Parent Material Dataset. In British Geological Survey Open Report; British Geological Survey: Nottingham, UK, 2021.
Smith, A. Digital geological map of Great Britain, information notes, 2013. In British Geological Survey Open Report; British Geological Survey: Nottingham, UK, 2013.
DEFRA. Hydrology Data Explorer—Explore. Department for Environment, Food and Rural Affairs. 2025. Available online: https://environment.data.gov.uk/hydrology/explore (accessed on 20 July 2025).
Hollis, D.; McCarthy, M.; Kendon, M.; Legg, T.; Simpson, I. HadUK-Grid—A New UK Dataset of Gridded Climate Observations. Geosci. Data J. 2019, 6, 151–159.
Muhammad, A.M.; Zhonghua, T.; Sissou, Z.; Mohamadi, B.; Ehsan, M. Analysis of geological structure and anthropological factors affecting arsenic distribution in the Lahore aquifer, Pakistan. Hydrogeol. J. 2016, 24, 1891–1904.
Ahmed, R.; Sajjad, H. Analyzing Factors of Groundwater Potential and Its Relation with Population in the Lower Barpani Watershed, Assam, India. Nat. Resour. Res. 2018, 27, 503–515.
Ahmad, I.; Dar, M.A.; Fenta, A.; Halefom, A.; Nega, H.; Andualem, T.G.; Teshome, A. Spatial Configuration of Groundwater Potential Zones Using OLS Regression Method. J. Afr. Earth Sci. 2021, 177, 104147.
Doke, A.B.; Zolekar, R.B.; Patel, H.; Das, S. Geospatial Mapping of Groundwater Potential Zones Using Multi-Criteria Decision-Making AHP Approach in a Hardrock Basaltic Terrain in India. Ecol. Indic. 2021, 127, 107685.
Su, X.; Liu, T.; Beheshti, M.; Prigiobbe, V. Relationship between Infiltration, Sewer Rehabilitation, and Groundwater Flooding in Coastal Urban Areas. Environ. Sci. Pollut. Res. 2020, 27, 14288–14298.
Ogden, F.L.; Raj Pradhan, N.; Downer, C.W.; Zahner, J.A. Relative Importance of Impervious Area, Drainage Density, Width Function, and Subsurface Storm Drainage on Flood Runoff from an Urbanized Catchment. Water Resour. Res. 2011, 47, 12503.
Julínek, T.; Duchan, D.; Říha, J. Mapping of Uplift Hazard Due to Rising Groundwater Level during Floods. J. Flood Risk Manag. 2020, 13, e12601.
Ghorbani Nejad, S.; Falah, F.; Daneshfar, M.; Haghizadeh, A.; Rahmati, O. Delineation of Groundwater Potential Zones Using Remote Sensing and GIS-Based Data-Driven Models. Geocarto Int. 2017, 32, 167–187.
Thapa, R.; Gupta, S.; Guin, S.; Kaur, H. Assessment of Groundwater Potential Zones Using Multi-Influencing Factor (MIF) and GIS: A Case Study from Birbhum District, West Bengal. Appl. Water Sci. 2017, 7, 4117–4131.
Oikonomidis, D.; Dimogianni, S.; Kazakis, N.; Voudouris, K. A GIS/Remote Sensing-Based Methodology for Groundwater Potentiality Assessment in Tirnavos Area, Greece. J. Hydrol. 2015, 525, 197–208.
Hamill, L.; Bell, F.G. Groundwater Resource Development; Butterworths: London, UK, 1986.
Moubark, K.; Abdelkareem, M. Characterization and Assessment of Groundwater Resources Using Hydrogeochemical Analysis, GIS, and Field Data in Southern Wadi Qena, Egypt. Arab. J. Geosci. 2018, 11, 598.
D’Aniello, A.; Cimorelli, L.; Cozzolino, L.; Pianese, D. The Effect of Geological Heterogeneity and Groundwater Table Depth on the Hydraulic Performance of Stormwater Infiltration Facilities. Water Resour. Manag. 2019, 33, 1147–1166.
Pophillat, W.; Sage, J.; Rodriguez, F.; Braud, I. Consequences of Interactions between Stormwater Infiltration Systems, Shallow Groundwater and Underground Structures at the Neighborhood Scale. Urban Water J. 2022, 19, 812–823.
Singh, S.K.; Zeddies, M.; Shankar, U.; Griffiths, G.A. Potential Groundwater Recharge Zones within New Zealand. Geosci. Front. 2019, 10, 1065–1072.
Nuissl, H.; Haase, D.; Lanzendorf, M.; Wittmer, H. Environmental Impact Assessment of Urban Land Use Transitions—A Context-Sensitive Approach. Land Use Policy 2009, 26, 414–424.
Kourgialas, N.N.; Karatzas, G.P. Flood management and a GIS modelling method to assess flood-hazard areas—A case study. Hydrol. Sci. J. 2011, 56, 212–225.
Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood Susceptibility Mapping Using a Novel Ensemble Weights-of-Evidence and Support Vector Machine Models in GIS. J. Hydrol. 2014, 512, 332–343.
McGreal, W.S.; Craig, D. Mass-movement activity: An illustration of differing responses to groundwater conditions from two sites in northern Ireland. Ir. Geogr. 1977, 10, 28–35.
Haridas, V.R.; Aravindan, S.; Girish, G. Remote sensing and its applications for groundwater favourable area identification. Q. J. GARC 1998, 6, 18–22.
Pinto, D.; Shrestha, S.; Babel, M.S.; Ninsawat, S. Delineation of Groundwater Potential Zones in the Comoro Watershed, Timor Leste Using GIS, Remote Sensing and Analytic Hierarchy Process (AHP) Technique. Appl. Water Sci. 2017, 7, 503–519.
Mohammadi, Z.; Alijani, F.; Rangzan, K. Deflogic: A Method for Assessment of Groundwater Potential in Karst Terrains: Gurpi Anticline, Southwest Iran. Arab. J. Geosci. 2014, 7, 3639–3655.
Nassery, H.R.; Zeydalinejad, N.; Alijani, F. Speculation on the Resilience of Karst Aquifers Using Geophysical and GIS-Based Approaches (a Case Study of Iran). Acta Geophys. 2021, 69, 2393–2415.
Zhang, K.; Parolari, A.J. Impact of Stormwater Infiltration on Rainfall-Derived Inflow and Infiltration: A Physically Based Surface–Subsurface Urban Hydrologic Model. J. Hydrol. 2022, 610, 127938.
Saaty, T.L. The Analytic Network Process: Planning, Priority Setting, Resource Allocation; McGraw Hill International Publication: New York, NY, USA, 1980.
Saaty, T.L. The analytic hierarchy process (AHP). J. Oper. Res. Soc. 1980, 41, 1073–1076.
Saaty, T.L. Decision Making—The Analytic Hierarchy and Network Processes (AHP/ANP). J. Syst. Sci. Syst. Eng. 2004, 13, 1–35.
Saaty, T.L. Some Mathematical Topics in the Analytic Hierarchy Process. Math. Models Decis. Support. 1988, 48, 89–107.
Shekar, P.R.; Mathew, A. Integrated Assessment of Groundwater Potential Zones and Artificial Recharge Sites Using GIS and Fuzzy-AHP: A Case Study in Peddavagu Watershed, India. Environ. Monit. Assess. 2023, 195, 906.
Saaty, T.L. About a Hundred Years of Creativity in Decision Making. Int. J. Anal. Hierarchy Process 2015, 7, 138–144.
Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow Landslide Susceptibility Assessment Using a Novel Hybrid Intelligence Approach. Environ. Earth Sci. 2017, 76, 60.
Shekhar, S.; Pandey, A.C. Delineation of Groundwater Potential Zone in Hard Rock Terrain of India Using Remote Sensing, Geographical Information System (GIS) and Analytic Hierarchy Process (AHP) Techniques. Geocarto Int. 2015, 30, 402–421.
Farhat, B.; Souissi, D.; Mahfoudhi, R.; Chrigui, R.; Sebei, A.; Ben Mammou, A. GIS-Based Multi-Criteria Decision-Making Techniques and Analytical Hierarchical Process for Delineation of Groundwater Potential. Environ. Monit. Assess. 2023, 195, 285.
Saaty, T.L.; Vargas, L.G. Prediction, Projection and Forecasting: Applications of the Analytic Hierarchy Process in Economics, Finance, Politics, Games and Sports; Springer: Dordrecht, The Netherlands, 1991.
Bonham-Carter, G. Geographic Information Systems for Geoscientists: Modelling with GIS; Elsevier: Amsterdam, The Netherlands, 1994.
Likas, A.; Vlassis, N.; Verbeek, J.J. The Global K-Means Clustering Algorithm. Pattern Recognit. 2003, 36, 451–461.
Žalik, K.R. An Efficient K-Means Clustering Algorithm. Pattern Recognit. Lett. 2008, 29, 1385–1391.
Han, J.; Kamber, M. Data Mining, Concepts and Techniques; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2006.
Ali, A.; Sheng-Chang, C. Characterization of Well Logs Using K-Mean Cluster Analysis. J. Pet. Explor. Prod. Technol. 2020, 10, 2245–2256.
Gómez, F.; Flores, Y.; Vadászi, M. Comparative Analysis of the K-Nearest-Neighbour Method and K-Means Cluster Analysis for Lithological Interpretation of Well Logs of the Shushufindi Oilfield, Ecuador. Rud.-Geološko-Naft. Zb. 2022, 37, 155–165.
Kuo, R.J.; Wang, H.S.; Hu, T.L.; Chou, S.H. Application of Ant K-Means on Clustering Analysis. Comput. Math. Appl. 2005, 50, 1709–1724.
Javadi, S.; Hashemy, S.M.; Mohammadi, K.; Howard, K.W.F.; Neshat, A. Classification of Aquifer Vulnerability Using K-Means Cluster Analysis. J. Hydrol. 2017, 549, 27–37.
Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means Clustering Algorithm. Appl. Stat. 1979, 28, 100.
MacQueen, J.B. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; University of California Press: Oakland, CA, USA, 1967; Volume 5, pp. 281–298.
Saaty, T.L. Decision Making with Dependence and Feedback: The Analytic Network Process; RWS Publication: Pittsburgh, PA, USA, 1997; Volume 4922.
Mohandes, S.R.; Kineber, A.F.; Abdelkhalek, S.; Kaddoura, K.; Elsayed, M.; Hosseini, M.R.; Zayed, T. Evaluation of the Critical Factors Causing Sewer Overflows through Modeling of Structural Equations and System Dynamics. J. Clean. Prod. 2022, 375, 134035.
Owolabi, T.A.; Mohandes, S.R.; Zayed, T. Investigating the Impact of Sewer Overflow on the Environment: A Comprehensive Literature Review Paper. J. Environ. Manag. 2022, 301, 113810.
Muttil, N.; Nasrin, T.; Sharma, A.K. Impacts of extreme rainfalls on sewer overflows and WSUD-based mitigation strategies: A review. Water 2023, 15, 429.
Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
Bennett, N.D.; Croke, B.F.W.; Guariso, G.; Guillaume, J.H.A.; Hamilton, S.H.; Jakeman, A.J.; Marsili-Libelli, S.; Newham, L.T.H.; Norton, J.P.; Perrin, C.; et al. Characterising Performance of Environmental Models. Environ. Model. Softw. 2013, 40, 1–20.
Rose, K.A.; Roth, B.M.; Smith, E.P. Skill Assessment of Spatial Maps for Oceanographic Modeling. J. Mar. Syst. 2009, 76, 34–48.
Sciuto, G.; Diekkrüger, B. Influence of Soil Heterogeneity and Spatial Discretization on Catchment Water Balance Modeling. Vadose Zone J. 2010, 9, 955–969.
van Vliet, J.; Hagen-Zanker, A.; Hurkens, J.; van Delden, H. A Fuzzy Set Approach to Assess the Predictive Accuracy of Land Use Simulations. Ecol. Model. 2013, 261–262, 32–42.
Hagen-Zanker, A.; Martens, P. Map Comparison Methods for Comprehensive Assessment of Geosimulation Models. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2008; Volume 5072, pp. 194–209.
Pontius, R.G.; Huffaker, D.; Denman, K. Useful Techniques of Validation for Spatially Explicit Land-Change Models. Ecol. Model. 2004, 179, 445–461.
Power, C.; Simms, A.; White, R. Hierarchical Fuzzy Pattern Matching for the Regional Comparison of Land Use Maps. Int. J. Geogr. Inf. Sci. 2001, 15, 77–100.
Sorichetta, A.; Masetti, M.; Ballabio, C.; Sterlacchini, S.; Beretta, G.P. Reliability of Groundwater Vulnerability Maps Obtained through Statistical Methods. J. Environ. Manag. 2011, 92, 1215–1224.
Koch, J.; Jensen, K.H.; Stisen, S. Toward a True Spatial Model Evaluation in Distributed Hydrological Modeling: Kappa Statistics, Fuzzy Theory, and EOF-Analysis Benchmarked by the Human Perception and Evaluated against a Modeling Case Study. Water Resour. Res. 2015, 51, 1225–1246.
Flight, L.; Julious, S.A. The Disagreeable Behaviour of the Kappa Statistic. Pharm. Stat. 2015, 14, 74–78.
Razavi, S.; Jakeman, A.; Saltelli, A.; Prieur, C.; Iooss, B.; Borgonovo, E.; Plischke, E.; Lo Piano, S.; Iwanaga, T.; Becker, W.; et al. The Future of Sensitivity Analysis: An Essential Discipline for Systems Modeling and Policy Support. Environ. Model. Softw. 2021, 137, 104954.
Budamala, V.; Baburao Mahindrakar, A. Integration of Adaptive Emulators and Sensitivity Analysis for Enhancement of Complex Hydrological Models. Environ. Process. 2020, 7, 1235–1253.
Mohammadi, M.M. Development of Condition Prediction Models for Sanitary Sewer Pipes. In Civil Engineering Dissertations; University of Texas at Arlington: Arlington, TX, USA, 2019.
Abdallah, M.; Abu Talib, M.; Feroz, S.; Nasir, Q.; Abdalla, H.; Mahfood, B. Artificial Intelligence Applications in Solid Waste Management: A Systematic Research Review. Waste Manag. 2020, 109, 231–246.
Youssef, A.M.; Pradhan, B.; Tarabees, E. Integrated Evaluation of Urban Development Suitability Based on Remote Sensing and GIS Techniques: Contribution from the Analytic Hierarchy Process. Arab. J. Geosci. 2011, 4, 463–473.
Minh, H.V.T.; Avtar, R.; Kumar, P.; Tran, D.Q.; Van Ty, T.; Behera, H.C.; Kurasaki, M. Groundwater Quality Assessment Using Fuzzy-AHP in An Giang Province of Vietnam. Geosciences 2019, 9, 330.
Ridwan Makkulawu, A.; Santoso, I.; Asmaul Mustaniroh, S. Exploring the Potential and Benefits of AHP and GIS Integration for Informed Decision-Making: A Literature Review. Ing. Des. Syst. D’information 2023, 28, 1701.
Kalogirou, S.A. Use of genetic algorithms for the optimal design of flat plate solar collectors. In Proceedings of the ISES Solar World Congress 2003; Solar Energy for a Sustainable Future, Göteborg, Sweden, 14–19 June 2003; International Solar Energy Society: Freiburg, Germany, 2003.
Turner, A.; Retamal, M.; White, S.; Palfreeman, L. Third Party Evaluation of Wide Bay Water Smart Metering and Sustainable Water Pricing Initiative Project; University of Technology Sydney: Ultimo, Australia, 2010.
Drogkoula, M.; Kokkinos, K.; Samaras, N. A Comprehensive Survey of Machine Learning Methodologies with Emphasis in Water Resources Management. Appl. Sci. 2023, 13, 12147.
Wunsch, A.; Liesch, T.; Broda, S. Feature-Based Groundwater Hydrograph Clustering Using Unsupervised Self-Organizing Map-Ensembles. Water Resour. Manag. 2022, 36, 39–54.
Chen, Y.; Qin, B.; Liu, T.; Liu, Y.; Li, S. The Comparison of SOM and K-Means for Text Clustering. Comput. Inf. Sci. 2010, 3, 268–274.
Kiang, M.Y.; Hu, M.Y.; Fisher, D.M. An Extended Self-Organizing Map Network for Market Segmentation—A Telecommunication Example. Decis. Support. Syst. 2006, 42, 36–47.
Melo Riveros, N.A.; Cardenas Espitia, B.A.; Aparicio Pico, L.E. Comparison between K-Means and Self-Organizing Maps Algorithms Used for Diagnosis Spinal Column Patients. Inf. Med. Unlocked 2019, 16, 100206.
He, J.; Tan, A.H.; Tan, C.L.; Sung, S.Y. On Quantitative Evaluation of Clustering Systems. In Clustering and Information Retrieval; Springer: Boston, MA, USA, 2004; pp. 105–133.
Balakrishnan, P.V.; Cooper, M.C.; Jacob, V.S.; Lewis, P.A. A Study of the Classification Capabilities of Neural Networks Using Unsupervised Learning: A Comparison with K-Means Clustering. Psychometrika 1994, 59, 509–525.
Kumar, U.A.; Dhamija, Y. Comparative Analysis of SOM Neural Network with K-Means Clustering Algorithm. In Proceedings of the 5th IEEE International Conference on Management of Innovation and Technology, Singapore, 2–5 June 2010; pp. 55–59.
Mingoti, S.A.; Lima, J.O. Comparing SOM Neural Network with Fuzzy C-Means, K-Means and Traditional Hierarchical Clustering Algorithms. Eur. J. Oper. Res. 2006, 174, 1742–1759.
Nolte, A.; Haaf, E.; Heudorfer, B.; Bender, S.; Hartmann, J. Disentangling Coastal Groundwater Level Dynamics in a Global Dataset. Hydrol. Earth Syst. Sci. 2024, 28, 1215–1249.
Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm. Expert. Syst. Appl. 2013, 40, 200–210.
Heil, J.; Häring, V.; Marschner, B.; Stumpe, B. Advantages of Fuzzy K-Means over k-Means Clustering in the Classification of Diffuse Reflectance Soil Spectra: A Case Study with West African Soils. Geoderma 2019, 337, 11–21.
Kao, Y.T.; Zahara, E.; Kao, I.W. A Hybridized Approach to Data Clustering. Expert. Syst. Appl. 2008, 34, 1754–1762.
Laszlo, M.; Mukherjee, S. A Genetic Algorithm That Exchanges Neighboring Centers for K-Means Clustering. Pattern Recognit. Lett. 2007, 28, 2359–2366.
Nayak, J.; Kanungo, D.P.; Naik, B.; Behera, H.S. Evolutionary Improved Swarm-Based Hybrid K-Means Algorithm for Cluster Analysis. Adv. Intell. Syst. Comput. 2016, 379, 343–352.
Niknam, T.; Amiri, B. An Efficient Hybrid Approach Based on PSO, ACO and k-Means for Cluster Analysis. Appl. Soft Comput. 2010, 10, 183–197.
Rogiers, B.; Mallants, D.; Batelaan, O.; Gedeon, M.; Huysmans, M.; Dassargues, A. Model-Based Classification of CPT Data and Automated Lithostratigraphic Mapping for High-Resolution Characterization of a Heterogeneous Sedimentary Aquifer. PLoS ONE 2017, 12, e0176656.
Sung, C.S.; Jin, H.W. A Tabu-Search-Based Heuristic for Clustering. Pattern Recognit. 2000, 33, 849–858.
Zhang, C.L.; Jing, Z.L.; Pan, H.; Jin, B.; Li, Z.X. Robust Visual Tracking Using Discriminative Stable Regions and K-Means Clustering. Neurocomputing 2013, 111, 131–143.
Tewari, A.; Singh, P.K.; Gaur, S.; Mishra, S.; Kumar, R. Cluster-Based Delineation of Optimal Sites for Managed Aquifer Recharge: A Case Study of Lower Betwa River Basin, India. Environ. Earth Sci. 2024, 83, 20.
Li, J.; Hassan, D.; Brewer, S.; Sitzenfrei, R. Is Clustering Time-Series Water Depth Useful? An Exploratory Study for Flooding Detection in Urban Drainage Systems. Water 2020, 12, 2433.
Mel, R.; Sterl, A.; Lionello, P. High Resolution Climate Projection of Storm Surge at the Venetian Coast. Nat. Hazards Earth Syst. Sci. 2013, 13, 1135–1142.
Shende, S.; Chau, K.W. Design of Water Distribution Systems Using an Intelligent Simple Benchmarking Algorithm with Respect to Cost Optimization and Computational Efficiency. Water Supply 2019, 19, 1892–1898.
Wu, Y.; Liu, S. Burst Detection by Analyzing Shape Similarity of Time Series Subsequences in District Metering Areas. J. Water Resour. Plan. Manag. 2020, 146, 04019068.
Naranjo-Fernández, N.; Guardiola-Albert, C.; Aguilera, H.; Serrano-Hidalgo, C.; Montero-González, E. Clustering Groundwater Level Time Series of the Exploited Almonte-Marismas Aquifer in Southwest Spain. Water 2020, 12, 1063.
Teimoori, S.; Olya, M.H.; Miller, C.J. Groundwater Level Monitoring Network Design with Machine Learning Methods. J. Hydrol. 2023, 625, 130145.
Rizwan, A.; Iqbal, N.; Khan, A.N.; Ahmad, R.; Kim, D.H. Toward Effective Pattern Recognition Based on Enhanced Weighted K-Mean Clustering Algorithm for Groundwater Resource Planning in Point Cloud. IEEE Access 2021, 9, 130154–130169.
Salehnia, N.; Salehnia, N.; Ansari, H.; Kolsoumi, S.; Bannayan, M. Climate Data Clustering Effects on Arid and Semi-Arid Rainfed Wheat Yield: A Comparison of Artificial Intelligence and K-Means Approaches. Int. J. Biometeorol. 2019, 63, 861–872.
Javadi, S.; Saatsaz, M.; Hashemy Shahdany, S.M.; Neshat, A.; Ghordoyee Milan, S.; Akbari, S. A New Hybrid Framework of Site Selection for Groundwater Recharge. Geosci. Front. 2021, 12, 101144.
Mohammadrezapour, O.; Kisi, O.; Pourahmad, F. Fuzzy C-Means and K-Means Clustering with Genetic Algorithm for Identification of Homogeneous Regions of Groundwater Quality. Neural Comput. Appl. 2020, 32, 3763–3775.
Kitzig, M.C.; Kepic, A.; Kieu, D.T. Testing Cluster Analysis on Combined Petrophysical and Geochemical Data for Rock Mass Classification. Explor. Geophys. 2017, 48, 344–352.
Lharti, H.; Sirieix, C.; Riss, J.; Verdet, C.; Salmon, F.; Lacanette, D. Partitioning a Rock Mass Based on Electrical Resistivity Data: The Choice of Clustering Method. Geophys. J. Int. 2023, 234, 439–452.
Wang, Y.; Ksienzyk, A.K.; Liu, M.; Brönner, M. Multigeophysical Data Integration Using Cluster Analysis: Assisting Geological Mapping in Trøndelag, Mid-Norway. Geophys. J. Int. 2021, 225, 1142–1157.
Kayhomayoon, Z.; Ghordoyee Milan, S.; Arya Azar, N.; Kardan Moghaddam, H. A New Approach for Regional Groundwater Level Simulation: Clustering, Simulation, and Optimization. Nat. Resour. Res. 2021, 30, 4165–4185.
Yin, X.; Chen, Y.; Bouferguene, A.; Al-Hussein, M. Data-Driven Bi-Level Sewer Pipe Deterioration Model: Design and Analysis. Autom. Constr. 2020, 116, 103181.
Atique, F. Analysis of Urban Pipe Deterioration Using Copula Method; University of Delaware: Newark, DE, USA, 2016.
Opila, M.C. Structural Condition Scoring of Buried Sewer Pipes for Risk-based Decision Making; University of Delaware: Newark, DE, USA, 2011.
Lindner, R. Effectively Managing Sewer Pipeline Infrastructure. J. Undergr. Infrastr. Manag. 2008, 23–25.
Atambo, D.O.; Najafi, M.; Kaushal, V. Development and Comparison of Prediction Models for Sanitary Sewer Pipes Condition Assessment Using Multinomial Logistic Regression and Artificial Neural Network. Sustainability 2022, 14, 5549.
Afonso, M.J.; Freitas, L.; Chaminé, H.I. Groundwater Recharge in Urban Areas (Porto, NW Portugal): The Role of GIS Hydrogeology Mapping. Sustain. Water Resour. Manag. 2019, 5, 203–216.
Zhang, K.; Chui, T.F.M. A Review on Implementing Infiltration-Based Green Infrastructure in Shallow Groundwater Environments: Challenges, Approaches, and Progress. J. Hydrol. 2019, 579, 124089.
Malek Mohammadi, M.; Najafi, M.; Kermanshachi, S.; Kaushal, V.; Serajiantehrani, R. Factors Influencing the Condition of Sewer Pipes: State-of-the-Art Review. J. Pipeline Syst. Eng. Pract. 2020, 11, 03120002.
Akinci, H.; Özalp, A.Y.; Turgut, B. Agricultural Land Use Suitability Analysis Using GIS and AHP Technique. Comput. Electron. Agric. 2013, 97, 71–82.
Yap, J.Y.L.; Ho, C.C.; Ting, C.Y. A Systematic Review of the Applications of Multi-Criteria Decision-Making Methods in Site Selection Problems. Built Environ. Proj. Asset Manag. 2019, 9, 548–563.
Seyedmohammadi, J.; Sarmadian, F.; Jafarzadeh, A.A.; McDowell, R.W. Development of a Model Using Matter Element, AHP and GIS Techniques to Assess the Suitability of Land for Agriculture. Geoderma 2019, 352, 80–95.
Salman, B.; Salem, O. Modeling Failure of Wastewater Collection Lines Using Various Section-Level Regression Models. J. Infrastruct. Syst. 2012, 18, 146–154.
Theochari, A.P.; Feloni, E.; Bournas, A.; Baltas, E. Hydrometeorological—Hydrometric Station Network Design Using Multicriteria Decision Analysis and GIS Techniques. Environ. Process. 2021, 8, 1099–1119.
Van Metre, P.C.; Qi, S.; Deacon, J.; Dieter, C.; Driscoll, J.M.; Fienen, M.; Kenney, T.; Lambert, P.; Lesmes, D.; Mason, C.A.; et al. Prioritizing River Basins for Intensive Monitoring and Assessment by the US Geological Survey. Environ. Monit. Assess. 2020, 192, 458.
Mashford, J.; Marlow, D.; Tran, D.; May, R. Prediction of Sewer Condition Grade Using Support Vector Machines. J. Comput. Civ. Eng. 2011, 25, 283–290.
Hwangbo, S.; Al, R.; Chen, X.; Sin, G. Integrated Model for Understanding N₂O Emissions from Wastewater Treatment Plants: A Deep Learning Approach. Environ. Sci. Technol. 2021, 55, 2143–2151.
Chen, K.; Chen, H.; Zhou, C.; Huang, Y.; Qi, X.; Shen, R.; Liu, F.; Zuo, M.; Zou, X.; Wang, J.; et al. Comparative Analysis of Surface Water Quality Prediction Performance and Identification of Key Water Parameters Using Different Machine Learning Models Based on Big Data. Water Res. 2020, 171, 115454.
Fontecha, J.E.; Agarwal, P.; Torres, M.N.; Mukherjee, S.; Walteros, J.L.; Rodríguez, J.P. A Two-Stage Data-Driven Spatiotemporal Analysis to Predict Failure Risk of Urban Sewer Systems Leveraging Machine Learning Algorithms. Risk Anal. 2021, 41, 2356–2391.
Friedel, M.J. Estimation and Scaling of Hydrostratigraphic Units: Application of Unsupervised Machine Learning and Multivariate Statistical Techniques to Hydrogeophysical Data. Hydrogeol. J. 2016, 24, 2103–2122.
Braun, B.Á.; Abordán, A.; Szabó, N.P. Lithology determination in a coal exploration drillhole using Steiner weighted cluster analysis. Geosci. Eng. 2016, 5, 51–64.
Kriegel, H.P.; Kröger, P.; Sander, J.; Zimek, A. Density-Based Clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 231–240.
Locatelli, L.; Mark, O.; Mikkelsen, P.S.; Arnbjerg-Nielsen, K.; Deletic, A.; Roldin, M.; Binning, P.J. Hydrologic Impact of Urbanization with Extensive Stormwater Infiltration. J. Hydrol. 2017, 544, 524–537.
Prapanchan, V.N.; Subramani, T.; Karunanidhi, D. GIS and Fuzzy Analytical Hierarchy Process to Delineate Groundwater Potential Zones in Southern Parts of India. Groundw. Sustain. Dev. 2024, 25, 101110.
Sitorus, F.; Brito-Parada, P.R. Equipment Selection in Mineral Processing—A Sensitivity Analysis Approach for a Fuzzy Multiple Criteria Decision Making Model. Min. Eng. 2020, 150, 106261.
Sharifi, A.; Tarlani Beris, A.; Sharifzadeh Javidi, A.; Nouri, M.; Gholizadeh Lonbar, A.; Ahmadi, M. Application of Artificial Intelligence in Digital Twin Models for Stormwater Infrastructure Systems in Smart Cities. Adv. Eng. Inform. 2024, 61, 102485.
Sakti, A.D.; Mahdani, J.N.; Santoso, C.; Ihsan, K.T.N.; Nastiti, A.; Shabrina, Z.; Safira, M.; Rohmat, F.; Yulianto, F.; Virtriana, R. Optimizing City-Level Centralized Wastewater Management System Using Machine Learning and Spatial Network Analysis. Environ. Technol. Innov. 2023, 32, 103360.

Figure 1. Overview of the applied approach.

Figure 2. Study area location (a); rocks (b); alluvial deposits (c); and faults, rivers, observation wells, sewer networks, and their catchments (d).

Figure 3. Topographic elevation (a), soil (b), land cover/land use (c), flood potential (d), mass movement (e), and made ground (f) in the study area.

Figure 4. Precipitation in winter (March 2023) (a), spring (April 2023) (b), summer (September 2023) (c), and autumn (December 2023) (d), along with groundwater depth in winter (March 2023) (e), spring (April 2023) (f), summer (September 2023) (g), and autumn (December 2023) (h).

Figure 5. Criterial maps generated through classification, reclassification, and fuzzification, including rock (a), alluvium (b), soil (c), land cover/land use (d), made ground (e), mass movement (f), fault proximity (g), fault length (h), river proximity (i), flood potential (j), elevation (k), slope (l), drainage order (m), and topographic wetness index (n).

Figure 6. Criterial maps for precipitation in winter (March 2023) (a), spring (April 2023) (b), summer (September 2023) (c), and autumn (December 2023) (d), along with groundwater depth in winter (March 2023) (e), spring (April 2023) (f), summer (September 2023) (g), and autumn (December 2023) (h) after classification, reclassification, and fuzzification.

Figure 7. Groundwater infiltration probability scores derived from the AHP-weighted GIS approach in winter (a), spring (b), summer (c), and autumn (d), including locations of storm overflow discharges.

Figure 8. Groundwater infiltration probability scores derived from the GIS approach in winter, based on equal weights assigned to all layers (a), to the five highest-weighted layers (groundwater depth, river proximity, flood potential, rock, and alluvium) (b), and to the two highest-weighted layers (groundwater depth and river proximity) (c). Panels (d)–(f) show the differences between the final AHP-weighted GIS scores in winter (shown in Figure 7a) and the scores based on equal weights for all layers (d), the five highest-weighted layers (e), and the two highest-weighted layers (f).

Figure 9. Groundwater infiltration probability levels in winter derived from the K-means clustering algorithm considering all layers with AHP weights (a); all layers with equal weights (b); and five highest-weighted layers (groundwater depth, river proximity, flood potential, rock, and alluvium) with equal weights (c).

Figure 10. Groundwater infiltration probability levels for the AHP-weighted GIS-based scores in winter (shown in Figure 7a), classified using the K-means clustering algorithm (a), equal-interval classification (b), and quantile classification (c).
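Figure 10 compares three ways of splitting the continuous winter scores into low, intermediate, and high classes. The Python sketch below applies all three to a one-dimensional list of scores. It assumes plain Lloyd's K-means with a deterministic, quantile-based start. The study's own K-means implementation, initialisation, and data handling are not described here, and the function names are illustrative only. Figure 9 also clusters the weighted layers directly, and this one-dimensional sketch does not cover that case.

```python
from statistics import quantiles


def equal_interval(scores, k=3):
    """Split [min, max] into k equal-width bins; class 0 holds the lowest scores."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(scores)
    # The maximum score would land in bin k, so clamp it into the top class.
    return [min(int((s - lo) / width), k - 1) for s in scores]


def quantile_classes(scores, k=3):
    """Split by rank so each class holds roughly the same number of cells."""
    cuts = quantiles(scores, n=k)  # k - 1 cut points
    return [sum(s > c for c in cuts) for s in scores]


def kmeans_1d(scores, k=3, iters=100):
    """Lloyd's K-means on scalar scores; returns labels ordered low -> high and the centres."""
    srt = sorted(scores)
    # Deterministic start: one centre taken from inside each k-quantile of the data.
    centres = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]

    def nearest(s):
        return min(range(k), key=lambda j: abs(s - centres[j]))

    for _ in range(iters):
        labels = [nearest(s) for s in scores]
        new = []
        for j in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centres[j])
        if new == centres:  # assignments are stable, so the algorithm has converged
            break
        centres = new
    labels = [nearest(s) for s in scores]
    # Relabel so that class 0 is the lowest-probability cluster.
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[lab] for lab in labels], [centres[j] for j in order]


# Hypothetical scores for nine cells.
scores = [0.10, 0.12, 0.15, 0.40, 0.42, 0.45, 0.75, 0.78, 0.80]
print(equal_interval(scores), quantile_classes(scores), kmeans_1d(scores)[0])
```

On this evenly separated toy input, all three schemes return `[0, 0, 0, 1, 1, 1, 2, 2, 2]`. On skewed score distributions the class boundaries differ, so the choice of scheme changes where the risk zones fall.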

Table 1. Classification, reclassification, and weights of the thematic layers.

Thematic layer	Classification and reclassification	Weight (%)
Topographic elevation	<5 m = 10; 5–10 m = 9; 10–50 m = 7; 50–100 m = 5; 100–200 m = 3; >200 m = 1	3.05
Slope	0–15° = 10; 15–30° = 7; 30–45° = 3; >45° = 1	4.40
Topographic wetness index	>20 = 10; 15–20 = 7; 10–15 = 5; 5–10 = 3; <5 = 1	6.40
Drainage order	10–11 = 10; 8–9 = 9; 6–7 = 7; 3–5 = 3; 1–2 = 1	3.20
Groundwater depth	<5 m = 10; 5–10 m = 7; 10–15 m = 3; ≥15 m = 1	13.45
Precipitation	>200 mm/month = 10; 150–200 = 7; 100–150 = 5; <100 = 3	3.00
Fault proximity	<50 m = 10; 50–100 m = 7; 100–500 m = 5; >500 m = 0	6.80
Fault length	>3000 m = 10; 2000–3000 m = 9; 1000–2000 m = 7; 500–1000 m = 5; 100–500 m = 4; <100 m = 3	6.80
Rock	Gravel = 10; Sandstone and gravel = 9; Sandstone = 8; Sandstone and subordinate breccia/sandstone and basalt = 7; Breccia and sandstone/sandstone along with sandstone and mudstone/gravel, clayey = 6; Sandstone/breccia/mudstone/limestone = 5; Mudstone–sandstone interbedded = 4; Tuff = 3; Slate = 2; Chert/mudstone = 1	8.75
Alluvium	Gravel = 10; Sand and gravel = 9; Sand = 8; Sand with clay and gravel = 6; Clay/sand variants = 4; Clay/silt/sand and silt = 3	8.30
Made ground	Artificial (infilled) deposits = 10; Non-artificial = 0	5.05
Mass movement	Landslide deposits = 10; Non-landslide = 0	3.45
River proximity	<25 m = 10; 25–50 m = 7; 50–100 m = 3; >100 m = 0	10.80
Flood potential	High = 10; Low = 7; None = 0	8.00
Land cover/land use (LC/LU)	Freshwater = 10; Saltmarsh/woodland = 7; Heather/littoral rock = 6; Improved grassland = 5; Urban/suburban = 4; Arable/horticulture = 3; Inland rock = 2	4.00
(Weathered) soil type	Sand = 10; Sand to sandy loam = 9; Sand to loam = 8; Sandy loam = 7; Clayey loam to sandy loam/loam to sandy loam = 6; Clay to sandy loam/silt to silty loam/loam to silty loam/varied (locally peaty) = 5; Clayey loam to silty loam = 4; Clayey loam = 3; Clay to clayey loam = 1	4.55
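The study reports a GWI probability index from 0 to 1 for each 1 m × 1 m cell. One simple way to obtain such an index from Table 1 is a weighted linear combination of the reclassified layers, normalised by the highest possible score. The Python sketch below does this for a single cell. It makes two assumptions: each layer has already been reclassified to Table 1's 0–10 scores, and normalisation divides by the weight total times 10. The study's fuzzification step is not reproduced, and the layer keys and function names are hypothetical. The Table 1 weights sum to 100, so they can be read as percentages.

```python
# AHP weights from Table 1 (they sum to 100). The keys are hypothetical layer names.
WEIGHTS = {
    "elevation": 3.05, "slope": 4.40, "twi": 6.40, "drainage_order": 3.20,
    "gwd": 13.45, "precipitation": 3.00, "fault_proximity": 6.80,
    "fault_length": 6.80, "rock": 8.75, "alluvium": 8.30,
    "made_ground": 5.05, "mass_movement": 3.45, "river_proximity": 10.80,
    "flood_potential": 8.00, "lc_lu": 4.00, "soil": 4.55,
}
MAX_SCORE = 10  # highest reclassification score used in Table 1


def reclass_gwd(depth_m):
    """Groundwater depth classes from Table 1: a shallower water table gets a higher score.

    The class assigned to a depth of exactly 5 m or 10 m is an assumption.
    """
    if depth_m < 5:
        return 10
    if depth_m < 10:
        return 7
    if depth_m < 15:
        return 3
    return 1


def gwi_index(scores, weights=WEIGHTS):
    """Weighted sum of 0-10 layer scores, normalised so that the result lies in [0, 1]."""
    total = sum(weights.values())
    return sum(w * scores[name] for name, w in weights.items()) / (total * MAX_SCORE)


# Hypothetical cell: only groundwater depth (2 m) and river proximity score 10.
cell = dict({name: 0 for name in WEIGHTS}, gwd=reclass_gwd(2.0), river_proximity=10)
print(round(gwi_index(cell), 4))
```

This cell scores (13.45 + 10.80) × 10 / 1000 ≈ 0.24. Passing `weights={k: 1 for k in WEIGHTS}` gives the equal-weight variant that Table 3 and Figure 8 compare against the AHP weights.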

Table 2. Agreement levels based on Kappa statistic value.

Kappa value	Level of agreement
≤0	No agreement
0.01–0.20	None to slight
0.21–0.40	Fair
0.41–0.60	Moderate
0.61–0.80	Substantial
0.81–1.00	Almost perfect
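When many map comparisons need to be screened, Table 2's bands can be applied in code. The minimal Python sketch below assumes each band's upper bound is inclusive, because the table leaves values such as 0.205 between bands. The function name is hypothetical.

```python
# Upper bounds (inclusive, an assumption) and labels from Table 2.
BANDS = [(0.20, "None to slight"), (0.40, "Fair"), (0.60, "Moderate"), (0.80, "Substantial")]


def agreement_level(kappa):
    """Return the Table 2 level of agreement for a Kappa value."""
    if kappa <= 0:
        return "No agreement"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    return "Almost perfect"


for k in (0.35, 0.50, 0.70):  # Kappa values reported in Table 5
    print(k, agreement_level(k))
```

Under these bands, the three Kappa values in Table 5 read as fair (0.35), moderate (0.50), and substantial (0.70) agreement.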

Table 3. Statistics of groundwater infiltration probability scores derived from the GIS approach, considering different seasons and numbers of layers with either AHP or equal weights.

Season	Spring	Summer	Autumn	Winter	Winter	Winter	Winter	Winter	Winter	Winter
Weighting	AHP	AHP	AHP	AHP	Equal	AHP minus equal weights	Equal	AHP minus equal weights	Equal	AHP minus equal weights
Layers	All layers ¹	All layers	All layers	All layers	All layers	All layers	Five layers ²	All layers (AHP-weighted) minus five layers (equally weighted)	Two layers ³	All layers (AHP-weighted) minus two layers (equally weighted)
Minimum	0.07	0.07	0.08	0.08	0.10	−0.11	0.00	−0.34	0.00	−0.64
Maximum	0.78	0.78	0.78	0.78	0.72	0.11	0.93	0.30	1.00	0.47
Range	0.71	0.70	0.70	0.70	0.62	0.22	0.93	0.64	1.00	1.10
Mean	0.28	0.28	0.29	0.29	0.31	−0.02	0.27	0.02	0.28	0.02
Standard deviation	0.09	0.09	0.09	0.09	0.07	0.03	0.15	0.08	0.24	0.17

¹ GWD, river, flood potential, rock, alluvium, elevation, slope, drainage order, TWI, LC/LU, soil, made ground, mass movement, fault proximity, fault length, precipitation. ² GWD, river, flood potential, rock, and alluvium. ³ GWD and river.
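Table 3 and Figure 8d–f summarise each score map and the difference between the AHP-weighted scores and an equal-weight alternative. The Python sketch below computes the same five statistics for a difference map. The two maps are hypothetical flattened lists. Using the population standard deviation is an assumption, because the estimator is not stated.

```python
from statistics import mean, pstdev


def summarise(values):
    """Minimum, maximum, range, mean, and standard deviation of a score map."""
    lo, hi = min(values), max(values)
    # Population SD is an assumption; the study does not state which estimator it used.
    return {"min": lo, "max": hi, "range": hi - lo, "mean": mean(values), "std": pstdev(values)}


def difference(a, b):
    """Cell-by-cell difference map, e.g. AHP-weighted minus equal-weighted scores."""
    if len(a) != len(b):
        raise ValueError("both score maps must cover the same cells")
    return [x - y for x, y in zip(a, b)]


# Hypothetical four-cell score maps.
ahp = [0.20, 0.35, 0.50, 0.65]
equal = [0.25, 0.35, 0.40, 0.70]
stats = summarise(difference(ahp, equal))
print({name: round(v, 3) for name, v in stats.items()})
```

A negative difference marks cells where the equal-weight alternative scores higher than the AHP-weighted model. In Table 3, keeping only the two highest-weighted layers raises the winter standard deviation from 0.09 to 0.24, and the resulting difference map spans −0.64 to 0.47.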

Table 4. Statistics of winter groundwater infiltration probability scores from the AHP-weighted GIS approach at storm overflow discharge locations.

Statistic	Value
Minimum	0.17
Maximum	0.64
Mean	0.40
Median	0.44
Standard deviation	0.12
First quartile (Q1)	0.31
Third quartile (Q3)	0.48
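Table 4 summarises the winter scores sampled at storm overflow (SO) discharge locations. In Table 3, the domain-wide winter mean of the AHP-weighted scores is 0.29. The SO first quartile of 0.31 already exceeds that mean, which suggests that roughly three-quarters of SO locations sit on above-average scores.

The Python sketch below shows how such statistics can be produced by sampling a score grid at discharge cells. It assumes SO locations have already been snapped to (row, column) indices of the 1 m grid. The grid, cells, and function names are hypothetical. The quartile method (the `statistics.quantiles` default) may differ from the one used in the study.

```python
from statistics import mean, median, quantiles, stdev


def sample_at(grid, cells):
    """Read score values at (row, col) grid cells, e.g. cells holding SO discharge points."""
    return [grid[r][c] for r, c in cells]


def location_stats(values):
    """Summary statistics of the sampled scores, in the layout of Table 4."""
    # 'exclusive' is the statistics.quantiles default; the study's quartile method is not stated.
    q1, _, q3 = quantiles(values, n=4)
    return {"min": min(values), "max": max(values), "mean": mean(values),
            "median": median(values), "std": stdev(values), "q1": q1, "q3": q3}


# Hypothetical 3 x 3 winter score grid and four discharge cells.
grid = [[0.10, 0.20, 0.30],
        [0.15, 0.50, 0.60],
        [0.05, 0.25, 0.35]]
so_cells = [(0, 1), (0, 2), (1, 1), (1, 2)]
stats = location_stats(sample_at(grid, so_cells))
print({name: round(v, 3) for name, v in stats.items()})
```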

Table 5. Kappa statistics for evaluating the spatial agreement between the GIS and K-means approaches in winter.

Statistics		K-Means Clustering and AHP-Weighted GIS Approach with Equal-Interval Classification			K-Means Clustering and AHP-Weighted GIS Approach with Quantile Classification			K-Means Clustering and AHP-Weighted GIS Approach with K-Means Clustering for Classification
Error matrix	cluster	1	2	3	1	2	3	1	2	3
	1	57,752,516	30,868,308	1794	44,986,749	1,265,170	0	50,222,276	5,026,305	0
	2	5,815,602	28,548,327	13,435,235	14,084,274	31,987,004	4478	13,008,762	47,919,767	795,014
	3	0	17,011	2,122,980	4,497,095	26,181,472	15,555,531	393,951	6,514,881	14,803,662
Kappa		0.35			0.50			0.70
Identical classifications (%)		63.82			66.78			81.44
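The Kappa values and agreement percentages in Table 5 follow from its error matrices through Cohen's statistic, κ = (p_o − p_e)/(1 − p_e). Here p_o is the share of cells on the diagonal, and p_e is the agreement expected by chance from the row and column totals. The Python sketch below recomputes both statistics from the cell counts transcribed from Table 5. The function and variable names are illustrative.

```python
def kappa_and_agreement(matrix):
    """Cohen's Kappa and percent agreement for a square error matrix of cell counts."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n  # p_o: share of cells on the diagonal
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2  # p_e: chance agreement
    return (observed - expected) / (1 - expected), 100 * observed


# Cell counts as listed in Table 5 (classes 1-3).
TABLE5 = {
    "equal_interval": [[57_752_516, 30_868_308, 1_794],
                       [5_815_602, 28_548_327, 13_435_235],
                       [0, 17_011, 2_122_980]],
    "quantile": [[44_986_749, 1_265_170, 0],
                 [14_084_274, 31_987_004, 4_478],
                 [4_497_095, 26_181_472, 15_555_531]],
    "kmeans": [[50_222_276, 5_026_305, 0],
               [13_008_762, 47_919_767, 795_014],
               [393_951, 6_514_881, 14_803_662]],
}

for name, matrix in TABLE5.items():
    kappa, pct = kappa_and_agreement(matrix)
    print(f"{name}: kappa = {kappa:.2f}, agreement = {pct:.2f}%")
```

Run as is, the script prints Kappa values of 0.35, 0.50, and 0.70 with agreement of 63.82%, 66.78%, and 81.44%, matching Table 5. Transposing a matrix leaves both statistics unchanged, so the result does not depend on which map sits on the rows.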

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Zeydalinejad, N.; Javadi, A.A.; Jacob, M.; Baldock, D.; Webber, J.L. Can Proxy-Based Geospatial and Machine Learning Approaches Map Sewer Network Exposure to Groundwater Infiltration? Smart Cities 2025, 8, 145. https://doi.org/10.3390/smartcities8050145

